Density can be represented with 2D density charts or density plots. A 2D density chart displays the relationship between two numeric variables: as in a scatterplot, one variable is mapped to the X-axis and the other to the Y-axis. The number of observations within each area of the 2D space is counted and encoded with a color gradient, so that regions where the data are more or less concentrated stand out from one another.
What: Discover Distribution
Why: Understand correlations in big data with density distributions
How a Density Plot Is Estimated
A density plot is a smoothed, continuous version of a histogram, estimated from the data. The most common estimation method is kernel density estimation (KDE). In this method, a continuous curve (the kernel) is centered on every individual data point, and all of these curves are then added together and scaled to produce a single smooth density estimate. The kernel most often used is a Gaussian, which places a bell curve at each data point; the width of each curve is set by a parameter called the bandwidth.
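The summing of kernels described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation: the function name `gaussian_kde_1d`, its fixed `bandwidth` argument and the simulated sample are assumptions made for the example.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Hypothetical helper: 1D Gaussian KDE evaluated at every point of `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from each grid point to each data point, shape (len(grid), len(data)).
    u = (grid[:, None] - data[None, :]) / bandwidth
    # One Gaussian bell curve (standard normal pdf) per data point.
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Add the curves together and divide by n * bandwidth so the estimate integrates to 1.
    return kernels.sum(axis=1) / (len(data) * bandwidth)

# Illustrative data: 500 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)
grid = np.linspace(-5, 5, 1001)
density = gaussian_kde_1d(sample, grid, bandwidth=0.3)

# Riemann-sum check that the area under the curve is close to 1.
area = density.sum() * (grid[1] - grid[0])
```

Changing `bandwidth` changes how smooth the resulting curve is, because it widens or narrows every bell curve at once.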
When to Use a Density Plot?
1. When the sample size is large and you need a clearer picture of the distribution
Use a 2D density plot when there are so many data points that a scatterplot would suffer from overplotting. Instead of drawing every dot, a 2D density plot counts the observations within each area of the 2D space and shows density differences with color, which can reveal patterns in the distribution that overlapping dots would otherwise hide.
2. When you need a nuanced visualization of density
2D histograms and hexbin plots are useful for analyzing the relationship between two numeric variables with a very large number of values: the plot area is divided into squares or hexagons, and each cell is colored by the number of points it contains. This avoids the overplotting you would see in a classic scatterplot. You can explicitly set how many bins to use along the X and Y axes, and the choice of cell shape gives a slightly different view of the same data. You can also compute a 2D kernel density estimate and represent it with contours.
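A 2D kernel density estimate works like the 1D version, with a two-dimensional bell-shaped kernel placed on each point. The sketch below assumes a product of two Gaussians with separate, fixed bandwidths `hx` and `hy` (some libraries, such as SciPy's `gaussian_kde`, scale the data covariance instead); the function name and the simulated data are hypothetical. The resulting grid of values is what a contour plot would draw.

```python
import numpy as np

def gaussian_kde_2d(x, y, xgrid, ygrid, hx, hy):
    """Hypothetical helper: 2D Gaussian KDE (axis-aligned bandwidths) on a rectangular grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Per-axis scaled distances: shapes (len(xgrid), n) and (len(ygrid), n).
    ux = (np.asarray(xgrid, dtype=float)[:, None] - x[None, :]) / hx
    uy = (np.asarray(ygrid, dtype=float)[:, None] - y[None, :]) / hy
    kx = np.exp(-0.5 * ux**2)
    ky = np.exp(-0.5 * uy**2)
    # Sum the product kernel over all points: z[j, i] is the density at (xgrid[i], ygrid[j]).
    z = ky @ kx.T
    # Normalize so the surface integrates to 1.
    return z / (len(x) * 2 * np.pi * hx * hy)

# Illustrative correlated data.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 0.5 * x + rng.normal(scale=0.5, size=1_000)
xgrid = np.linspace(-4, 4, 161)
ygrid = np.linspace(-3, 3, 121)
z = gaussian_kde_2d(x, y, xgrid, ygrid, hx=0.3, hy=0.3)

# Total mass on the grid should be close to 1.
mass = z.sum() * (xgrid[1] - xgrid[0]) * (ygrid[1] - ygrid[0])
```

Rows of `z` follow `ygrid` and columns follow `xgrid`, which is the layout matplotlib's `contour(xgrid, ygrid, z)` expects.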
3. To visualize several distributions at once, kernel density plots generally work better than histograms.
In a density plot, we attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve. This curve has to be estimated from the data, most commonly with kernel density estimation. Overlapping density plots avoid the main problem of overlapping histograms, where bars from different groups hide one another, because continuous density lines help the eye keep the distributions separate. With multiple distributions, histograms tend to become highly confusing, whereas density plots work well as long as the distributions are somewhat distinct and contiguous.
Types of 2D Density Plots
1. Hexbin Plot
Very similar to the 2D histogram, but the plot area is divided into many hexagons instead of squares.
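Plotting libraries draw hexbin plots directly (for example matplotlib's `Axes.hexbin`), but the binning step itself is short. This sketch assumes hexagon centres on a triangular lattice with spacing `dx`, built from two rectangular lattices offset by half a cell; each point goes to the nearer of its two candidate centres, and the cells of such a lattice are regular hexagons. The function `hexbin_counts` is a hypothetical helper, not a library API.

```python
import numpy as np
from collections import Counter

def hexbin_counts(x, y, dx):
    """Hypothetical helper: count points per hexagonal cell, keyed by the cell centre."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = dx * np.sqrt(3)  # row spacing that makes the combined lattice triangular
    # Nearest centre on lattice A, with centres at (i*dx, j*dy).
    ia, ja = np.round(x / dx), np.round(y / dy)
    dist_a = (x - ia * dx) ** 2 + (y - ja * dy) ** 2
    # Nearest centre on lattice B, shifted by half a step: ((i+0.5)*dx, (j+0.5)*dy).
    ib, jb = np.round(x / dx - 0.5), np.round(y / dy - 0.5)
    dist_b = (x - (ib + 0.5) * dx) ** 2 + (y - (jb + 0.5) * dy) ** 2
    # Keep the closer of the two candidates; that centre identifies the hexagon.
    use_a = dist_a <= dist_b
    cx = np.where(use_a, ia * dx, (ib + 0.5) * dx)
    cy = np.where(use_a, ja * dy, (jb + 0.5) * dy)
    return Counter(zip(cx.tolist(), cy.tolist()))

# Usage with illustrative data: the count per hexagon is what the colour scale encodes.
rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
y = 0.5 * x + rng.normal(scale=0.5, size=5_000)
counts = hexbin_counts(x, y, dx=0.5)
```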
2. 2D Histogram
This is the two-dimensional version of the classic histogram. The plot area is divided into many small squares, and the number of points in each square is represented by its color.
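NumPy's `histogram2d` performs this counting step, and its `bins=(nx, ny)` argument sets the number of bins along X and Y separately. The simulated data, bin counts and ranges below are only illustrative.

```python
import numpy as np

# Illustrative correlated data with enough points to overplot a scatterplot.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = x + rng.normal(scale=0.5, size=10_000)

# 30 bins along X and 20 along Y; points outside `range` are not counted.
counts, xedges, yedges = np.histogram2d(x, y, bins=(30, 20), range=[[-4, 4], [-5, 5]])

# counts[i, j] is the number of points with x in bin i and y in bin j.
# Functions that expect rows = y and columns = x, such as matplotlib's
# pcolormesh(xedges, yedges, C), need the transpose counts.T.
```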
3. Contour Plot
A graphical technique for representing a 3-dimensional surface by plotting constant-z slices, called contours, in two dimensions. For density data, the surface is the estimated density, and a contour plot can show the contour lines of the distribution, fill the areas between them, or draw the density as a raster (a grid of colored cells).
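Which constant-z slices to draw is a choice. One common convention for density contours, assumed here rather than taken from any particular library, is to pick levels so that each contour encloses a given share of the probability mass (for example 50%). The sketch computes such thresholds from a gridded density; `mass_levels` is a hypothetical helper, and a standard bivariate normal stands in for a density estimate because its level enclosing mass p is known exactly: (1 - p) / (2 * pi).

```python
import numpy as np

def mass_levels(z, cell_area, probs):
    """Hypothetical helper: thresholds t such that the region {z >= t} holds each mass in `probs`."""
    flat = np.sort(np.ravel(z))[::-1]       # grid values from highest to lowest density
    enclosed = np.cumsum(flat) * cell_area  # mass inside {z >= flat[k]} for each k
    # First position where the enclosed mass reaches p; its value is the contour level.
    return [float(flat[np.searchsorted(enclosed, p)]) for p in probs]

# Stand-in density: standard bivariate normal on a fine grid.
step = 0.02
g = np.arange(-5, 5 + step / 2, step)
X, Y = np.meshgrid(g, g)
z = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

levels = mass_levels(z, step * step, [0.5, 0.9])
```

These values could then be passed, in increasing order, as the `levels` argument of a contour call such as matplotlib's `contour(X, Y, z, levels=sorted(levels))`.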
When Not to Use 2D Density Plots?
1. When you do not have enough data points for overplotting to be a risk
Use a scatterplot if there is no overplotting. 2D density plots are effective only when data points overlap, because replacing the overlapping points with a color gradient conveys the data distribution more clearly. In other cases, a scatterplot, which shows every individual observation, is the more effective visualization.
2. When you cannot control the plot’s bin size or bandwidth
The bin size (for 2D histograms and hexbins) or bandwidth (for kernel density estimates) has to be tuned, because these plots are very sensitive to this parameter, and different values can lead to different conclusions. If you cannot adjust this parameter to suit the data, use other plots to represent the density distribution more accurately.
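The sensitivity to bandwidth is easy to demonstrate. The sketch below, assuming a Gaussian kernel and a simulated two-group sample, evaluates the same data with a very small bandwidth, with Silverman's rule-of-thumb bandwidth (0.9 * min(sd, IQR / 1.34) * n ** (-1/5)) and with a very large one, then counts the peaks in each curve. The helper names are hypothetical. The tiny bandwidth breaks the curve into many spurious peaks, while the large one merges the two groups into a single hump, so the same data can suggest different conclusions.

```python
import numpy as np

def kde(data, grid, h):
    """Gaussian KDE with bandwidth h evaluated on `grid`."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    q75, q25 = np.percentile(data, [75, 25])
    spread = min(np.std(data, ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * len(data) ** (-1 / 5)

def count_peaks(density):
    """Count grid points that are strictly higher than both neighbours."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

# Two well-separated groups of 300 points each (illustrative data).
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
grid = np.linspace(-6, 6, 1201)

h_rule = silverman_bandwidth(data)
peaks = {h: count_peaks(kde(data, grid, h)) for h in (0.05, h_rule, 3.0)}
for h, n_peaks in peaks.items():
    print(f"bandwidth={h:.3f}: {n_peaks} peak(s)")
```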