The Density 2D plot

Density can be represented in the form of 2D density graphs or density plots. A 2D density chart displays the relationship between two numeric variables, with one variable on the X-axis and the other on the Y-axis, as in a scatterplot. The number of observations within each area of the 2D space is counted and represented by a color gradient, showing how the concentration of data in one region differs from that in others.

Quick details:

What:

Discover Distribution

Why:

Understand correlations in big data with density distributions

History of Density 2D

A density plot is a smoothed, continuous version of a histogram, estimated from the data. The most common estimation method is kernel density estimation (KDE). In this method, a continuous curve (the kernel) is drawn at every individual data point, and all of these curves are then added together to make a single smooth density estimate. The most commonly used kernel is the Gaussian, which places a bell curve at each data point.
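As a minimal sketch of the estimate just described, the plain-Python code below places one Gaussian bump of width `bandwidth` at each data point and averages them, dividing by n times the bandwidth so that the total area under the curve is 1. The helper names, the sample values, and the hand-picked bandwidth of 0.4 are hypothetical, chosen only for illustration; real tools such as SciPy's `gaussian_kde` also pick the bandwidth automatically, which is not shown here.

```python
import math

def gaussian_kernel(u: float) -> float:
    """Standard normal density: the bell curve placed at each data point."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data: list[float], x: float, bandwidth: float) -> float:
    """Kernel density estimate at x: one scaled Gaussian per data point, averaged."""
    n = len(data)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    # Dividing by n * bandwidth makes the summed curves integrate to 1.
    return total / (n * bandwidth)

# Hypothetical sample data, for illustration only.
sample = [1.0, 1.3, 2.1, 2.2, 2.4, 4.0, 4.3]
grid = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
for x in grid:
    print(f"{x:4.1f}  {kde(sample, x, bandwidth=0.4):.3f}")
```

Plotting these values against `grid` gives the smooth density curve; a smaller bandwidth makes it bumpier, a larger one smoother.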

2D Kernel density plots producing a smooth estimate of the density
Source

When to Use a Density Plot?

1

When the sample size is huge and you need a clearer picture of the distribution

Use a 2D density plot when there are so many data points that a scatterplot risks overplotting. Instead of drawing every dot, the 2D density plot counts the number of observations within each area of the 2D space and encodes those counts with color, revealing density differences that the overlapping dots would otherwise hide.

An overplotted scatterplot compared with a 2D density graph that shows differences in density with color

Source

 

2

When you need a nuanced visualization of density

2D histograms and hexbins are useful for analyzing the relationship between two numerical variables with a huge number of values, by dividing the plot area into squares or hexagons. This avoids the overplotting you would observe in a classic scatterplot. You can explicitly set how many bins to use along the X- and Y-axes, and the choice of polygon (square or hexagon) gives a slightly different view of the same data. You can also compute a 2D kernel density estimate and represent it with contours.
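A 2D kernel density estimate extends the one-dimensional idea by placing a two-dimensional bump at every (x, y) point. The sketch below is a plain-Python illustration under simple assumptions: it uses a product of two 1D Gaussian kernels with separate bandwidths `bw_x` and `bw_y` and no correlation term, and it evaluates the estimate on a regular grid. The function names, points, and bandwidths are hypothetical. A contour plot would then draw lines of equal value through this grid; tracing those lines is left to a plotting library.

```python
import math

def kde2d(points, x, y, bw_x, bw_y):
    """2D Gaussian KDE at (x, y), using a product of two 1D Gaussian kernels."""
    norm = 2 * math.pi * bw_x * bw_y * len(points)
    total = 0.0
    for px, py in points:
        ux = (x - px) / bw_x
        uy = (y - py) / bw_y
        total += math.exp(-0.5 * (ux * ux + uy * uy))
    return total / norm

def density_grid(points, xs, ys, bw_x, bw_y):
    """Evaluate the KDE on a grid: grid[j][i] is the density at (xs[i], ys[j])."""
    return [[kde2d(points, x, y, bw_x, bw_y) for x in xs] for y in ys]

# Hypothetical data: a cluster near the origin and a smaller one near (2, 2).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.2), (2.0, 2.0), (2.1, 1.8)]
xs = [i * 0.25 - 1.0 for i in range(17)]  # -1.0 .. 3.0
ys = xs
grid = density_grid(points, xs, ys, bw_x=0.3, bw_y=0.3)

# Locate the grid cell with the highest estimated density.
peak_value, peak_x, peak_y = max(
    (grid[j][i], xs[i], ys[j]) for j in range(len(ys)) for i in range(len(xs))
)
print(f"highest estimated density {peak_value:.3f} near ({peak_x}, {peak_y})")
```

Using the same bandwidth on both axes only makes sense when X and Y have similar scales; otherwise each axis needs its own value.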

Different ways of representing the 2D density space

3

To visualize several distributions at once, kernel density plots will generally work better than histograms.

In a density plot, we attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve. This curve needs to be estimated from the data, and the most commonly used method for this estimation procedure is called kernel density estimation. Overlapping density plots don’t typically have the problem that overlapping histograms have, because the continuous density lines help the eye keep the distributions separate. For multiple distributions, histograms tend to become highly confusing, whereas density plots work well as long as the distributions are somewhat distinct and contiguous.

Density estimates of the butterfat percentage in the milk of four cattle breeds. Data Source: Canadian Record of Performance for Purebred Dairy Cattle. In kernel density estimation, we draw a continuous curve (the kernel) with a small width (controlled by a parameter called bandwidth) at the location of each data point, and then we add up all these curves to obtain the final density estimate.

Source

 

Types of 2D Density Plots

1. Hexbin

Very similar to the 2D histogram, but the plot area is split into many hexagons instead of squares.
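To make the hexagon assignment concrete, here is a minimal plain-Python sketch; the helper `hexbin_counts` and the sample points are hypothetical. It relies on a common trick for regular hexagons: the hexagon centers form two rectangular lattices offset by half a cell, so each point only needs to be compared with its nearest center on each lattice, and the closer of the two wins. Plotting libraries (for example matplotlib's `hexbin`) do this binning for you.

```python
import math
from collections import Counter

def hexbin_counts(points, size):
    """Count points per hexagon; `size` is the distance between neighboring centers.

    Centers lie on lattice A at (i*sx, j*sy) and lattice B at
    ((i+0.5)*sx, (j+0.5)*sy), with sy = sqrt(3)*sx. Together they form a
    triangular lattice whose nearest-center regions are regular hexagons.
    """
    sx = size
    sy = size * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        # Nearest center on lattice A (rounding each coordinate).
        ia, ja = round(x / sx), round(y / sy)
        ax, ay = ia * sx, ja * sy
        # Nearest center on lattice B (shifted by half a cell in both directions).
        ib, jb = round(x / sx - 0.5), round(y / sy - 0.5)
        bx, by = (ib + 0.5) * sx, (jb + 0.5) * sy
        # The closer of the two candidates is the hexagon containing the point.
        if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
            counts[(ax, ay)] += 1
        else:
            counts[(bx, by)] += 1
    return counts

# Hypothetical points; each key in the result is a hexagon center.
pts = [(0.05, 0.02), (-0.1, 0.1), (0.5, 0.85), (0.55, 0.9), (0.45, 0.8), (3.0, 0.0)]
for center, n in sorted(hexbin_counts(pts, size=1.0).items()):
    print(center, n)
```

The count for each center is what the hexagon's color encodes in the final plot.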

2. 2D Histogram

This is the two-dimensional version of the classic histogram. The plot area is split into many small squares, and the number of points in each square is represented by its color.
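A minimal sketch of the counting step, assuming a fixed x and y range split into equal-width rectangular bins (squares when both axes use the same bin width). The function `hist2d` and the sample points are hypothetical; in practice a library routine such as NumPy's `histogram2d` does this. Each point's bin index is its offset from the lower edge, scaled by the number of bins.

```python
def hist2d(points, x_range, y_range, nx, ny):
    """Count points in an nx-by-ny grid of equal bins over x_range x y_range.

    Returns counts[row][col], with row indexing y and col indexing x.
    Points outside the ranges are ignored. A point exactly on the upper edge
    goes into the last bin, matching NumPy's convention that the last bin is
    closed on the right. Both ranges must have nonzero width.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the plotted area
        col = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        row = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[row][col] += 1
    return counts

# Hypothetical points on the unit square, binned into a 2 x 2 grid.
pts = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0), (0.6, 0.2)]
for row in reversed(hist2d(pts, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)):
    print(row)  # top row printed first, as it would appear in a plot
```

Changing `nx` and `ny` changes the picture, which is why the bin count deserves the same care as a KDE bandwidth.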

3. Contour Plot

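A graphical technique for representing a three-dimensional surface by plotting constant-z slices, called contours, in two dimensions. For a 2D density plot, the surface is the estimated density, so each contour connects points of equal density. A contour plot can show only the contour lines of the distribution, fill the areas between them, or draw the density as a raster (a grid of colored cells).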
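The "constant z slices" idea can be illustrated with a minimal sketch, assuming the density has already been evaluated on a grid (the small grid and levels below are hypothetical, as is the helper name `contour_bands`). Each cell is assigned the index of the band between two consecutive levels, which is what a filled contour plot colors. Tracing the actual contour lines between cells, for example with the marching squares algorithm, is left to a plotting library and not shown.

```python
import bisect

def contour_bands(grid, levels):
    """Assign each grid value the index of the band it falls in.

    `levels` must be sorted ascending. Band 0 holds values below levels[0],
    band k holds values in [levels[k-1], levels[k]), and band len(levels)
    holds values at or above the last level.
    """
    return [[bisect.bisect_right(levels, z) for z in row] for row in grid]

# Hypothetical density values on a 3 x 3 grid, peaking in the middle.
density = [
    [0.01, 0.05, 0.02],
    [0.04, 0.30, 0.12],
    [0.02, 0.10, 0.03],
]
levels = [0.03, 0.1, 0.2]
for row in contour_bands(density, levels):
    print(row)
```

The boundaries between neighboring bands are where the contour lines would be drawn.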

When Not to Use 2D Density Plots?

1

When you do not have enough data points to risk overplotting

Use a scatterplot if there is no overplotting. 2D density plots are effective only when data points overlap, because replacing the overlapping dots with a color gradient then conveys the data distribution more clearly. In other cases, a scatterplot is the more effective visualization.

2

When you cannot control the plot’s bandwidth

The bin size (for histograms) or bandwidth (for kernel density estimates) needs careful tuning: 2D density plots, density plots, and histograms are all very sensitive to this parameter, and different settings can lead to different conclusions. If you cannot adjust this parameter to suit the context, use other plots to represent the density distribution more accurately.
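The sensitivity to bandwidth is easy to see numerically. The plain-Python sketch below (hypothetical data and helper names) evaluates a 1D Gaussian KDE of two tight clusters at several bandwidths and counts the peaks on a grid: a very small bandwidth gives one peak per point, a moderate one shows the two clusters, and a large one merges everything into a single hump. It also computes Silverman's rule of thumb, h = 0.9 · min(σ, IQR/1.34) · n^(-1/5), a common starting point that assumes roughly normal data, so it is worth checking against a few other bandwidths rather than trusting it blindly.

```python
import math
import statistics

def kde(data, x, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

def count_modes(data, h, lo, hi, steps=400):
    """Count strict local maxima of the KDE sampled at steps + 1 points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [kde(data, x, h) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-0.2)

# Hypothetical data: two tight clusters of three points each.
data = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2]
for h in (0.02, 0.5, 3.0, silverman_bandwidth(data)):
    print(f"bandwidth {h:.3f}: {count_modes(data, h, -1.0, 4.2)} peak(s)")
```

The same data can therefore look like six groups, two groups, or one, depending only on the bandwidth.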
