2D Density Plot

A 2D density plot visualizes the relationship between two numeric variables, with one variable on the X-axis and the other on the Y-axis, like a scatterplot. Instead of drawing every point, it counts the number of observations within each area of the 2D space and uses a colour gradient to show how the density of the data varies across regions.

Quick details:

What: Discover Distribution

Why: Understand correlations in big data with density distributions

History of the 2D Density Plot

A density plot is a smoothed, continuous version of a histogram, estimated from the data using kernel density estimation (KDE). In a 2D kernel density plot, a continuous function called a kernel is centred on each individual data point, and these kernels are then summed to produce a single smooth density estimate. The most commonly used kernel is the Gaussian, which places a bell-shaped bump at each point. This method forms the basis of the 2D density plot in data visualization and helps reveal the underlying distribution more clearly than a traditional histogram.

2D Kernel density plots producing a smooth estimate of the density
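
The summation of kernels described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `gaussian_kde_2d`, the sample points, and the single shared `bandwidth` for both axes are assumptions made for the example.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Build a density function from (x, y) points using Gaussian kernels.

    Assumption: one isotropic bandwidth is shared by both axes.
    """
    n = len(points)
    # Normalising constant so the estimate integrates to 1 over the plane.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            # Squared distance from the evaluation location to this data point.
            d2 = (x - px) ** 2 + (y - py) ** 2
            # One bell-shaped bump per data point; the bumps are summed.
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

# Hypothetical sample: three points clustered near the origin, one far away.
points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (3.0, 3.0)]
density = gaussian_kde_2d(points, bandwidth=0.5)

print(density(0.0, 0.0))  # high: inside the cluster
print(density(3.0, 3.0))  # lower: a single isolated point
print(density(8.0, 8.0))  # close to zero: no data nearby
```

Evaluating `density` on a regular grid and mapping the values to a colour scale gives the smooth surface that a 2D density plot displays.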

When to Use a 2D Density Plot?

1. With large sample sizes to reveal clear distribution patterns

Use a 2D density plot when data points are numerous and scatterplots risk overplotting. The 2D density plot counts observations within areas of the 2D space and uses colour gradients to highlight hidden density variations.

An overplotted scatterplot compared with a 2D density plot that uses colour to show differences in density

2. For nuanced density visualization

2D histograms and hexbin plots analyse the relationship between two numeric variables by dividing the plot area into polygons (squares or hexagons) and counting the points in each one, which reduces overplotting. A 2D kernel density estimate can also be drawn as contours for a smoother reading of the density.

Different possibilities of representing the 2D density space

3. To compare multiple distributions

Kernel density plots are generally better than histograms at showing several distributions at once. Overlapping density curves remain easy to tell apart, whereas overlapping histograms quickly become cluttered. The continuous curves of a kernel density estimate therefore make comparisons between groups clearer and easier to interpret.

Density estimates of the butterfat percentage in the milk of four cattle breeds. Data Source: Canadian Record of Performance for Purebred Dairy Cattle. In kernel density estimation, we draw a continuous curve (the kernel) with a small width (controlled by a parameter called bandwidth) at the location of each data point, and then we add up all these curves to obtain the final density estimate.
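
The binning idea mentioned earlier in this section, counting observations within square areas of the 2D space, can be sketched with the standard library alone. The function name `histogram_2d`, the sample points, and the `bin_size` value are assumptions chosen for illustration.

```python
import math
from collections import Counter

def histogram_2d(points, bin_size):
    """Count how many (x, y) points fall into each square bin.

    Bins are identified by integer (column, row) indices; a bin with
    index (i, j) covers [i*bin_size, (i+1)*bin_size) on the X-axis and
    [j*bin_size, (j+1)*bin_size) on the Y-axis.
    """
    counts = Counter()
    for x, y in points:
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        counts[key] += 1
    return counts

# Hypothetical sample data.
points = [(0.1, 0.2), (0.4, 0.3), (0.9, 0.8), (1.2, 0.1), (-0.3, 0.5)]
counts = histogram_2d(points, bin_size=0.5)

for bin_index, n in sorted(counts.items()):
    print(bin_index, n)
```

Each count would then be mapped to a colour. Plotting libraries usually provide this directly, for example `hist2d` in matplotlib, so a hand-written version like this is mainly useful for understanding what the plot encodes.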

Types of 2D Density Plots

1. Hexbin

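A hexbin plot works like a 2D histogram, but the plot area is divided into many hexagons instead of squares. Because a hexagon is closer in shape to a circle than a square is, hexagonal bins tend to follow the shape of the data more closely and reduce the visual bias introduced by the orientation of the bins.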

2. 2D Histogram

A two-dimensional version of the classic histogram: the plot area is split into small squares, and each square’s colour represents the number of points it contains.

3. Contour Plot

A graphical method that represents a 3-dimensional surface on a 2D plane by plotting constant z-level slices called contours. In density visualization, each contour line connects locations of equal density, which makes the shape of the distribution easy to read. The same density estimate can also be drawn as a raster, with colour or shading encoding the density value.
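
To make the hexbin idea from this list more concrete, the sketch below assigns points to hexagonal bins and counts them. It assumes "pointy-top" hexagons addressed by axial coordinates `(q, r)`, with the coordinates rounded to the nearest hexagon centre; the function names `hex_bin` and `_hex_round`, the `size` parameter (centre-to-corner distance), and the sample points are all illustrative choices, and plotting libraries may lay out their hexagons differently.

```python
import math
from collections import Counter

def _hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z  # cube coordinates always satisfy x + y + z == 0
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so that
    # the three rounded values still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hex_bin(x, y, size):
    """Return the axial (q, r) index of the hexagon containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _hex_round(q, r)

# Hypothetical sample data, binned into hexagons of size 1.0.
points = [(0.1, 0.1), (-0.2, 0.3), (1.7, 0.05), (0.9, 1.4)]
counts = Counter(hex_bin(x, y, size=1.0) for x, y in points)

for hexagon, n in sorted(counts.items()):
    print(hexagon, n)
```

As with square bins, the count in each hexagon is what gets mapped to colour. In practice a library routine such as matplotlib's `hexbin` would normally be used instead of hand-written binning.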

When Not to Use 2D Density Plots?

1. When there aren’t enough data points to risk overplotting

If the dataset is small and a scatterplot does not suffer from overplotting, use a scatterplot instead. 2D density plots are most effective when data points overlap heavily, so that a colour gradient can reveal distribution patterns that individual points would hide.

2. When you cannot adjust the plot’s bandwidth or bin size

The appearance of a 2D density plot or 2D histogram is highly sensitive to the bandwidth or bin size: a value that is too small produces a noisy, spiky picture, while a value that is too large smooths away real structure. If this parameter cannot be controlled or tuned for the data at hand, density plots may lead to misleading conclusions. In such cases, alternative plots should be used to represent the distribution more accurately.
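
When the bandwidth can be adjusted, a rule of thumb is a common starting point. The sketch below uses Scott's rule, which is not mentioned in the text above and is only one of several such rules; it assumes roughly bell-shaped data and scales the standard deviation of each axis by n^(-1/(d+4)), with d = 2 dimensions. The function name `scott_bandwidths` and the sample data are illustrative.

```python
import statistics

def scott_bandwidths(points):
    """Suggest per-axis KDE bandwidths for (x, y) points via Scott's rule.

    Assumption: the data are roughly Gaussian; strongly skewed or
    multimodal data may need a smaller, manually tuned bandwidth.
    """
    n = len(points)
    d = 2  # number of dimensions
    factor = n ** (-1.0 / (d + 4))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return factor * statistics.stdev(xs), factor * statistics.stdev(ys)

# Hypothetical data: 64 points where the Y values spread twice as widely as X.
points = [(float(i), 2.0 * i) for i in range(64)]
hx, hy = scott_bandwidths(points)
print(hx, hy)
```

The suggested values are best treated as a first guess: plotting the density at, say, half and double the suggested bandwidth shows quickly whether the conclusions depend on the choice.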
