3.5.2 - Bubble Plots

A bubble plot can display three quantitative variables at once and, optionally, a categorical grouping variable. In the first example below, three variables are displayed: one on the \(x\)-axis, one on the \(y\)-axis, and one as the size of the bubbles. In Figures 2.75 and 2.76 in your textbook, four variables are displayed: quantitative variables are represented on the \(x\)-axis, on the \(y\)-axis, and as the size of the bubbles; a categorical variable is represented by the color of the bubbles.
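To make the size and color encodings concrete, here is a minimal Python sketch of how a third quantitative variable might be rescaled to bubble areas and how a categorical variable might be mapped to colors. The function name `bubble_areas`, the area limits, the color names, and the records are all illustrative assumptions and are not taken from the course materials.

```python
def bubble_areas(values, min_area=20.0, max_area=400.0):
    """Linearly rescale values to marker areas between min_area and max_area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every point gets the same mid-sized bubble.
        return [(min_area + max_area) / 2.0 for _ in values]
    span = hi - lo
    return [min_area + (v - lo) / span * (max_area - min_area) for v in values]


# Hypothetical records: (x value, y value, size variable, group label)
points = [
    (1.0, 2.0, 0, "A"),
    (2.0, 3.5, 3, "B"),
    (3.0, 3.0, 7, "A"),
]

# Hypothetical color assignment for the categorical variable
palette = {"A": "blue", "B": "orange"}

areas = bubble_areas([p[2] for p in points])
colors = [palette[p[3]] for p in points]

for (x, y, size_value, group), area, color in zip(points, areas, colors):
    print(f"x={x}, y={y}, size variable={size_value} -> area={area:.1f}, color={color}")
```

The resulting areas and colors could then be handed to a plotting library. In matplotlib, for instance, `scatter(x, y, s=areas, c=colors)` interprets `s` as the marker area in points squared.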

Example: Height, Weight, & Days Exercising

The plot below was made using the statistical software R. Data were collected from World Campus students, who were asked for their heights, their weights, and how many days per week they exercised. Researchers believed that there would be a linear relationship between height and weight overall but that the number of days exercised would also be a factor. In this plot, height (in inches) is on the \(x\)-axis, weight (in pounds) is on the \(y\)-axis, and the size of each bubble is determined by the number of days per week that the individual exercised.

[Figure: bubble plot with Height (in inches) on the x-axis and Weight (in pounds) on the y-axis]

Larger bubbles signify more days per week exercising. From this plot, we can see that there is a positive linear relationship between height and weight. We can also see that many of the larger bubbles (i.e., people who exercise more) fall below the line of best fit, while more of the smaller bubbles fall above it. In other words, people who exercise more days per week tend to have negative residuals: for their height, they weigh less than a model that uses only height to predict weight would predict.
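The idea of a residual can be illustrated with a short Python sketch. The heights, weights, and exercise days below are made-up values chosen to mimic the pattern described above; they are not the World Campus data, and the cutoff of four days per week is an arbitrary assumption used only to split the points into two groups.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Residual = observed y minus the y predicted by the line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical data (NOT the data behind the plot)
heights = [62, 64, 66, 68, 70, 72]          # inches
weights = [125, 150, 140, 175, 160, 195]    # pounds
days = [5, 1, 6, 0, 5, 1]                   # days per week exercising

slope, intercept = fit_line(heights, weights)
res = residuals(heights, weights, slope, intercept)

# Arbitrary cutoff: 4 or more days per week counts as "exercises more"
frequent = [r for r, d in zip(res, days) if d >= 4]
infrequent = [r for r, d in zip(res, days) if d < 4]

print("Mean residual, 4+ days per week:", round(mean(frequent), 2))
print("Mean residual, fewer than 4 days:", round(mean(infrequent), 2))
```

With these made-up values, the mean residual is negative for the group that exercises more and positive for the group that exercises less, which is the numerical counterpart of large bubbles falling below the line and small bubbles falling above it.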

 

Example: Air Quality in New York

[Figure: "Air Quality in New York by Day." Day of the month is on the x-axis and Ozone (ppb) is on the y-axis; bubble size shows Wind Speed (mph); bubble color shows the month (July, August, September).]

Source: http://t-redactyl.io/blog/2016/02/creating-plots-in-r-using-ggplot2-part-6-weighted-scatterplots.html

The bubble plot above displays three quantitative variables plus a categorical variable. The \(x\)-axis represents the day of the month and the \(y\)-axis represents the ozone concentration in parts per billion (ppb), a measure of air quality. The size of each bubble represents the wind speed (in mph) on that day, and the color of each bubble represents the month.

The pink bubbles (September), particularly from the sixth of the month onward, tend to be lower than the blue and green bubbles. This suggests that ozone levels tended to be lower in September than in July and August. The larger bubbles also tend to be lower, suggesting lower ozone levels on windier days.
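Visual impressions like these can be checked numerically. The Python sketch below computes the mean ozone level for each month and the correlation between wind speed and ozone. The records are hypothetical values invented for illustration, not the data shown in the plot, and the helper names `mean_by_group` and `pearson_r` are assumptions of this sketch.

```python
import math
from collections import defaultdict


def mean_by_group(records, group_key, value_key):
    """Return {group: mean of value_key} for a list of dict records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily records (NOT the data behind the plot)
daily = [
    {"month": "July", "ozone": 80, "wind": 6},
    {"month": "July", "ozone": 60, "wind": 9},
    {"month": "August", "ozone": 90, "wind": 5},
    {"month": "August", "ozone": 50, "wind": 11},
    {"month": "September", "ozone": 30, "wind": 12},
    {"month": "September", "ozone": 40, "wind": 10},
]

monthly_ozone = mean_by_group(daily, "month", "ozone")
r = pearson_r([d["wind"] for d in daily], [d["ozone"] for d in daily])

print("Mean ozone (ppb) by month:", monthly_ozone)
print("Correlation between wind speed and ozone:", round(r, 2))
```

A lower mean for one month corresponds to that month's bubbles sitting lower on the plot, and a negative correlation corresponds to larger bubbles (windier days) sitting lower.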