4.3 - Statistical Pictures

In this section, we examine a few important types of statistical pictures: bar graphs, time series plots, and scatterplots.

Before turning to these specific types of statistical pictures, it is important to note that regardless of the type of picture being used, there are some basic features that a good graph will possess:

  • The data should stand out clearly from the background
  • The picture should be clearly labeled, showing
    • the title and the purpose or origin of the data,
    • what is being plotted on each axis, bar, or segment of the plot (i.e. the variables being presented),
    • the scale, including starting points and units of measurement
  • The picture should have as little extraneous material as possible (i.e. a high "information content")

Section 9.5 of the text (pages 190 to 194) illustrates the difficulties that arise with poor statistical graphics, which ignore these basic guidelines and so create ambiguity and confusion in their interpretation.

Example 4.6: Life Satisfaction

The Gallup World Poll takes random samples of the adults in 132 different countries. In many of those countries, Gallup asks respondents to think about an overall evaluation of their lives and to specify how satisfied they are on a four-point scale (very dissatisfied, dissatisfied, satisfied, or very satisfied). Figure 4.6 shows a comparative bar graph giving the results from two surveys, one taking a random sample of adults in the United States and one using a random sample of adults in China. Bar graphs are often used to display data for categorical variables and, as in Figure 4.6, can be used to compare a categorical variable across two or more groups or circumstances.

satisfaction with lives China and US

Figure 4.6 Life Satisfaction Results from Two Countries (Gallup World Poll)

Example 4.7: Consumer Spending

Each day the Gallup Poll takes a random sample of about 500 American adults nationally and asks them about a variety of issues, including how much money they spent the day before, not counting the purchase of a home or car or normal household bills such as electrical or phone service. Figure 4.7 shows a line graph of the data presented in two ways: as a 3-day rolling average (the dark green line) and as a 14-day rolling average (the light green line). For example, each point on the dark green line represents the average amount spent as reported by the roughly 1500 American adults who responded to the survey over the 3-day period leading up to that day. This type of line graph is called a time series plot because the points represent the variable being measured across time.
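
To make the idea of a rolling average concrete, here is a minimal Python sketch. The function name rolling_mean and the daily figures are made up for illustration; they are not Gallup's data or Gallup's method. The sketch also assumes that every day has the same number of respondents, so that averaging the daily averages matches the average over all respondents in the window.

```python
def rolling_mean(values, window):
    """Return trailing means: each entry averages the latest `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running_total -= values[i - window]
        if i >= window - 1:
            means.append(running_total / window)
    return means


# Hypothetical daily average spending in dollars (made-up numbers).
daily_spending = [80, 95, 110, 70, 85, 90, 120]

three_day = rolling_mean(daily_spending, 3)  # one point per day from day 3 on
seven_day = rolling_mean(daily_spending, 7)  # a single point for this short series

print(three_day)
print(seven_day)
```

A longer window averages over more days, so the resulting line is smoother but slower to react to a sudden change in spending.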

Gallup Poll results on consumer spending

Figure 4.7 Time Series Plot of Consumer Spending from February 2008 to February 2015 (Gallup Poll)

When looking at a time series plot like Figure 4.7, it is important to examine and interpret a few key features:

  • Is there a long-term trend? For example, in the consumer spending data, we can see a generally upward long-term trend since the end of the recession in June 2009. One note of caution when looking at economic data that extend over decades: check whether they have been adjusted for inflation. An apparent upward trend may reflect nothing more than a change in the value of the dollar.
  • Are there seasonal components? While temperature data is dramatically affected by regular seasonal cycles, many other variables change in predictable patterns because of people's behavioral changes in certain months or seasons. For example, have a close look at Figure 4.7. You should be able to see a bump in consumer spending each year associated with the holidays in December. There are other cyclic effects in this data. If you look really closely at the 3-day averages, you can see that there is increased spending on weekends compared with weekdays.
  • What is the nature of the random fluctuations? We know that every measurement is subject to natural variability and that averages will be more reliable if they are based on larger sample sizes. Have a look at Figure 4.7 and see how the 3-day averages based on surveys of about 1500 people show random fluctuations much larger than the 14-day averages based on surveys of about 7000 people.
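
The effect of sample size on the random fluctuations can be roughly quantified. The sketch below assumes simple random samples with independent responses, in which case the standard error of an average is the standard deviation divided by the square root of the sample size. The standard deviation of $100 is a hypothetical value chosen only for illustration; the ratio of the two standard errors does not depend on it.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean under simple random sampling."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


assumed_sd = 100.0  # hypothetical spread of individual daily spending, in dollars

se_3day = standard_error(assumed_sd, 1500)   # about 1500 respondents over 3 days
se_14day = standard_error(assumed_sd, 7000)  # about 7000 respondents over 14 days

ratio = se_3day / se_14day  # equals sqrt(7000 / 1500), roughly 2.16
print(round(ratio, 2))
```

Under these assumptions, the 3-day averages would be expected to bounce around a little more than twice as much as the 14-day averages.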

Example 4.8: Blood Alcohol Content (BAC)

An experiment was carried out to see how Blood Alcohol Content (BAC), as measured by a breathalyzer, changes with the number of 12-ounce beers a person drinks (the experiment is discussed in the Electronic Encyclopedia of Statistics Examples and Exercises). In the experiment, 16 subjects each drew a number out of a hat, and that number determined how many beers the subject drank. For example, if the number was a 3, then that subject drank 3 beers. A half-hour after the subject finished the last assigned beer, a police officer used a breathalyzer, like the ones used in the field, to measure the subject's BAC level. Figure 4.8 shows a scatterplot of the results. Each point represents a different subject. For example, one subject drank 6 beers and had a BAC of 0.10, which is over the legal limit for driving.

BAC versus number of beers scatterplot

Figure 4.8 Number of 12-ounce Beers Consumed versus BAC for 16 Subjects

Scatterplots are used for displaying the relationship between two measurement variables. Examining Figure 4.8, we can see a clear trend: there is an obvious positive association between the number of beers and the BAC, meaning that increases in one variable are associated with increases in the other. Because the data here came from a randomized experiment, the association can be interpreted as causal, and the causal nature of this particular relationship is quite well established. Positive associations are reflected in a cloud of points in the scatterplot that goes "uphill" as you move from left to right. A negative association, like the one we would see if we plotted the weights of cars versus their gas mileage, shows a cloud of points going "downhill".
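
The direction of an association can also be checked numerically. One common summary, which this section does not otherwise cover, is the Pearson correlation coefficient: it is positive for an "uphill" cloud of points and negative for a "downhill" one. The Python sketch below uses made-up numbers for both examples; they are not the data from the BAC experiment or from any real set of cars.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (not the experiment's data).
beers = [1, 2, 3, 4, 5, 6]
bac = [0.01, 0.03, 0.04, 0.07, 0.08, 0.10]

# Hypothetical car weights (thousands of pounds) and gas mileage (mpg).
weights = [2.2, 2.8, 3.1, 3.6, 4.2]
mpg = [38, 32, 29, 24, 19]

r_uphill = pearson_r(beers, bac)      # positive: points go "uphill"
r_downhill = pearson_r(weights, mpg)  # negative: points go "downhill"

print(r_uphill > 0, r_downhill < 0)
```

The sign of the coefficient gives the direction of the association, but a scatterplot is still needed to judge whether a straight-line pattern is a reasonable description of the data.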

