Lesson 3: Getting the Big Picture and Summaries
The goal of this lesson is to pull together what we have learned about appropriate ways to gather data and how the design of a study affects the way we interpret its results. Then, when the study is completed and we have the resulting data in hand, we must learn to summarize the salient features of those data through graphs and numerical summaries. The methods we will focus on are appropriate for all measurement variables regardless of whether the variable is discrete or continuous.
- Effectively critique reports about scientific studies and recognize the statistical issues involved.
- Interpret any of the four graphs used with measurement data.
- Interpret measures of center and measures of spread.
- Determine when sensitive statistics or resistant statistics should be used to describe a data set.
- Interpret a five-number summary.
3.1 - Reviewing Studies - Getting the Big Picture
In January 2015 a report by the Pew Research Center, commissioned by the American Association for the Advancement of Science (AAAS), compared the views of the American public and scientists belonging to the AAAS. Here is a link to a summary of the report "Public and Scientists’ Views on Science and Society". (Note that the complete report is available as a link in the upper right-hand corner of that web page, and another link provides the exact questions asked.)
In January 2015 a report on ABC news discussed a study regarding the relationship between the ingredients in certain medicines and the onset of dementia in the elderly. Here is a link to the ABC report "Common Over-the-Counter Medicines Linked to Dementia in New Study".
In January 2015 a report on ScienceDaily discussed a study regarding whether memory might be associated with whether you have your eyes open or closed when you try to recall recent events. Here is a link to the report "Closing your eyes boosts memory recall, new study finds".
Hundreds of research studies are published each week, and many of them become the basis of reports in the popular press or on websites devoted to providing science news to the public. In Lessons 1 and 2 we saw how important it is to critically evaluate the process used to gather the data of a study as we decide on the veracity of the claims made.
In this lesson, we review this material and provide step-by-step guidelines for evaluating research studies like Examples 3.1, 3.2, and 3.3.
Evaluating Research Studies
Step One: Determine the type of study conducted (e.g. sample survey, observational study, randomized experiment). Example 3.1 is based on data from a sample survey; Example 3.2 is based on data from a prospective observational study; and Example 3.3 is based on data from a randomized experiment.
Step Two: Determine the critical components of the research (see pages 20-23 in the text). Here you must seek answers to questions like: Who funded the research? Who were the subjects and how did they come to take part? What exact measurements were made or questions asked? What was the research setting? What differences between comparison groups might exist besides the factor of interest? How big an effect was seen, and did the researchers claim it was statistically significant (unlikely to happen by chance)?
Step two lies at the heart of the material in Lessons 2 and 3. Have a look at the three examples above and judge for yourself whether understanding the critical components suggests potential sources of bias.
Step Three: Check the "Difficulties and Disasters" that we saw in earlier lessons.
1) For sample surveys
- check if a probability method was used in generating the sample
- check if the sampling frame was close to the population of interest
- check if there were difficulties in reaching certain parts of the sampling frame that could be related to what's being studied
- check the response rate and think about how non-respondents might differ from respondents
In Example 3.1, there were two surveys - one of the public and one of the scientists. The "public" sample involved 2,002 people chosen randomly, using a stratified sample, from a sampling frame that included American adults with access to a landline or cell phone. Interviews were conducted in either English or Spanish, depending on the preference of the respondent. Because different demographic groups have very different response rates, the Pew Center weights the responses to bring them into alignment with census data on the true proportions of the population for factors like age, gender, race, and education level.

The exact response rates are not provided for this survey, but they are given in great detail by Pew for many studies. Typically they find that there is no answer for about 1/3 of the working phone numbers they dial, and that about 60% of the people they contact refuse to participate while another 10% are ineligible (e.g. a child's phone). That leaves a response rate of about 20% of the working phone numbers they call. This may seem low, but it is actually on the high end for the polling industry - that is why survey organizations spend so much effort on methods for weighting the results to adjust for ways that respondents might differ from non-respondents.

The poll of scientists involved 3,748 scientists chosen randomly from the membership list of AAAS, which includes about 150,000 scientists. About 19,000 members were sent e-mails to introduce the study, so the response rate was around 20%. Pew again weighted the results to align the sampled scientists with the sampling frame according to membership status (graduate student member, active faculty, or retired faculty; fellow of the society versus non-fellow). Two areas might be of concern here. First, we notice that people in the general public survey were contacted by phone while the scientists were initially contacted by e-mail.
Second, the membership of AAAS (the sampling frame) may be different from the population of scientists in general (e.g. different fields are more heavily represented in AAAS than others).
2) For comparative observational studies
- check for possible confounding variables and whether claims of causation were made (ask: Is there an important factor related to what brought the subjects into the groups being compared?)
- check whether claims about generalizing the results to a larger population are appropriate
- check whether a retrospective or prospective design was used (if retrospective - is there a reliance on the accuracy of a subject's memory?)
In Example 3.2 the data were gathered prospectively, using data on prescription drug reimbursements from a large HMO in the Seattle, Washington area (note that the study was about prescription drugs even though the headlines mention over-the-counter drugs!). The prospective design and the avoidance of patient self-reports about drug use were strengths of this study (the accuracy of self-report data is always a concern). Generalizing the results to the public at large might be a concern if the elderly membership of this HMO differed significantly from the general elderly population in ways that might affect dementia. For example, having health insurance in the first place indicates an economic and educational status that might be associated with the onset of dementia.
3) For randomized experimental studies
- check for possible confounding variables when small samples are used (with large samples the randomization will help with this)
- check for interacting variables (ask: Do the results stay the same for different sub-populations?)
- check for placebo, Hawthorne, and experimenter effects (ask: Was the experiment double-blind? Would subjects behave differently because they are being studied?)
In Example 3.3 the subjects could not be blinded as to whether they had their eyes closed, but the researchers evaluating how well they did in remembering the events they had seen were blinded. The task in this experiment involved watching a video showing an electrician stealing items from a job site. An important concern here might be the artificial nature of the setting in which the experiment was done (watching a video of a robbery compared to seeing a robbery in person, for example). Would the results carry over to recalling important events in real life?
Step Four: If the information needed to critique is not provided in the report, try to find the original source and see if the missing information is provided there.
Each of the three examples here provides a direct link to the full original report, although for Example 3.3 the original scientific publication must be purchased. Example 3.2 is a good example of how there can often be a good deal of difference between the information provided in the original scientific report and the information in a report in the popular press, especially in the headline used to promote the article.
Step Five: Do the results make sense? Are they based on a sound scientific footing?
For example, the link between long-term heavy cell phone use and brain cancer is very controversial. The association has been seen in a number of retrospective case-control studies but no laboratory experiments have been able to provide a biological mechanism for how a causal link might happen. Perhaps the retrospective studies are flawed because the memories of patients with brain tumors about their cell phone use years earlier differ in accuracy from the memories of the control groups studied.
Step Six: Ask whether there is an alternative explanation for the results.
In Example 3.1 some of the issues on which public opinion is seen to be quite different from the opinion of scientists involve public policy controversies. Since opinions on such topics are strongly associated with political beliefs, it is possible that political affiliation is a confounding variable that explains some part of the differences seen.
Step Seven: Decide if the results are strong enough to encourage you to change your behavior or beliefs.
Does Example 3.1 suggest to you that there is a greater need for better science education in the United States? Would Example 3.2 lead you to examine the ingredients of prescription drugs you use and seek alternatives if you spotted the ingredient in the study? Would Example 3.3 encourage you to ask someone to close their eyes in order for them to recall a memory you are asking them about?
3.2 - Graphs: Displaying Measurement Data
Example 3.4. Graphs
Consider the following sample.
Sample: The ages of forty selected PSU Tenured Faculty (n = 40 ages)
The dotplot is the first graph that we will use to display this sample of 40 ages. A dotplot represents each observation as a dot. On this dotplot, you will find that the ages range from 32 to 63 years. You should also notice that there are more tenured faculty at older ages.
Figure 3.2. Dotplot (Ages of 40 PSU Tenured Faculty)
Stemplots (Stem and Leaf Plot)
The second graph that is possible with measurement data is a stemplot. Stemplots concisely display the data in order from smallest to largest. Below is the list of the 40 ages in order from youngest to oldest.
These forty observations are displayed in the stemplot found in Figure 3.3. In this stemplot, we again find that the ages span from 32 years to 63 years. We also find that there are more tenured PSU faculty at older ages. Stemplots can provide useful information about small data sets.
Figure 3.3. Stemplot (Ages of PSU Tenured Faculty)
The third graph is called a histogram. Of all the graphs presented so far, the histogram may be the most valuable. A histogram is essentially a bar graph for measurement data. The difference between a histogram and a bar graph is that with a histogram the categories are ranges of numbers rather than words. Usually, each numerical category must have the same width. The heights of the bars reflect either the frequency or the relative frequency (percent) of encountering that range of numbers in the data.
Recall that the ages span from 32 to 63 years. The range of these ages is (63 - 32) = 31 years. Looking at the stemplot, one finds that there are more tenured faculty at the older ages. We want to be able to show this trend, so there need to be enough categories to properly display it. If we chose only 4 categories (since we have tenured faculty in their 30s, 40s, 50s, and 60s) we would not be able to detect this trend as well. If we choose 9 categories the trend becomes more obvious. Statistical software will usually make this determination for you.

Width of each category = Range / number of categories = 31/9 ≈ 3.4 (rounded up to 4 years)
The starting point is always below our lowest observed value and the ending point is always above our highest observed value. For readability, it is also best to have the intervals start and end on whole numbers (or easy-to-digest numbers if the data are on a different scale). In this instance, our starting point of 30 years is below the lowest observed age, which is 32 years. The ending point of 66 years is above our highest observed age, which is 63 years. To make sure that an observation falls into only one category, we construct our 4-year categories as shown in Table 3.1 below. Our first category includes ages starting at 30 years and ending just below 34 years. An observed age of 34 is not included in the first category but rather in the second category. Using this method, no observed age can fall into more than one category. The observed ages are then placed into the appropriate category and the histogram is constructed.
Table 3.1. Summary of Ages for 40 PSU Tenured Faculty
|Category||Ages that Fall into the Category||Fraction (Percent) of Observed Ages in that Category|
|1. 30 ≤ Age < 34||Ages 30 to 33||1/40 = .025 (2.5%)|
|2. 34 ≤ Age < 38||Ages 34 to 37||1/40 = .025 (2.5%)|
|3. 38 ≤ Age < 42||Ages 38 to 41||2/40 = .05 (5.0%)|
|4. 42 ≤ Age < 46||Ages 42 to 45||3/40 = .075 (7.5%)|
|5. 46 ≤ Age < 50||Ages 46 to 49||5/40 = .125 (12.5%)|
|6. 50 ≤ Age < 54||Ages 50 to 53||7/40 = .175 (17.5%)|
|7. 54 ≤ Age < 58||Ages 54 to 57||8/40 = .20 (20.0%)|
|8. 58 ≤ Age < 62||Ages 58 to 61||10/40 = .25 (25.0%)|
|9. 62 ≤ Age < 66||Ages 62 to 65||3/40 = .075 (7.5%)|
|Total||n = 40||40/40 = 1.0 (100%)|
The information from Table 3.1 is used to make the histogram found in Figure 3.4. The horizontal axis displays the categories while the vertical axis displays the percent of the observations (ages) found in each category. (Note: You will not be asked to make histograms. You will only be asked to interpret them. However, it is important to see how one is made so that you understand the interpretation.)
Figure 3.4. Histogram (Ages of PSU Tenured Faculty)
As you look at the histogram, you should notice that there are more ages in the upper half of the graph. In statistics, when data on a histogram is off-center, the data is labeled as skewed. In this case, the data is skewed to the left because a larger percent of the ages are found in the upper tail.
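The binning procedure behind Table 3.1 can be sketched in a few lines of Python. The ages below are a small hypothetical sample, not the actual PSU data set; the bin edges and the half-open-interval rule follow the construction described above.

```python
# Sketch of the histogram binning described above.
# 'ages' is a hypothetical illustrative sample, NOT the actual 40 PSU ages.
ages = [32, 37, 39, 41, 44, 50, 55, 58, 60, 63]

low, high = 30, 66   # start below the minimum (32), end above the maximum (63)
width = 4            # (63 - 32) / 9 ≈ 3.4, rounded up to 4 years

edges = list(range(low, high + 1, width))   # [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]
counts = [0] * (len(edges) - 1)
for age in ages:
    # Half-open intervals [edge, edge + width): an age of exactly 34 falls
    # into the second category, never the first, so no age is double-counted.
    counts[(age - low) // width] += 1

print(edges)
print(counts)
```

Statistical software follows the same idea, though it usually picks the number of categories for you.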
|Histogram Interpretation||Explanation|
|Skewed to the right||A larger percent of the data is found in the lower tail of the histogram; data values are farther apart on the right. Common examples: income data; variables that are ratios where the denominator could be small|
|Skewed to the left||A larger percent of the data is found in the upper tail of the histogram; data values are farther apart on the left. Common example: results of an easy test|
|Symmetric||An equal percent of the data is found on each side and tail of the histogram. Common example: differences between similar quantities, or sums of similar quantities|
|Bimodal||Two clear peaks (modes) are seen in the histogram. Common example: the data really come from two different populations|
3.3 - Numbers: Summarizing Measurement Data
We will discuss two important ways to summarize measurement data. These include:
- measures of center (where the data are located along the number line)
- measures of spread (how much variation there is about the center)
To represent the center of a list of measurement data we focus on:
- The mean (the numerical average)
- The median (the 50th percentile)
Example 3.5: Measures of Center
Consider the following sample for 5 Selected PSU Students (n = 5)
Suppose you want to find a number to represent the center of the data. The first choice would be the mean. The mean is also known as the average. The mean is found by obtaining a sum of all the observations and dividing by the sample size (n). In this instance:
mean = (1 + 5 + 1 + 4 + 2) / 5 = 13 / 5 = 2.6 movie rentals/month
Another possibility is the median. The median is the middle value of a sample when the observations are sorted from smallest to largest.
In this example, the middle observation is 2 so the median = 2.0 movie rentals/month.
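Both calculations can be checked with Python's standard `statistics` module, using the same five rental counts:

```python
from statistics import mean, median

# Movie rentals per month for the 5 selected PSU students above
rentals = [1, 5, 1, 4, 2]

print(mean(rentals))    # sum of observations / n = 13 / 5 = 2.6
print(median(rentals))  # middle value of the sorted list [1, 1, 2, 4, 5] = 2
```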
As you examine how the mean and median were calculated, you will hopefully notice that the two methods are very different. The mean is an example of a sensitive measure because all observations are used in the calculation, so it is sensitive to values that lie far above or below most of the other values. In contrast, the median is an example of a resistant measure because only the middle observation is used to determine its value.
Example 3.6: Which Measure of Center to Use
Consider the following sample of Annual Salaries for 20 Selected Employees at a Local Company
The mean for this sample is \$45,000 while the median is \$40,000. (Note: because the sample size is an even number, the median is the average of the middle two numbers, which in this case are \$38,000 and \$42,000.) Even though we can always determine both the mean and the median, when there is a large difference between the two measures of center one must decide which measure is more appropriate to use. In this instance, there is a difference of \$5,000 between the two measures. To help you understand what is happening, look at the histogram found below.
Figure 3.5. Histogram (Salaries)
As you can see, the histogram is right-skewed because a larger percentage of the salaries are located in the lower tail. The very large salary of \$110,000 is largely responsible for the histogram being right-skewed. With right-skewed histograms, the mean will be greater than the median, because the mean is sensitive to the large salary of \$110,000 and is pulled in the direction of that unusually large observation. In contrast, the median, which is the middle value of the data set, is resistant to extreme observations because they are not used to determine its value. Table 3.3 summarizes the link between the two measures of center and histogram shape.
Table 3.3. Link between Measures of Center and Histogram Shape
|Histogram Shape||Compare Two Measures Of Centers|
|If symmetric||mean and median are approximately equal|
|If right skewed||mean is greater than the median|
|If left skewed||mean is less than median|
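The pattern in Table 3.3 is easy to verify with a quick sketch. The two samples below are hypothetical numbers chosen only to illustrate the shapes, not data from the lesson:

```python
from statistics import mean, median

# Hypothetical right-skewed sample: one large value pulls the mean upward
right_skewed = [1, 2, 2, 3, 20]
print(mean(right_skewed), median(right_skewed))  # mean 5.6 is well above median 2

# Hypothetical roughly symmetric sample: mean and median agree
symmetric = [1, 2, 3, 4, 5]
print(mean(symmetric), median(symmetric))  # both are 3
```

A left-skewed sample would show the reverse: a few unusually small values would pull the mean below the median.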
Figure 3.6. Different Distributions
So, which measure of center is more appropriate to use? When you have skewed data, the mean is somewhat misleading as a representative value, because it can be pulled in one direction or the other by outliers. Generally, when the data is skewed, the median is more appropriate to use as the measure of a typical value. We generally use the mean as the measure of center when the data is fairly symmetric. In deciding which measure to use, we must also confront the issue of validity - that is, what is most relevant for the problem at hand. For example, if we are interested in the total income for a country, we would look to per-capita income (the mean). But if we are interested in the income of a typical citizen, we would look to the median income.
It is often important to be given both measures of center. For example, the difference between the mean and median is important since the direction and magnitude of that difference helps a person envision the likely shape of the histogram as indicated in Table 3.3. and in the plots shown in Figure 3.6.
As stated above, the question being asked can also affect which measure of the center can be considered more typical and therefore, more appropriate. Although we would normally use the median with skewed data, there may be cases where we might use the mean as a more typical measure of center. It all depends on the question being asked and on the shape of the data. For example, given the right-skewed data for the company in Example 3.6:
- If you want to know how much employees at the company in Example 3.6 pay in social security taxes the mean might represent the typical salary figure better than the median since it accounts for the total pay on which the taxes are based. In this case, the median salary figure may not be as appropriate as the mean salary figure.
- However, if you are applying for an entry-level position within the company in Example 3.6, and want to know what a typical employee makes, the median salary figure would represent the typical salary figure better than the mean and be more appropriate to use.
3.4 - Five Useful Numbers (Percentiles)
A percentile gives the position of an observation relative to the other observations in the data set. Specifically, a percentile represents the percentage of the sample that falls below that observation. For example, the median is also known as the 50th percentile because half of the data, or 50% of the observations, lie below the median. Table 3.4 displays three percentiles that will be of interest to us. Figure 3.7 shows these percentiles (quartiles) graphically.
Table 3.4. Percentiles of Interest
|Lower Quartile (25th percentile)||25% of the data falls below this percentile|
|Median (50th percentile)||50% of the data falls below this percentile|
|Upper Quartile (75th percentile)||75% of the data falls below this percentile|
Figure 3.7. Quartiles for a Distribution
A five-number summary is a useful summary of a data set that is partially based on selected percentiles. Below are the five numbers that are found in a five-number summary.
Figure 3.8. Five-Number Summary
Example 3.7. Five-Number Summary
Recall the sample that was used in the previous example.
Sample: The Annual Salaries for 20 Selected Employees at a Local Company
Table 3.5. Five-Number Summary of Salaries
|Lowest||Lower Quartile (QL)||Median||Upper Quartile (QU)||Highest|
Below are possible questions that can be answered with this five number summary.
What percent of the salaries lie below \$49,000?
Answer: 75%. Reason: \$49,000 represents the 75th percentile, or upper quartile.
What percent of the salaries lie above \$40,000?
Answer: 50%. Reason: \$40,000 represents the 50th percentile, so 50% of the observations lie below this percentile and 50% lie above it.
What percent of the salaries lie between \$33,500 and \$49,000?
Answer: 50%. Reason: this asks for the percent of observations that lie between the 25th percentile and the 75th percentile (75% - 25% = 50%).
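A five-number summary is easy to compute with Python's `statistics` module. The salaries below are hypothetical (the full 20-salary data set from Example 3.7 is not listed here), and `method="inclusive"` is just one of several common quartile conventions, so software packages may give slightly different quartiles for the same data:

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical salaries in thousands of dollars (illustrative only)
salaries = [28, 30, 33, 35, 38, 42, 45, 49, 60, 110]
print(five_number_summary(salaries))
```

Half the data always lies between the lower and upper quartiles, whichever convention is used.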
The five-number summary is also of value because it is the basis of the boxplot. Figure 3.9 below is a vertical boxplot of the variable salaries. The first thing to consider in this graph is the box: the ends of the box locate the lower quartile and upper quartile, which in this case are \$33,500 and \$49,000 respectively, and the line in the middle of the box is the median.

As you examine the box, you should notice that the median is closer to the lower quartile than to the upper quartile. This suggests that the data set is skewed, specifically skewed to the right. In this instance, the largest observation is represented with an asterisk. Since this observation is an unusually large salary of \$110,000, the graph identifies it as an outlier or unusual observation. (An appropriate statistical criterion is used to determine whether or not an observation is an outlier.) Lines called 'whiskers' extend from the box out to the lowest and highest observations that are not outliers. Notice that the whisker on the bottom is much shorter than the whisker on the top of this boxplot. This is another hallmark of a distribution that is skewed to the right, because the first 25% of the data covers a narrow length of the number line while the last 25% is more spread out.
Figure 3.9. Vertical Boxplot of Salaries
One of the most important uses of the boxplot is to compare two or more samples of one measurement variable.
Example 3.8. Using Boxplots for Comparisons
Recall Example 1.7 from Lesson 1. Consider two different wordings for a particular question:
Wording 1: Knowing that the population of the U.S. is 270 million, what is the population of Canada?
Wording 2: Knowing that the population of Australia is 15 million, what is the population of Canada?
The results from these questions are displayed on side-by-side boxplots found in Figure 3.10.
Figure 3.10. Boxplots of Canada's Population by Wording
Four comparisons can be made with side-by-side boxplots. One can then compare the
- centers: medians
- amount of spread (variation): lengths of the box
- shape: the position of the median in the box relative to the quartiles shows whether the data are skewed left, skewed right, or symmetric
- number of outliers
With this example, the median for those who had Wording 1 is larger than the median found with Wording 2. One also finds that the length of the box for Wording 1 is larger than that found with Wording 2, suggesting that there is more spread or variation in the responses for Wording 1. The median is also not positioned in the same place within each box, which indicates that the two samples do not have the same shape. Finally, there are two outliers with Wording 2 while there are none with Wording 1. Overall, these findings suggest that the wording of the question does affect the responses that are obtained.
While boxplots do not show the whole distribution the way a histogram does, they are particularly useful for comparing groups since they are thin graphs that can easily be laid side-by-side. However, they have limits. They cannot show whether a distribution is bimodal or whether there are spikes in the histogram at selected values. For example, if you ask a group of adults their heights, you might see a bimodal distribution arising from the heights of the women in a group with a lower peak and the heights of the men in an overlapping group with a higher peak. Similarly, the tendency for people to round off, which creates spikes in a histogram, would not show up in a boxplot of the same data (for example, many men say they are six feet tall when they are really 5'11").
3.5 - Measures of Spread or Variation
Two ways to represent the spread or variation are:
- Interquartile Range (IQR)
- Standard Deviation (SD)
Example 3.9. Measures of Spread or Variation
Recall the five-number summary from Example 3.7.
Table 3.6. Five-Number Summary of Salaries
|Lowest||Lower Quartile (QL)||Median||Upper Quartile (QU)||Highest|
With the five-number summary one can easily determine the Interquartile Range (IQR). The IQR = QU - QL. In our example,

IQR = QU - QL = \$49,000 - \$33,500 = \$15,500

What does this IQR represent? With this example, one can say that the middle 50% of the salaries spans \$15,500 (or spans from \$33,500 to \$49,000). The IQR is the length of the box on a boxplot. Notice that only a few numbers are needed to determine the IQR, and those numbers are not the extreme observations that may be outliers. The IQR is a type of resistant measure.
The second measure of spread or variation is called the standard deviation (SD). The standard deviation is roughly the typical distance that the observations in the sample fall from the mean (as a rule of thumb about 2/3 of the data fall within one standard deviation of the mean). The standard deviation is calculated using every observation in the data set. Consequently, it is called a sensitive measure because it will be influenced by outliers. The standard deviation for the variable "salaries" is \$17,936 (Note: you will not be asked to calculate an SD - that is done using calculators or computer software). What does the standard deviation represent? With this example, one can say that the typical distance of any individual salary from the mean salary of \$45,000 is about \$17,936. Figure 3.11 shows how far each individual salary is from the mean.
Figure 3.11. Dotplot of Salaries
What you notice in Figure 3.11 is that many of the observations are reasonably close to the sample mean. But since there is an outlier of \$110,000 in this sample, the standard deviation is inflated such that the average distance is about \$17,936. In this instance, the IQR is the preferred measure of spread because the sample has an outlier.
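The contrast between the sensitive SD and the resistant IQR can be sketched with hypothetical salaries (not the Example 3.9 data). Adding one \$110,000 outlier inflates the SD dramatically while the IQR barely moves; `method="inclusive"` is one common quartile convention, so other software may compute slightly different quartiles:

```python
from statistics import stdev, quantiles

def iqr(data):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical salaries (illustrative only), with and without one outlier
salaries = [30_000, 35_000, 38_000, 42_000, 45_000, 110_000]
no_outlier = salaries[:-1]

print(round(stdev(no_outlier)))        # SD without the outlier (~$5,900)
print(round(stdev(salaries)))          # SD inflated by the outlier (~$30,000)
print(iqr(no_outlier), iqr(salaries))  # IQR changes far less
```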
Table 3.7 shows the numbers that can be used to summarize measurement data.
Table 3.7. Numbers used to Summarize Measurement Data
|Numerical Measure||Sensitive Measure||Resistant Measure|
|Measure of Center||Mean||Median|
|Measure of Spread (Variation)||Standard Deviation (SD)||Interquartile Range (IQR)|
- If a sample has outliers and/or skewness, resistant measures are preferred over sensitive measures. This is because sensitive measures tend to overreact to the presence of outliers.
- If a sample is reasonably symmetric, sensitive measures should be used. It is always better to use all of the observations in the sample when there are no problems with skewness and/or outliers.
3.6 - Test Yourself!
Think About It!
3.7 - Have Fun With It!
J.B. Landers ©
Lyric © by Alan Reifman
May be sung to the tune of "Big Shot" (Billy Joel)
You've collected your data, and you want to see,
The major cuts, from the symbolic axe,
You see the median... the quartiles,
And the min, and the max,
Well, you can also make a marking,
For a special case,
If a point looks, too far away,
You can handle outliers, fine,
Right within your array,
And you can show it with a box plot, can't you?
Tukey's gift to help explore,
You show it with a box plot, don't you?
Such a graph, you can't ignore,
They illustrate dispersion, quicker,
The key cut-points, at a glance,
It gives you a descriptive picture,
It looks just like a kitten's whisker,
The Statistics Song
Song © Science Court, Inc.
Numbers and surveys, they can tell us a lot,
When they're honest and fair...but sometimes they're not.
Check the size of a sample and the way it was picked,
Do the questions seem fair? Did the people seem tricked?
Chorus: Every day with more statistics showing up in the news you
Will see pretty graphs and fancy charts to amuse you.
But how was all the information found and then collected?
Look out for fancy numbers that are unconnected.