Here are four different graphs that can be used to describe measurement data:
- Dotplots
- Stemplots (Stem and Leaf Plot)
- Histograms
- Boxplots (Covered in section 3.4)
Example 3.4. Graphs Section
Consider the following sample.
Sample: The ages of forty selected PSU Tenured Faculty (n = 40 ages)
Dotplots
The dotplot is the first graph we will use to display this sample of 40 ages. A dotplot represents each observation as a dot placed above its value on a number line. On this dotplot, you will find that the ages range from 32 to 63 years. You should also notice that there are more tenured faculty members at older ages.
Figure 3.2. Dotplot (Ages of 40 PSU Tenured Faculty)
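As a rough sketch of what a dotplot encodes, the Python snippet below prints a sideways text dotplot: one line per whole-number age from the youngest to the oldest, with one `*` per faculty member of that age. The hard-coded `AGES` list is the 40 ages read off the stemplot in Figure 3.3, and `text_dotplot` is a hypothetical helper written for illustration, not part of any library.

```python
from collections import Counter

# The 40 ages, decoded from the stemplot of this sample (Figure 3.3).
AGES = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49,
        50, 51, 51, 52, 53, 53, 53, 54, 54, 54,
        55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59,
        60, 60, 61, 61, 62, 63, 63]


def text_dotplot(values):
    """Hypothetical helper: one line per whole-number value, one '*' per observation."""
    counts = Counter(values)  # a missing value counts as 0
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:3d} | " + "*" * counts[value])
    return lines


for line in text_dotplot(AGES):
    print(line)
```

Reading down the output, the stars pile up in the 50s and early 60s, which is the same pattern the dotplot shows.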
Stemplots (Stem and Leaf Plot)
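The second graph that can be used with measurement data is the stemplot. Stemplots concisely display the data in order from smallest to largest. Below is the list of the 40 ages in order from youngest to oldest.
32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54, 54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63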
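These forty observations are displayed in the stemplot found in Figure 3.3. Each stem is a tens digit and each leaf is a ones digit (Leaf Unit = 1.0), so stem 3 with leaf 2 represents an age of 32. Because each row covers 5 years, a stem can take two rows: one for leaves 0 to 4 and one for leaves 5 to 9. The first column gives the depth, the number of observations from the nearer end of the data up to and including that row; the row in parentheses contains the median, and the number in parentheses is that row's own count. In this stemplot, we again find that the ages span from 32 to 63 years and that there are more tenured PSU faculty members at older ages. Stemplots can provide useful information about small data sets.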
Figure 3.3. Stem and Leaf Plot (Ages of 40 PSU Tenured Faculty)
(each row represents 5 years)
Stem-and-leaf of Ages  N = 40
Leaf Unit = 1.0
| Depth | Stem | Leaves |
|---|---|---|
| 1 | 3 | 2 |
| 3 | 3 | 79 |
| 6 | 4 | 023 |
| 12 | 4 | 567789 |
| (10) | 5 | 0112333444 |
| 18 | 5 | 55667888999 |
| 7 | 6 | 0011233 |
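Because each leaf is a ones digit (Leaf Unit = 1.0), the stemplot can be turned back into the ordered list of ages. The minimal Python sketch below does this; the names `ROWS` and `decode_stemplot` are hypothetical, and the depth column is left out because it is a running count rather than data.

```python
# (stem, leaves) pairs copied from the stemplot; the depth column is omitted.
ROWS = [
    (3, "2"),
    (3, "79"),
    (4, "023"),
    (4, "567789"),
    (5, "0112333444"),
    (5, "55667888999"),
    (6, "0011233"),
]


def decode_stemplot(rows, leaf_unit=1.0):
    """Rebuild the ordered data: value = (stem * 10 + leaf) * leaf_unit."""
    values = []
    for stem, leaves in rows:
        for leaf in leaves:
            values.append((stem * 10 + int(leaf)) * leaf_unit)
    return values


ages = [round(v) for v in decode_stemplot(ROWS)]
print(len(ages), ages[0], ages[-1])  # 40 32 63
print(ages)
```

Because the rows and the leaves within each row are already in order, the decoded list comes out sorted from youngest to oldest.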
Histograms
The third graph is called a histogram. Of all the graphs presented so far, the histogram may be the most valuable. A histogram is essentially a bar graph for measurement data. The difference between a histogram and a bar graph is that in a histogram the categories are ranges of numbers rather than words. Usually, all of the numerical categories have the same width. The heights of the bars reflect either the frequency (count) or the relative frequency (percent) of observations that fall in each range of numbers.
Recall that the ages span from 32 to 63 years, so the range of these ages is (63 - 32) = 31 years. Looking at the stemplot, one finds that there are more tenured faculty members at older ages, and we want the histogram to show this trend. To do so, there need to be enough categories. If we choose only 4 categories (one each for faculty in their 30s, 40s, 50s, and 60s), the trend would be harder to detect. If we choose 9 categories, the trend becomes more obvious. Statistical software will usually make this determination for you.
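To see the coarse alternative mentioned above, the sketch below groups the same 40 ages into only four categories, one per decade. The `AGES` list is read off the stemplot, and `decade_counts` is a hypothetical helper name.

```python
from collections import Counter

# The 40 ages, decoded from the stemplot of this sample.
AGES = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49,
        50, 51, 51, 52, 53, 53, 53, 54, 54, 54,
        55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59,
        60, 60, 61, 61, 62, 63, 63]


def decade_counts(values):
    """Group values by decade (32 -> 30, 47 -> 40, ...) and count each group."""
    counts = Counter((v // 10) * 10 for v in values)
    return dict(sorted(counts.items()))


for decade, count in decade_counts(AGES).items():
    print(f"{decade}s: {count}")
```

The four bars (3, 9, 21, 7) show that the 50s are the busiest decade, but they hide how the counts climb from the mid-40s through the late 50s; with 9 narrower categories, that climb is spread across several bars.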
Width of each category = Range / Number of categories = 31/9 ≈ 3.4, rounded up to 4 years
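The width calculation, together with the starting point of 30 years used in this example, can be sketched in Python. Note the use of `math.ceil` to round up: rounding 3.4 to the nearest whole number would give 3, and 9 categories of 3 years cover only 27 years, less than the 31-year range. `category_edges` is a hypothetical name for illustration.

```python
import math


def category_edges(low, high, num_categories, start):
    """Equal-width category boundaries covering every value from low to high."""
    width = math.ceil((high - low) / num_categories)  # round UP: 31/9 = 3.44 -> 4
    edges = [start + i * width for i in range(num_categories + 1)]
    # Categories are [a, b), so the last boundary must be strictly above `high`.
    if edges[0] > low or edges[-1] <= high:
        raise ValueError("categories do not cover the data; choose another start")
    return width, edges


width, edges = category_edges(low=32, high=63, num_categories=9, start=30)
print(width)  # 4
print(edges)  # [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]
```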
The starting point should always be below our lowest observed value, and the ending point should always be above our highest observed value. For readability, it is also best to have the intervals start and end on whole numbers (or easy-to-digest numbers if the data are on a different scale). In this instance, our starting point of 30 years is below the lowest observed age of 32 years, and the ending point of 66 years (30 + 9 × 4) is above the highest observed age of 63 years.
To make sure that each observation falls into only one category, we construct our 4-year categories as shown in Table 3.1 below. The first category includes ages starting at 30 years and ending just below 34 years. An observed age of 34 is not included in the first category but is instead included in the second category. Using this method, no observed age can fall into more than one category. The observed ages are then placed into the appropriate categories, and the histogram is constructed.
Table 3.1. Four-Year Age Categories (Ages of 40 PSU Tenured Faculty)
| Category | Ages that Fall into the Category | Number of Observed Ages in that Category | Percent |
|---|---|---|---|
| 1. 30 ≤ Age < 34 | Ages 30 to 33 | 1 | 1/40 = .025 (2.5%) |
| 2. 34 ≤ Age < 38 | Ages 34 to 37 | 1 | 1/40 = .025 (2.5%) |
| 3. 38 ≤ Age < 42 | Ages 38 to 41 | 2 | 2/40 = .05 (5.0%) |
| 4. 42 ≤ Age < 46 | Ages 42 to 45 | 3 | 3/40 = .075 (7.5%) |
| 5. 46 ≤ Age < 50 | Ages 46 to 49 | 5 | 5/40 = .125 (12.5%) |
| 6. 50 ≤ Age < 54 | Ages 50 to 53 | 7 | 7/40 = .175 (17.5%) |
| 7. 54 ≤ Age < 58 | Ages 54 to 57 | 8 | 8/40 = .20 (20.0%) |
| 8. 58 ≤ Age < 62 | Ages 58 to 61 | 10 | 10/40 = .25 (25.0%) |
| 9. 62 ≤ Age < 66 | Ages 62 to 65 | 3 | 3/40 = .075 (7.5%) |
| Total | | n = 40 | 40/40 = 1.0 (100%) |
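The counting step behind Table 3.1 can be sketched in Python. The half-open test `lo <= v < hi` is the rule described above: the category 30 ≤ Age < 34 includes 30 but not 34. The `AGES` list is read off the stemplot, and `frequency_table` is a hypothetical helper, not a library function.

```python
# The 40 ages, decoded from the stemplot of this sample.
AGES = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49,
        50, 51, 51, 52, 53, 53, 53, 54, 54, 54,
        55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59,
        60, 60, 61, 61, 62, 63, 63]
EDGES = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]


def frequency_table(values, edges):
    """Count values in half-open categories [edges[i], edges[i + 1]).

    A value equal to a boundary (such as 34) is counted only in the category
    that starts there, so no observation is counted twice.
    """
    n = len(values)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(1 for v in values if lo <= v < hi)
        rows.append((lo, hi, count, count / n))  # relative frequency = count / n
    return rows


table = frequency_table(AGES, EDGES)
for i, (lo, hi, count, share) in enumerate(table, start=1):
    print(f"{i}. {lo} <= Age < {hi}: {count:2d}   {share:.3f} ({share:.1%})")
```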
The information from Table 3.1 is used to make the histogram found in Figure 3.4. The horizontal axis displays the categories, while the vertical axis displays the percent of the observations (ages) found in each category. (Note: You will not be asked to make histograms; you will only be asked to interpret them. However, it is important to see how one is made so that you understand the interpretation.)
Figure 3.4. Histogram (Ages of PSU Tenured Faculty)
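As a text-only stand-in for Figure 3.4, the sketch below draws the histogram sideways from the Table 3.1 counts: categories run down the page and bar length plays the role of bar height. Because every observation is 1/40 = 2.5% of the sample, one `#` per observation keeps the bars proportional to the percents. `text_histogram` is a hypothetical helper name.

```python
# Category boundaries and counts copied from Table 3.1.
EDGES = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]
COUNTS = [1, 1, 2, 3, 5, 7, 8, 10, 3]


def text_histogram(edges, counts):
    """Sideways histogram: one bar per category, one '#' per observation."""
    n = sum(counts)
    lines = []
    for lo, hi, count in zip(edges, edges[1:], counts):
        percent = 100 * count / n
        # hi - 1 is the largest whole-number age in the category [lo, hi)
        lines.append(f"{lo}-{hi - 1}: {'#' * count} {percent:.1f}%")
    return lines


for line in text_histogram(EDGES, COUNTS):
    print(line)
```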
As you look at the histogram, you should notice that more of the ages fall in the upper (right) half of the graph. In statistics, when the data on a histogram are not symmetric, with one tail stretching out farther than the other, the data are described as skewed. In this case, the data are skewed to the left: a larger percent of the ages is found at the upper end, and the sparser tail stretches toward the younger ages on the left.
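The observation that more ages fall in the upper half can be checked directly. The sketch below splits the histogram's horizontal axis (30 to 66 years) at its midpoint of 48 and counts the ages on each side. It is only a quick check of where the bulk of the data sits, not a formal measure of skewness, and `split_at_midpoint` is a hypothetical helper.

```python
# The 40 ages, decoded from the stemplot of this sample.
AGES = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49,
        50, 51, 51, 52, 53, 53, 53, 54, 54, 54,
        55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59,
        60, 60, 61, 61, 62, 63, 63]


def split_at_midpoint(values, lo, hi):
    """Count observations below, and at or above, the midpoint of the axis [lo, hi]."""
    mid = (lo + hi) / 2
    lower = sum(1 for v in values if v < mid)
    return mid, lower, len(values) - lower


mid, lower, upper = split_at_midpoint(AGES, 30, 66)
n = len(AGES)
print(f"below {mid:g}: {lower} ({lower / n:.0%}); {mid:g} and above: {upper} ({upper / n:.0%})")
```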
| Histogram Interpretation | Explanation | Common Examples |
|---|---|---|
| Skewed to the right | A larger percent of the data is found at the lower end of the histogram. Data values are farther apart on the right. | Income data; variables that are ratios where the denominator could be small |
| Skewed to the left | A larger percent of the data is found at the upper end of the histogram. Data values are farther apart on the left. | Results of an easy test |
| Symmetric | An equal percent of the data falls on each side of the center, and the two tails mirror each other. | Differences between similar quantities, or sums of similar quantities |
| Bimodal | Two clear peaks (modes) are seen in the histogram. | The data actually come from two different populations |