Example 13-3
A random sample of 64 people was selected to take the Stanford-Binet Intelligence Test. After each person completed the test, they were assigned an intelligence quotient (IQ) based on their performance on the test. The resulting 64 IQs are as follows:
111 | 85 | 83 | 98 | 107 | 101 | 100 | 94 | 101 | 86 |
105 | 122 | 104 | 106 | 90 | 123 | 102 | 107 | 93 | 109 |
141 | 86 | 91 | 88 | 98 | 128 | 93 | 114 | 87 | 116 |
99 | 94 | 94 | 106 | 136 | 102 | 75 | 96 | 78 | 116 |
107 | 106 | 68 | 104 | 91 | 87 | 105 | 97 | 110 | 91 |
107 | 107 | 85 | 117 | 93 | 108 | 91 | 110 | 105 | 99 |
85 | 99 | 99 | 96 |
Once the data are obtained, it might be nice to summarize the data. We could, of course, summarize the data using a histogram. One primary disadvantage of using a histogram to summarize data is that the original data aren't preserved in the graph. A stem-and-leaf plot, on the other hand, summarizes the data and preserves the data at the same time.
The basic idea behind a stem-and-leaf plot is to divide each data point into a stem and a leaf. We could divide our first data point, 111, for example, into a stem of 11 and a leaf of 1. We could divide 85 into a stem of 8 and a leaf of 5. We could divide 83 into a stem of 8 and a leaf of 3. And so on. To create the plot then, we first create a column of numbers containing the ordered stems. Our IQ data set produces stems 6, 7, 8, 9, 10, 11, 12, 13, and 14. Once the column of stems is written down, we work our way through each number in the data set, and write its leaf in the row headed by its stem.
Here's what our stem-and-leaf plot would look like after adding the first five numbers 111, 85, 83, 98, and 107:
6 | |
---|---|
7 | |
8 | 53 |
9 | 8 |
10 | 7 |
11 | 1 |
12 | |
13 | |
14 | |
and here's what the completed stem-and-leaf plot would look like after adding all 64 leaves to the nine stems:
6 | 8 |
---|---|
7 | 58 |
8 | 536687755 |
9 | 84031839446171319996 |
10 | 71015462796276457785 |
11 | 1466070 |
12 | 238 |
13 | 6 |
14 | 1 |
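The hand construction above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `stem_and_leaf` is an assumption made for this example, and the data are the 64 IQs in the order consistent with the completed plot.

```python
from collections import defaultdict

# The 64 IQ scores, in the order they appear in the data table.
iqs = [
    111, 85, 83, 98, 107, 101, 100, 94, 101, 86,
    105, 122, 104, 106, 90, 123, 102, 107, 93, 109,
    141, 86, 91, 88, 98, 128, 93, 114, 87, 116,
    99, 94, 94, 106, 136, 102, 75, 96, 78, 116,
    107, 106, 68, 104, 91, 87, 105, 97, 110, 91,
    107, 107, 85, 117, 93, 108, 91, 110, 105, 99,
    85, 99, 99, 96,
]


def stem_and_leaf(values):
    """Return {stem: leaves}, with leaves kept in data order (unordered plot).

    Hypothetical helper for illustration; assumes non-negative integers
    and a leaf unit of 1.
    """
    rows = defaultdict(str)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 111 -> stem 11, leaf 1
        rows[stem] += str(leaf)
    # List every stem from the smallest to the largest, even if it is empty.
    return {s: rows.get(s, "") for s in range(min(rows), max(rows) + 1)}


plot = stem_and_leaf(iqs)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {leaves}")
```

Because the leaves are appended in the order the data are read, the rows come out unordered, just as in the hand-built plot.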
Now, rather than looking at a list of 64 unordered IQs, we have a nice picture of the data that quite readily tells us that:
- the distribution of IQs is bell-shaped
- most of the IQs are in the 90s and 100s
- the smallest IQ in the data set is 68, while the largest is 141
That's all well and good, but we could do better. First and foremost, no one in their right mind is going to want to create too many of these stem-and-leaf plots by hand. Instead, you'd probably want to let some statistical software, such as Minitab or SAS, do the work for you. Here's what Minitab's stem-and-leaf plot of the 64 IQs looks like:
Stem-and-Leaf of IQ | N=64 | |
---|---|---|
Leaf Unit = 1.0 | ||
1 | 6 | 8 |
1 | 7 | |
3 | 7 | 58 |
4 | 8 | 3 |
12 | 8 | 55566778 |
23 | 9 | 01111333444 |
32 | 9 | 667889999 |
32 | 10 | 0112244 |
25 | 10 | 5556667777789 |
12 | 11 | 0014 |
8 | 11 | 667 |
5 | 12 | 23 |
3 | 12 | 8 |
2 | 13 | |
2 | 13 | 6 |
1 | 14 | 1 |
Hmmm.... how does the plot differ from ours? First, Minitab tells us that there are n = 64 numbers and that the leaf unit is 1.0. Then, ignoring the first column of numbers for now, the second column contains the stems from 6 to 14. Note, though, that Minitab uses two rows for each of the stems 7, 8, 9, 10, 11, 12, and 13. Minitab takes an alternative here that we could have taken as well. When you opt to use two rows for each stem, the first row is reserved for the leaves 0, 1, 2, 3, and 4, while the second row is reserved for the leaves 5, 6, 7, 8, and 9. For example, note that the first 9 row contains the 0 to 4 leaves, while the second 9 row contains the 5 to 9 leaves. The decision to use one or two rows for the stems depends on the data. Sometimes the one row per stem option produces the better plot, and sometimes the two rows per stem option produces the better plot.
Do you notice any other differences between Minitab's plot and our plot? Note that the leaves in Minitab's plot are ordered. That's right... Minitab orders the data before producing the plot, thereby creating what is called an ordered stem-and-leaf plot.
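The two-rows-per-stem, ordered layout can be sketched in Python as follows. This is only an illustration of the layout rule described above, not Minitab's actual algorithm; the function name `split_stem_plot` is an assumption, as is the choice to trim empty rows at the very top and bottom of the display.

```python
# The 64 IQ scores from the example.
iqs = [
    111, 85, 83, 98, 107, 101, 100, 94, 101, 86,
    105, 122, 104, 106, 90, 123, 102, 107, 93, 109,
    141, 86, 91, 88, 98, 128, 93, 114, 87, 116,
    99, 94, 94, 106, 136, 102, 75, 96, 78, 116,
    107, 106, 68, 104, 91, 87, 105, 97, 110, 91,
    107, 107, 85, 117, 93, 108, 91, 110, 105, 99,
    85, 99, 99, 96,
]


def split_stem_plot(values):
    """Return ordered (stem, leaves) rows, two rows per stem.

    Hypothetical helper: the first row of a stem holds leaves 0-4 and the
    second holds leaves 5-9. Assumes non-negative integers, leaf unit 1.
    """
    ordered = sorted(values)  # sorting first gives ordered leaves
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        for low_half in (True, False):
            leaves = "".join(
                str(v % 10)
                for v in ordered
                if v // 10 == stem and (v % 10 < 5) == low_half
            )
            rows.append((stem, leaves))
    # Drop empty rows above the smallest value and below the largest one.
    while rows and not rows[0][1]:
        rows.pop(0)
    while rows and not rows[-1][1]:
        rows.pop()
    return rows


rows = split_stem_plot(iqs)
for stem, leaves in rows:
    print(f"{stem:>2} | {leaves}")
```

Empty rows in the interior of the display, such as the first 7 row, are kept so that gaps in the data stay visible.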
Now, back to that first column of numbers appearing in Minitab's plot. That column contains what are called depths. The depths are the frequencies accumulated from the top of the plot and the bottom of the plot until they converge in the middle. For example, the first number in the depths column is a 1. It comes from the fact that there is just one number in the first (6) stem. The second number in the depths column is also a 1. It comes from the fact that there is 1 leaf in the first (6) stem and 0 leaves in the second (the first 7) stem, and so 1 + 0 = 1. The third number in the depths column is a 3. It comes from the fact that there is 1 leaf in the first (6) stem, 0 leaves in the second (the first 7) stem, and 2 leaves in the third (the second 7) stem, and so 1 + 0 + 2 = 3. Minitab continues accumulating numbers down the column until it reaches 32 in the last 9 stem. Then, Minitab starts accumulating from the bottom of the plot. The 5 in the depths column comes, for example, from the fact that there is 1 leaf in the last (14) stem, 1 leaf in the second 13 stem, 0 leaves in the first 13 stem, 1 leaf in the second 12 stem, and 2 leaves in the first 12 stem, and so 1 + 1 + 0 + 1 + 2 = 5.
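The depth calculation just described can be sketched in Python. The sketch assumes that accumulating "from the top and the bottom until they converge" amounts to taking, for each row, the smaller of the two running totals; that shortcut reproduces the depths described here, but it is an assumption and not a statement of how Minitab is implemented. The `counts` list holds the number of leaves in each row, including the empty first 7 and first 13 rows mentioned in the text.

```python
from itertools import accumulate

# Leaves per row, top to bottom: 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11,
# 12, 12, 13, 13, 14 (the first 7 row and first 13 row are empty).
counts = [1, 0, 2, 1, 8, 11, 9, 7, 13, 4, 3, 2, 1, 0, 1, 1]


def depths(row_counts):
    """Return the depth of each row (hypothetical helper for illustration).

    For every row, accumulate the counts from the top and from the bottom
    and keep whichever running total is smaller.
    """
    from_top = list(accumulate(row_counts))
    from_bottom = list(accumulate(reversed(row_counts)))[::-1]
    return [min(t, b) for t, b in zip(from_top, from_bottom)]


result = depths(counts)
print(result)
# [1, 1, 3, 4, 12, 23, 32, 32, 25, 12, 8, 5, 3, 2, 2, 1]
```

The two 32s in the middle are where the totals from the top (1 + 0 + 2 + 1 + 8 + 11 + 9) and from the bottom (25 + 7) meet.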
Let's take a look at another example.
Example 13-4
Let's consider a random sample of 20 concentrations of calcium carbonate (\(CaCO_3\)) in milligrams per liter.
130.8 | 129.9 | 131.5 | 131.2 | 129.5 | 132.7 | 131.5 | 127.8 | 133.7 |
132.2 | 134.8 | 131.7 | 133.9 | 129.8 | 131.4 | 12.8 | 132.7 | 132.8 |
131.4 | 131.3 |
Create a stem-and-leaf plot of the data.
Solution
Let's take the efficient route, as most anyone would likely do in practice, by letting Minitab generate the plot for us:
Minitab tells us that the leaf unit is 0.1, so that the stem of 127 and leaf of 8 represents the number 127.8. The depths column contains something a little different here, namely the 7 with parentheses around it. It seems that Minitab's algorithm for calculating the depths differs a bit here. It still accumulates the values from the top and the bottom, but it stops in each direction when it reaches the row containing the middle value (median) of the sample. The frequency of that row containing the median is simply placed in parentheses. That is, the median of the 20 numbers is 131.45. Therefore, because the 131 stem contains 7 leaves, the depths column for that row contains a 7 in parentheses.
In our previous example, the median of the 64 IQs is 99.5. Because 99.5 falls between two rows of the display, namely between the second 9 row (which ends at 99) and the first 10 row (which begins at 100), Minitab calculates the depths instead as described in that example, and omits the whole "parentheses around the frequency of the median row" thing.
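Both cases can be put side by side in a short Python sketch. It is a hedged illustration of the rule as described in the text, not Minitab's code: the function name `depths_with_median` is an assumption, and the first list of row counts is purely illustrative (20 values with 7 in the middle row), not copied from Minitab output. A row is treated as holding the median when fewer than half of the values lie above it and fewer than half lie below it.

```python
def depths_with_median(row_counts):
    """Return depth labels, marking the median row as '(count)'.

    Hypothetical helper. If the median falls between two rows, no row
    qualifies and every label is an ordinary cumulative depth.
    """
    n = sum(row_counts)
    labels, above = [], 0
    for c in row_counts:
        below = n - above - c
        if 2 * above < n and 2 * below < n:
            labels.append(f"({c})")  # this row contains the median
        else:
            labels.append(str(min(above + c, below + c)))
        above += c
    return labels


# Illustrative counts (an assumption): 20 values, 7 of them in the fifth row.
example_counts = [1, 1, 3, 1, 7, 4, 2, 1]
print(depths_with_median(example_counts))
# ['1', '2', '5', '6', '(7)', '7', '3', '1']

# Row counts for the 64 IQs: the median 99.5 falls between two rows,
# so no label is parenthesized.
iq_counts = [1, 0, 2, 1, 8, 11, 9, 7, 13, 4, 3, 2, 1, 0, 1, 1]
print(depths_with_median(iq_counts))
```

With the IQ counts, 32 values lie at or above the second 9 row and 32 lie at or below the first 10 row, so neither row has fewer than half the data on both sides.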