On the last page, we learned how to determine the first quartile, the median, and the third quartile for a sample of data. These three percentiles, along with a data set's minimum and maximum values, make up what is called the five-number summary. One nice way of graphically depicting a data set's five-number summary is by way of a box plot (or box-and-whisker plot).
Here are some general guidelines for drawing a box plot:
- Draw a horizontal axis scaled to the data.
- Above the axis, draw a rectangular box with the left side of the box at the first quartile \(q_1\) and the right side of the box at the third quartile \(q_3\).
- Draw a vertical line connecting the lower and upper horizontal lines of the box at the median \(m\).
- For the left whisker, draw a horizontal line from the minimum value to the midpoint of the left side of the box.
- For the right whisker, draw a horizontal line from the maximum value to the midpoint of the right side of the box.
Drawn as such, a box plot does a nice job of dividing the data graphically into fourths. Note, for example, that the horizontal length of the box is the interquartile range IQR, the left whisker represents the first quarter of the data, and the right whisker represents the fourth quarter of the data.
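The five numbers behind these guidelines can also be computed in a few lines of code. What follows is only a minimal Python sketch, assuming the quartiles are located at position \(k(n+1)/4\) in the sorted data with linear interpolation, which is the rule `statistics.quantiles` applies when `method="exclusive"`. Other software may use a different rule. The function name `five_number_summary` and the small data set are made up for illustration.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, q1, median, q3, maximum) for a list of numbers."""
    values = sorted(data)
    # Assumed quartile rule: method="exclusive" places the k-th quartile at
    # position k(n+1)/4 in the sorted data and interpolates between neighbors.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return values[0], q1, median, q3, values[-1]


# Hypothetical data, chosen only to illustrate the calculation.
sample = [2, 4, 4, 5, 7, 8, 10]
minimum, q1, median, q3, maximum = five_number_summary(sample)

# The horizontal length of the box is the interquartile range.
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```

The box would then run from `q1` to `q3`, with the whiskers reaching out to `minimum` and `maximum`.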
Example 13-3 Revisited
Let's return to our random sample of 64 people selected to take the Stanford-Binet Intelligence Test. The resulting 64 IQs were sorted as follows:
68 | 75 | 78 | 83 | 85 | 85 | 85 | 86 | 86 | 87 |
87 | 88 | 90 | 91 | 91 | 91 | 91 | 93 | 93 | 93 |
94 | 94 | 94 | 96 | 96 | 97 | 98 | 98 | 99 | 99 |
99 | 99 | 100 | 101 | 101 | 102 | 102 | 104 | 104 | 105 |
105 | 105 | 106 | 106 | 106 | 107 | 107 | 107 | 107 | 107 |
108 | 109 | 110 | 110 | 111 | 114 | 116 | 116 | 117 | 122 |
123 | 128 | 136 | 141 |
We previously determined that the first quartile is 91, the median is 99.5, and the third quartile is 107. The interquartile range IQR is therefore \(107-91=16\). Use these numbers, as well as the minimum value (68) and maximum value (141), to create a box plot of these data.
Solution
By following the guidelines given above, a hand-drawn box plot of these data looks something like this:
In reality, you will almost always want to use a statistical software package, such as Minitab, to create your box plots. If we ask Minitab to create a box plot for this data set, this is what we get:
Hmm. How come Minitab's box plot looks different than our box plot? Well, by default, Minitab creates what is called a modified box plot. In a modified box plot, the box is drawn just as in a standard box plot, but the whiskers are defined differently. For a modified box plot, the whiskers are the lines that extend from the left and right of the box to the adjacent values. The adjacent values are defined as the lowest and highest observations that are still inside the region defined by the following limits:
- Lower Limit: \(Q1-1.5\times IQR\)
- Upper Limit: \(Q3+1.5\times IQR\)
In this example, the lower limit is calculated as \(Q1-1.5\times IQR=91-1.5(16)=67\). Therefore, in this case, the lower adjacent value turns out to be the same as the minimum value, 68, because 68 is the lowest observation still inside the region defined by the lower limit of 67. The upper limit is calculated as \(Q3+1.5\times IQR=107+1.5(16)=131\). Therefore, the upper adjacent value is 128, because 128 is the highest observation still inside the region defined by the upper limit of 131. In general, values that fall outside the region bounded by the lower and upper limits are deemed outliers. In this case, the IQs of 136 and 141 are greater than the upper limit of 131 and are thus deemed outliers. In Minitab's modified box plots, outliers are identified using asterisks.
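The calculation above can be checked with a short script. This is a sketch rather than a description of how Minitab works internally: it assumes the quartiles come from the \(k(n+1)/4\) position rule (`statistics.quantiles` with `method="exclusive"`), and it treats an observation that lands exactly on a limit as inside the region. The function name `modified_box_plot_stats` is made up for illustration.

```python
import statistics

# The 64 sorted IQs from the example.
iqs = [
    68, 75, 78, 83, 85, 85, 85, 86, 86, 87,
    87, 88, 90, 91, 91, 91, 91, 93, 93, 93,
    94, 94, 94, 96, 96, 97, 98, 98, 99, 99,
    99, 99, 100, 101, 101, 102, 102, 104, 104, 105,
    105, 105, 106, 106, 106, 107, 107, 107, 107, 107,
    108, 109, 110, 110, 111, 114, 116, 116, 117, 122,
    123, 128, 136, 141,
]


def modified_box_plot_stats(data):
    """Return the quantities needed to draw a modified box plot."""
    values = sorted(data)
    # Assumed quartile rule: positions k(n+1)/4 with linear interpolation.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Assumption: a value exactly equal to a limit counts as inside.
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "lower_adjacent": inside[0], "upper_adjacent": inside[-1],
        "outliers": outliers,
    }


stats = modified_box_plot_stats(iqs)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For the IQ data, the whiskers would end at `lower_adjacent` and `upper_adjacent`, and each value in `outliers` would be plotted as a separate symbol.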
Example 13-4 Revisited
Let's return to the example in which we have a random sample of 20 concentrations of calcium carbonate (\(CaCO_3\)) in milligrams per liter:
130.8 | 129.9 | 131.5 | 131.2 | 129.5 | 132.7 | 131.5 | 127.8 | 133.7 |
132.2 | 134.8 | 131.7 | 133.9 | 129.8 | 131.4 | 128.8 | 132.7 | 132.8 |
131.4 | 131.3 |
With a little bit of work, it can be shown that the five-number summary is as follows:
- Minimum: 127.8
- First quartile: 130.12
- Median: 131.45
- Third quartile: 132.70
- Maximum: 134.8
Use the five-number summary to create a box plot of these data.
Solution
By following the guidelines given above, a hand-drawn box plot of these data looks something like this:
In this case, the interquartile range IQR is \(132.70-130.12=2.58\). Therefore, the lower limit is calculated as \(Q1-1.5\times IQR=130.12-1.5(2.58)=126.25\), and the lower adjacent value is the same as the minimum value, 127.8, because 127.8 is the lowest observation still inside the region defined by the lower limit of 126.25. The upper limit is calculated as \(Q3+1.5\times IQR=132.70+1.5(2.58)=136.57\), and the upper adjacent value is the same as the maximum value, 134.8, because 134.8 is the highest observation still inside the region defined by the upper limit of 136.57. Because the lower and upper adjacent values are the same as the minimum and maximum values, respectively, there are no outliers, and the standard box plot looks the same as the modified box plot.
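One way to confirm that the two plots coincide is to test whether any observation falls outside the limits. The Python sketch below does this for the calcium carbonate data. It takes the rounded quartiles 130.12 and 132.70 directly from the five-number summary rather than recomputing them, so its limits match the hand calculation. The variable names are illustrative only.

```python
# The 20 calcium carbonate concentrations (mg/L) from the example.
concentrations = [
    130.8, 129.9, 131.5, 131.2, 129.5, 132.7, 131.5, 127.8, 133.7,
    132.2, 134.8, 131.7, 133.9, 129.8, 131.4, 128.8, 132.7, 132.8,
    131.4, 131.3,
]

# Quartiles as reported in the five-number summary (already rounded).
q1, q3 = 130.12, 132.70

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Adjacent values: the most extreme observations still inside the limits.
lower_adjacent = min(x for x in concentrations if x >= lower_limit)
upper_adjacent = max(x for x in concentrations if x <= upper_limit)
outliers = [x for x in concentrations if x < lower_limit or x > upper_limit]

# With no outliers, the whiskers of the modified plot reach the minimum
# and maximum, just as in the standard plot.
same_as_standard = (
    lower_adjacent == min(concentrations)
    and upper_adjacent == max(concentrations)
)

print(round(iqr, 2), round(lower_limit, 2), round(upper_limit, 2))
print(lower_adjacent, upper_adjacent, outliers, same_as_standard)
```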