Two ways to represent the spread or variation are:
- Interquartile Range (IQR)
- Standard Deviation (SD)
Example 3.9. Measures of Spread or Variation
Recall the five-number summary from Example 3.7.
Table 3.6. Five-Number Summary of Salaries
|Lowest||Lower Quartile (QL)||Median||Upper Quartile (QU)||Highest|
With the five-number summary one can easily determine the Interquartile Range (IQR), which is the difference between the upper and lower quartiles: IQR = QU - QL. In our example,
IQR = QU - QL = \$49,500 - \$33,250 = \$16,250
What does this IQR represent? In this example, the middle 50% of the salaries spans \$16,250 (from \$33,250 to \$49,500). The IQR is the length of the box on a boxplot. Notice that only two numbers are needed to determine the IQR, and those numbers are not the extreme observations that may be outliers. For that reason the IQR is a resistant measure.
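The arithmetic above can be sketched in a few lines of Python. The quartile values are the ones stated in the example; the variable names are illustrative only.

```python
# Quartiles as stated in Example 3.9 (salaries in dollars).
lower_quartile = 33_250  # QL
upper_quartile = 49_500  # QU

# IQR = QU - QL: the width of the middle 50% of the data.
iqr = upper_quartile - lower_quartile

print(f"IQR = ${iqr:,}")  # prints "IQR = $16,250"
```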
The second measure of spread or variation is the standard deviation (SD). The standard deviation is roughly the typical distance of the observations in the sample from the mean; as a rule of thumb, for roughly bell-shaped data, about 2/3 of the observations fall within one standard deviation of the mean. Because the standard deviation is calculated using every observation in the data set, it is influenced by outliers and is therefore called a sensitive measure. The standard deviation for the variable "salaries" is \$17,936. (Note: you will not be asked to calculate an SD; that is done using calculators or computer software.) What does the standard deviation represent? In this example, the typical distance of an individual salary from the mean salary of \$45,000 is about \$17,936. Figure 3.11 shows how far each individual salary is from the mean.
Figure 3.11. Dotplot of Salaries
Figure 3.11 shows that many of the observations are reasonably close to the sample mean. But because the sample contains an outlier of \$110,000, the standard deviation is inflated to about \$17,936, which overstates how far a typical salary is from the mean. In this instance, the IQR is the preferred measure of spread because the sample has an outlier.
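To see how differently the two measures react to a single extreme value, the following minimal sketch compares them on a small made-up sample. The numbers are hypothetical (they are not the salaries from Example 3.7), and the helper name `iqr` is an assumption for illustration. Quartiles come from Python's `statistics.quantiles` with its default method; other software may use a slightly different quartile convention and report slightly different values.

```python
import statistics

def iqr(values):
    """Interquartile range, using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical salaries in thousands of dollars (NOT the data from Example 3.7).
typical = [30, 32, 35, 38, 40, 42, 45, 48, 50]
with_outlier = typical + [110]  # the same sample plus one extreme salary

# Sample standard deviation (sensitive) before and after adding the outlier.
sd_typical = statistics.stdev(typical)
sd_outlier = statistics.stdev(with_outlier)

# Interquartile range (resistant) before and after adding the outlier.
iqr_typical = iqr(typical)
iqr_outlier = iqr(with_outlier)

print(f"SD:  {sd_typical:.1f} -> {sd_outlier:.1f}")
print(f"IQR: {iqr_typical:.1f} -> {iqr_outlier:.1f}")
```

In this hypothetical sample, adding one extreme value more than triples the standard deviation, while the IQR barely moves.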
Table 3.7 shows the numbers that can be used to summarize measurement data.
Table 3.7. Numbers used to Summarize Measurement Data
| Numerical Measure | Sensitive Measure | Resistant Measure |
|---|---|---|
| Measure of Center | Mean | Median |
| Measure of Spread (Variation) | Standard Deviation (SD) | Interquartile Range (IQR) |
- If a sample has outliers and/or skewness, resistant measures are preferred over sensitive measures. This is because sensitive measures tend to overreact to the presence of outliers.
- If a sample is reasonably symmetric, sensitive measures should be used. It is always better to use all of the observations in the sample when there are no problems with skewness and/or outliers.
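The guidance above could be turned into a small helper, sketched here under stated assumptions. The function name `summarize` is hypothetical, and the way it detects outliers (the common 1.5 × IQR fence rule) is an assumption that does not come from this text. The sketch checks only for outliers, not for skewness, so it is a partial illustration rather than a complete decision rule.

```python
import statistics

def summarize(values):
    """Return (kind, center, spread), choosing measures as in Table 3.7."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1

    # Assumption: flag outliers with the 1.5 * IQR fence rule (not from the text).
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    has_outlier = any(v < low_fence or v > high_fence for v in values)

    if has_outlier:
        # Outliers present: report the resistant measures.
        return ("resistant", median, iqr)
    # No outliers detected: report the sensitive measures.
    return ("sensitive", statistics.mean(values), statistics.stdev(values))

# Hypothetical data in thousands of dollars, for illustration only.
print(summarize([30, 32, 35, 38, 40, 42, 45, 48, 50]))
print(summarize([30, 32, 35, 38, 40, 42, 45, 48, 50, 110]))
```

The first call reports the mean and standard deviation; the second, which includes one extreme value, reports the median and IQR instead.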