Many measurement variables found in nature follow a predictable pattern. The pattern of interest is a type of symmetry where much of the data is clumped around the center and few observations are found at the extremes. Data with this pattern are said to be bell-shaped or to have a normal distribution. It can be shown that variables that arise as the sum or average of a fixed number of smaller individual components of a similar nature will have approximately this shape. Thus, the distribution of the weights of cartons of large eggs at a grocery store will look like a normal curve because the weight of a carton arises from the sum of the weights of the dozen eggs inside. Many measures used by psychologists to gauge levels of characteristics like stress, anxiety, or happiness are based on questionnaires that score your answers to many individual questions and then sum the scores to get a final measure. The distributions of such measures within a homogeneous group of people will then approximately follow a normal curve.
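The egg-carton idea can be explored with a small simulation. The sketch below is only illustrative: the individual egg weights (uniform between 56 and 63 grams), the number of simulated cartons, and the function name `carton_weight` are assumptions chosen for the demonstration, not values from the lesson. Even though a single egg weight is not bell-shaped under this assumption, the carton totals should come out close to a normal curve.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def carton_weight(eggs_per_carton=12):
    # Hypothetical egg weights: uniform between 56 g and 63 g (an assumption)
    return sum(random.uniform(56, 63) for _ in range(eggs_per_carton))


# Simulate many cartons and summarize their total weights
weights = [carton_weight() for _ in range(10_000)]
mean = statistics.fmean(weights)
sd = statistics.stdev(weights)

# Share of cartons within 1 and 2 standard deviations of the mean
within_1sd = sum(abs(w - mean) <= sd for w in weights) / len(weights)
within_2sd = sum(abs(w - mean) <= 2 * sd for w in weights) / len(weights)

print(f"mean = {mean:.1f} g, SD = {sd:.1f} g")
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

If the carton weights are roughly normal, the two printed shares should land near 68% and 95%.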
Example 4.3: Normal Curves
Consider the following three variables from data that was collected from a sample of n = 198 Stat 100 students:
- Variable #1: Heights (inches)
- Variable #2: Grade Point Average
- Variable #3: Number of Tattoos
Figure 4.1. Histogram of Height (Mean = 66.3 inches & Median = 66 inches)
The histogram of the Heights variable looks approximately like a normal distribution, as shown in Figure 4.1. Since a normal distribution is a type of symmetric distribution, you would expect the mean and median to be very close in value. In this example, the mean is 66.3 inches and the median is 66 inches.
Figure 4.2. Histogram of GPA (Mean = 3.25 & Median = 3.3)
The GPA Variable that gives the Grade Point Averages of these 198 Stat 100 students is slightly skewed left and could only very roughly be said to follow a normal distribution as shown in Figure 4.2. Notice the upper tail where the data is clumped. This can be partially explained by the fact that GPAs at Penn State cannot exceed 4.0. However, the mean and median are still pretty close, and using the normal curve (to calculate percentiles for example) should give very rough approximations. It is likely that the GPA variable would look more like a normal curve if the data were restricted to a more homogeneous group with a similar number of credit hours taken.
Figure 4.3. Number of Tattoos (Mean = 0.23 & Median = 0)
The Tattoo Variable is not normally distributed at all as shown in Figure 4.3. The major problem with this variable is that it is extremely skewed to the right since most people have no tattoos at all. Also, the graph has gaps because this variable is discrete with only a few values in the data set. Thus, the normal curve should not be used to make even rough approximations for data about the number of tattoos.
The Empirical Rule
The empirical rule is a guideline that can be applied when you know that the sample is approximately normally distributed. The empirical rule also helps one to understand what the standard deviation represents.
The empirical rule says that for any normal (bell-shaped) curve, approximately:
- 68% of the values (data) fall within 1 standard deviation of the mean in either direction
- 95% of the values (data) fall within 2 standard deviations of the mean in either direction
- 99.7% of the values (data) fall within 3 standard deviations of the mean in either direction
Figure 4.4. The Empirical Rule
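The three percentages in the empirical rule are rounded areas under the standard normal curve. As a quick check, the sketch below uses `NormalDist` from Python's standard `statistics` module to compute the area within 1, 2, and 3 standard deviations of the mean; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The computed areas are about 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68%, 95%, and 99.7%.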
Example 4.4: Empirical Rule
Recall the Heights variable used in Example 4.3. Since the histogram shows that this data is approximately normally distributed, the empirical rule can be applied. The mean and standard deviation (SD) for this sample are 66.3 inches and 4 inches, respectively. Below are the calculations for the sample of heights.
Mean ± 1(SD) = 66.3 ± 4 inches = (62.3 to 70.3 inches)
Mean ± 2(SD) = 66.3 ± 2(4) inches = 66.3 ± 8 inches = (58.3 to 74.3 inches)
Mean ± 3(SD) = 66.3 ± 3(4) inches = 66.3 ± 12 inches = (54.3 to 78.3 inches)
Because the sample of heights is approximately normally distributed, one can say that approximately:
- 68% of the heights lie between 62.3 and 70.3 inches
- 95% of the heights lie between 58.3 and 74.3 inches
- 99.7% of the heights lie between 54.3 and 78.3 inches
One would expect it to be very unusual for someone in this sample to be shorter than 54.3 inches or taller than 78.3 inches. Since 68% of the heights are within one standard deviation of the mean, the remaining 32% fall outside that range. Further, since the distribution is symmetric, we would have 16% (half of the 32%) falling below 62.3 inches and another 16% falling above 70.3 inches.
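The interval arithmetic and the 16% tail figures from this example can be reproduced in a few lines. The sketch below assumes the heights follow an exactly normal curve with the sample mean and SD, which is only an approximation for real data; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 66.3, 4.0  # sample mean and SD quoted in Example 4.4 (inches)

# Mean ± k(SD) intervals used by the empirical rule
intervals = {
    k: (round(mean - k * sd, 1), round(mean + k * sd, 1)) for k in (1, 2, 3)
}
for k, (low, high) in intervals.items():
    print(f"mean ± {k}(SD): {low} to {high} inches")

# Assumption: an exactly normal curve with this mean and SD
heights = NormalDist(mu=mean, sigma=sd)
below = heights.cdf(62.3)      # share shorter than 62.3 inches
above = 1 - heights.cdf(70.3)  # share taller than 70.3 inches
print(f"below 62.3 in: {below:.1%}, above 70.3 in: {above:.1%}")
```

Both tails come out just under 16%, matching the rounded figure obtained by halving the 32% that falls outside one standard deviation.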
An important feature of the normal curve is that percentiles are completely determined by the standardized scores. Table 8.1 in Chapter 8 of the textbook (page 175) shows the standard scores that align with various percentiles. As examples, examine the table to check that the 23rd percentile goes with a standard score of z = -0.74 and the 97th percentile goes with a standard score of z = 1.88.
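The two table lookups mentioned above can also be checked in software. The sketch below uses `NormalDist().inv_cdf` from Python's standard `statistics` module as a stand-in for the printed table; it is not the textbook's method, just an independent check, and the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Standard score that goes with a given percentile
z_23 = std_normal.inv_cdf(0.23)
z_97 = std_normal.inv_cdf(0.97)
print(f"23rd percentile: z = {z_23:.2f}")
print(f"97th percentile: z = {z_97:.2f}")

# Going the other way: percentile that goes with a given standard score
print(f"z = -0.74 is at about the {std_normal.cdf(-0.74):.0%} mark")
```

Rounded to two decimals, the results agree with the table values of z = -0.74 and z = 1.88.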
Example 4.5
A histogram of the highway gas mileage for the 171 compact SUVs sold in the United States and tested by the EPA in 2013 is shown in Figure 4.5. The mean mileage was 22.20 mpg with a standard deviation of 2.85 mpg. General Motors' 2013 Encore compact SUV got 28 mpg. What percentage of the compact SUVs got worse mileage than the Encore?
Figure 4.5. Histogram of Highway Mileage for 2013 compact SUVs
To solve this, we first have to compute the standard score of the value of interest which is found by:
\(z=\dfrac{28 - 22.2}{2.85} \approx 2.04\) (this says that 28 mpg is 2.04 standard deviations above the mean).
Next, we look at Table 8.1 to find that this standard score corresponds to approximately the 98th percentile of a normal distribution. Thus, about 98% of the 2013 compact SUVs got worse highway mileage than the Encore, and only about 2% got better mileage.
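The same two steps, computing the standard score and converting it to a percentile, can be sketched in code. This assumes the mileage distribution is close enough to a normal curve for the approximation to be reasonable, and it uses `NormalDist` from Python's standard `statistics` module in place of Table 8.1; the variable names are illustrative.

```python
from statistics import NormalDist

mean_mpg = 22.20   # mean highway mileage of the 171 compact SUVs
sd_mpg = 2.85      # standard deviation of highway mileage
encore_mpg = 28    # mileage of the 2013 Encore

# Step 1: standard score = (value - mean) / SD
z = (encore_mpg - mean_mpg) / sd_mpg

# Step 2: share of a normal distribution that falls below that standard score
share_worse = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"about {share_worse:.0%} of compact SUVs got worse mileage")
```

The standard score rounds to 2.04 and the share below it rounds to 98%, in line with the table lookup.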