8.2 - The Normal Approximation

While the behavior of small samples is unpredictable, the behavior of large samples is not. Statistical summaries like proportions and means arising from random samples tend to home in on the true population value. Further, as we saw in Figure 8.1, a frequency curve (probability histogram) showing the distribution of a sample proportion or mean across all possible samples approximately follows the normal curve when the sample is large.

Here's the rule:

Normal Approximation:

The sampling distribution of averages or proportions from a large number of independent trials approximately follows the normal curve. The expectation of a sample proportion or average is the corresponding population value.

The standard deviation of a sample mean is:

\(\dfrac{\text{population standard deviation}}{\sqrt{n}} = \dfrac{\sigma}{\sqrt{n}}\)

The standard deviation of a sample proportion is:

\(\sqrt{\dfrac{\text{population proportion}(1-\text{population proportion})}{n}} =\sqrt{\dfrac{p(1-p)}{n}}\)
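The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `sd_of_sample_mean` and `sd_of_sample_proportion` and the input values are assumptions chosen for this sketch, not part of the lesson.

```python
import math


def sd_of_sample_mean(sigma: float, n: int) -> float:
    """Standard deviation of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Illustrative (made-up) inputs: population SD of 10 and samples of size 25,
# then a population proportion of 0.5 and samples of size 400.
print(sd_of_sample_mean(10, 25))
print(sd_of_sample_proportion(0.5, 400))

# Quadrupling the sample size halves the standard deviation,
# because n appears under a square root.
print(sd_of_sample_mean(10, 100))
```

Both quantities shrink as the sample size grows, which is why large samples give more stable proportions and means than small ones.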

Example 8.2

The amount of gas purchased by customers at a gas station averages 12 gallons with a standard deviation of 5 gallons. The average amount purchased by the next 100 customers is then around \(\mu\) = 12 gallons with a standard deviation of about \(\frac{\text{population standard deviation}}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5\) gallons. (Interpretation: 5 gallons gives the variation from customer to customer; 0.5 gallons gives the variation in the average purchase of 100 customers.)

The chance that the next 100 customers average between 11 and 13 gallons is then about 95% (using the empirical rule since between 11 and 13 gallons corresponds to being within two standard deviations of what's expected). This calculation makes the reasonable assumption that the amount of gas purchased by one customer is independent of the amount purchased by another customer.
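As a rough check on Example 8.2, the same calculation can be done with `NormalDist` from Python's standard `statistics` module instead of the empirical rule. The variable names are assumptions made for this sketch, and the result rests on the same independence assumption stated above.

```python
import math
from statistics import NormalDist

mu = 12.0     # population mean purchase, in gallons
sigma = 5.0   # population standard deviation, in gallons
n = 100       # number of customers in the sample

# Standard deviation of the sample mean: sigma / sqrt(n)
sd_mean = sigma / math.sqrt(n)

# Normal approximation to the sampling distribution of the average purchase
sampling_dist = NormalDist(mu=mu, sigma=sd_mean)

# Chance that the average falls between 11 and 13 gallons,
# i.e. within two standard deviations of the expected value
prob_11_to_13 = sampling_dist.cdf(13) - sampling_dist.cdf(11)

print(f"SD of the sample mean: {sd_mean:.2f} gallons")
print(f"P(11 < average < 13): {prob_11_to_13:.3f}")
```

The normal curve gives a value slightly above 95%, in line with the "about 95%" from the empirical rule.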

Example 8.3

In the 2012 Presidential Election, President Obama received 52% of the vote in Pennsylvania. On the day of the election, the outcome in Pennsylvania was important to the national election outcome. So, before all of the votes were counted, several pollsters conducted "exit polls" to gauge how the vote turned out and the reasons why people voted as they did. Suppose you conduct an exit poll of 1000 Pennsylvania voters, surveyed as they leave their precinct voting stations or after they have voted by mail. What is the probability that a majority of your sample did not vote for President Obama?

Solution

We know the true population proportion is p = 0.52. So the question is asking about the chances that the sample proportion would come out less than 0.5.

The standard deviation of the sample proportion would be:

\(\sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.52(0.48)}{1000}}=0.0158\)

Since the population situation is roughly symmetric (0.52 versus 0.48) and the sample is large, the distribution of the sample proportion would approximately follow the normal curve. Thus, to compute the probability, we calculate the standard score:

\(z = \dfrac{(0.5 - 0.52)}{0.0158} \approx -1.27\)

Finally, using Table 8.1, we find the desired probability is about 10%. Here is a visual representation of what this solution space looks like:

normal distribution plot showing proportion and z score

Figure 8.2. Finding the possible sample proportions of voters that did not vote for Obama using the normal distribution.
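The steps of the solution to Example 8.3 can also be reproduced numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of Table 8.1; the variable names are assumptions for illustration, and the result is only as good as the normal approximation itself.

```python
import math
from statistics import NormalDist

p = 0.52   # true population proportion voting for Obama
n = 1000   # number of voters in the exit poll

# Standard deviation of the sample proportion: sqrt(p(1 - p) / n)
sd_prop = math.sqrt(p * (1 - p) / n)

# Standard score for a sample proportion of 0.5
z = (0.5 - p) / sd_prop

# Area under the standard normal curve to the left of z
prob_below_half = NormalDist().cdf(z)

print(f"SD of the sample proportion: {sd_prop:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample proportion < 0.5): {prob_below_half:.3f}")
```

The computed probability comes out close to the 10% read from the table; small differences reflect rounding of the standard deviation and the z-score.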
