Example 6.1
A researcher wants to estimate \(\mu\), the mean systolic blood pressure of adult Americans, with 95% confidence and error \(\epsilon\) no larger than 3 mm Hg. How many adult Americans, \(n\), should the researcher randomly sample to achieve her estimation goal?
Answer
The researcher's goal is to estimate \(\mu\) so that the error is no larger than 3 mm Hg. (By the way, \(\epsilon\) is typically called the maximum error of the estimate.) That is, her goal is to calculate a 95% confidence interval such that:
\(\bar{x}\pm \epsilon=\bar{x}\pm 3\)
Now, we know the formula for a \((1-\alpha)100\%\) confidence interval for a population mean \(\mu\) is:
\(\bar{x}\pm t_{\alpha/2,n-1}\left(\dfrac{s}{\sqrt{n}}\right)\)
So, it seems that a reasonable way to proceed would be to equate the terms appearing after each of the above \(\pm\) signs, and solve for \(n\). That is, equate:
\(\epsilon=t_{\alpha/2,n-1}\left(\dfrac{s}{\sqrt{n}}\right)\)
and solve for \(n\). Multiplying through by the square root of \(n\), we get:
\(\epsilon \sqrt{n}=t_{\alpha/2,n-1}(s)\)
And, dividing through by \(\epsilon\) and squaring both sides, we get:
\(n=\dfrac{(t_{\alpha/2,n-1})^2 s^2}{\epsilon^2}\)
Now, what's wrong with the formula we derived? Well... the \(t\)-value on the right side of the equation depends on \(n\).
That's not particularly helpful given that we are trying to find \(n\)! We can solve that problem by simply replacing the \(t\)-value that depends on \(n\) with a \(Z\)-value that doesn't. After all, you might recall that as \(n\) increases, the \(t\)-distribution approaches the standard normal distribution. Doing so, we get:
\(n \approx \dfrac{(z^2_{\alpha/2})s^2}{\epsilon^2}\)
Before we make the calculation for our particular example, let's take a step back and summarize what we have just learned.
 Estimating a population mean \(\mu\)

The sample size necessary for estimating a population mean \(\mu\) with \((1-\alpha)100\%\) confidence and error no larger than \(\epsilon\) is:
\(n = \dfrac{(z^2_{\alpha/2})s^2}{\epsilon^2}\)
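The formula translates directly into a few lines of code. What follows is a minimal Python sketch, not part of the original lesson: the function name `sample_size_for_mean`, its parameters, and the input values are illustrative assumptions. It uses `statistics.NormalDist` from the standard library to obtain \(z_{\alpha/2}\), and it rounds up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s, epsilon, confidence=0.95):
    """Approximate n needed to estimate a mean to within epsilon.

    s          -- estimate of the population standard deviation
    epsilon    -- maximum error of the estimate (same units as s)
    confidence -- confidence level as a proportion, e.g. 0.95 for 95%
    """
    alpha = 1 - confidence
    # z_{alpha/2}: the value with area alpha/2 to its right under N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = (z ** 2) * (s ** 2) / (epsilon ** 2)
    # Round up so that the error is no larger than epsilon
    return math.ceil(n)


# Hypothetical inputs, chosen only to exercise the function
print(sample_size_for_mean(s=12, epsilon=2, confidence=0.95))
```

Because the \(Z\)-value stands in for a \(t\)-value, the result is best read as an approximation, particularly when the computed \(n\) is small.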
Typically, the hardest part of determining the necessary sample size is finding \(s^2\), that is, a decent estimate of the population variance. There are a few ways of obtaining \(s^2\).
Ways to Determine \(s^2\)

You can often get \(s^2\), an estimate of the population variance, from the scientific literature. After all, scientific research is typically not done in a vacuum. That is, what one researcher is studying and reporting in scientific journals is typically also studied and reported by several other researchers in various locations around the world. If you're in need of an estimate of the variance of the front leg length of red-eyed tree frogs, you'll probably be able to find it in a research paper reported in some scientific journal.

You can often get \(s^2\), an estimate of the population variance, by conducting a small pilot study on 5-10 people (or trees or snakes or... whatever you're measuring).

You can often get \(s^2\), an estimate of the population variance, by using what we know about the Empirical Rule, which states that we can expect 95% of the observations to fall in the interval:
\(\bar{x}\pm 2s\)
Here's a picture that illustrates how this part of the Empirical Rule can help us determine a reasonable value of \(s\):
That is, we could define the range of values as that which captures 95% of the measurements. If we do that, then we can work backwards to see that \(s\) can be determined by dividing the range by 4. That is:
\(s=\dfrac{\text{Range}}{4}=\dfrac{\text{Max}-\text{Min}}{4}\)
When statisticians use the Empirical Rule to help a researcher arrive at a reasonable value of \(s\), they almost always use the above formula. That said, there may be occasions on which it is worthwhile to use another part of the Empirical Rule, namely that we can expect 99.7% of the observations to fall in the interval:
\(\bar{x}\pm 3s\)
Here's a picture that illustrates how this part of the Empirical Rule can help us determine a reasonable value of \(s\):
In this case, we could define the range of values as that which captures 99.7% of the measurements. If we do that, then we can work backwards to see that \(s\) can be determined by dividing the range by 6. That is:
\(s=\dfrac{\text{Range}}{6}=\dfrac{\text{Max}-\text{Min}}{6}\)
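Two of the approaches above, the pilot study and the Empirical Rule, are easy to sketch in code. The following Python snippet is only an illustration under stated assumptions: the function names `s_from_pilot` and `s_from_range` and all of the numbers are hypothetical, not taken from the lesson.

```python
from statistics import stdev


def s_from_pilot(measurements):
    """Sample standard deviation computed from a small pilot study."""
    return stdev(measurements)


def s_from_range(minimum, maximum, coverage="95%"):
    """Rough value of s from a plausible range, via the Empirical Rule.

    coverage="95%"   -> the range spans about 4 standard deviations
    coverage="99.7%" -> the range spans about 6 standard deviations
    """
    divisor = {"95%": 4, "99.7%": 6}[coverage]
    return (maximum - minimum) / divisor


# Hypothetical pilot measurements and ranges, for illustration only
pilot = [118, 131, 124, 109, 140, 127]
print(s_from_pilot(pilot))
print(s_from_range(100, 140))           # (140 - 100) / 4
print(s_from_range(100, 160, "99.7%"))  # (160 - 100) / 6
```

Squaring either result gives the \(s^2\) that the sample size formula needs.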
Example 6.1 (Continued)
A researcher wants to estimate \(\mu\), the mean systolic blood pressure of adult Americans, with 95% confidence and error \(\epsilon\) no larger than 3 mm Hg. How many adult Americans, \(n\), should the researcher randomly sample to achieve her estimation goal?
Answer
If the maximum error \(\epsilon\) is 3, and the sample variance is \(s^2=10^2\), we need:
\(n=\dfrac{(1.96)^2(10)^2}{3^2}=42.7\)
or 43 people to estimate \(\mu\) with 95% confidence. In general, when making sample size calculations such as this one, it is a good idea to change all of the factors to see what the "cost" in sample size is for achieving certain errors \(\epsilon\) and confidence levels \((1-\alpha)\). Doing that here, we get:
| \(s^2 = 10^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 271 | 31 | 11 |
| 95% \((z_{0.025} = 1.96)\) | 385 | 43 | 16 |
| 99% \((z_{0.005} = 2.576)\) | 664 | 74 | 27 |
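The entries in the table above can be regenerated by looping over the confidence levels and errors. The snippet below is a minimal sketch rather than the lesson's own method: the names `Z_VALUES`, `ERRORS`, `sample_size`, and `size_table` are assumptions introduced here, and the \(z\)-values are the rounded ones shown in the table.

```python
import math

# Rounded z-values, as listed in the table
Z_VALUES = {"90%": 1.645, "95%": 1.96, "99%": 2.576}
ERRORS = [1, 3, 5]


def sample_size(z, s, epsilon):
    """n = z^2 s^2 / epsilon^2, rounded up to a whole number."""
    return math.ceil(z ** 2 * s ** 2 / epsilon ** 2)


def size_table(s):
    """Map each confidence level to its sample sizes for every error."""
    return {
        level: [sample_size(z, s, e) for e in ERRORS]
        for level, z in Z_VALUES.items()
    }


for level, sizes in size_table(10).items():
    print(level, sizes)
```

Calling `size_table` with a different standard deviation shows how the whole grid shifts when the variance estimate changes.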
We can also change the estimate of the variance. For example, if we change the sample variance to \(s^2=8^2\), then the necessary sample sizes for various errors \(\epsilon\) and confidence levels \((1-\alpha)\) become:
| \(s^2 = 8^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 174 | 20 | 7 |
| 95% \((z_{0.025} = 1.96)\) | 246 | 28 | 10 |
| 99% \((z_{0.005} = 2.576)\) | 425 | 48 | 17 |
Factors Affecting the Sample Size
If we take a look back at the formula for the sample size:
\(n =\dfrac{(z^2_{\alpha/2})s^2}{\epsilon^2}\)
we can make some generalizations about how each of three factors, namely the standard deviation \(s\), the confidence level \((1-\alpha)100\%\), and the error \(\epsilon\), affects the necessary sample size.
As the error \(\epsilon\) decreases, the necessary sample size \(n\) increases. That's because the error \(\epsilon\) appears in the denominator of the formula. You can see an example of this generalization from some of the numbers generated in that last example:
| \(s^2 = 10^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 271 | 31 | 11 |
| 95% \((z_{0.025} = 1.96)\) | 385 | 43 | 16 |
| 99% \((z_{0.005} = 2.576)\) | 664 | 74 | 27 |
As the confidence level \((1-\alpha)100\%\) increases, the necessary sample size increases. That's because as the confidence level increases, the \(Z\)-value, which appears in the numerator of the formula, increases. Again, you can see an example of this generalization from some of the numbers generated in that last example:
| \(s^2 = 8^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 174 | 20 | 7 |
| 95% \((z_{0.025} = 1.96)\) | 246 | 28 | 10 |
| 99% \((z_{0.005} = 2.576)\) | 425 | 48 | 17 |
As the sample standard deviation \(s\) increases, the necessary sample size increases. That's because the standard deviation \(s\) appears in the numerator of the formula. Again, you can see an example of this generalization from some of the numbers generated in that last example:
| \(s^2 = 10^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 271 | 31 | 11 |
| 95% \((z_{0.025} = 1.96)\) | 385 | 43 | 16 |
| 99% \((z_{0.005} = 2.576)\) | 664 | 74 | 27 |

| \(s^2 = 8^2\) | \(\epsilon = 1\) | \(\epsilon = 3\) | \(\epsilon = 5\) |
| --- | --- | --- | --- |
| 90% \((z_{0.05} = 1.645)\) | 174 | 20 | 7 |
| 95% \((z_{0.025} = 1.96)\) | 246 | 28 | 10 |
| 99% \((z_{0.005} = 2.576)\) | 425 | 48 | 17 |
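Because \(s\), \(\epsilon\), and the \(Z\)-value all enter the formula as squares, changing any one of them changes \(n\) by the square of the change. The short Python sketch below illustrates this on the unrounded formula; the function name `raw_sample_size` and the chosen comparisons are assumptions made for illustration.

```python
def raw_sample_size(z, s, epsilon):
    """Unrounded n = z^2 s^2 / epsilon^2."""
    return z ** 2 * s ** 2 / epsilon ** 2


# Baseline: 95% confidence, s = 10, epsilon = 3
base = raw_sample_size(1.96, 10, 3)

# Each ratio shows how many times larger n becomes than the baseline
halve_error = raw_sample_size(1.96, 10, 1.5) / base    # epsilon 3 -> 1.5
double_s = raw_sample_size(1.96, 20, 3) / base         # s 10 -> 20
raise_confidence = raw_sample_size(2.576, 10, 3) / base  # 95% -> 99%

print(halve_error, double_s, raise_confidence)
```

Halving the error or doubling the standard deviation multiplies the required sample size by about 4, while moving from 95% to 99% confidence multiplies it by \((2.576/1.96)^2\), or roughly 1.7.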