5.3.3 - Sample Size Computation

Sample Size Computation for the Population Proportion Confidence Interval

An important part of obtaining desired results is to get a large enough sample size. We can use what we know about the margin of error and the desired level of confidence to determine an appropriate sample size.

Recall that the margin of error, E, is half the width of the confidence interval. For a one-sample proportion,

\(E=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
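As a rough illustration of the formula above, the following Python sketch computes the margin of error for a sample proportion. The function name `margin_of_error` and the sample size of 1000 are assumptions made for this illustration; the z-value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    alpha = 1 - confidence
    # z_{alpha/2}: standard normal quantile with area alpha/2 to its right
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: a sample proportion of 0.72 from 1000 respondents
e = margin_of_error(p_hat=0.72, n=1000)
print(f"E = {e:.4f}")  # about 0.0278
```

Under these assumed inputs, the 95% confidence interval would be roughly 0.72 plus or minus 0.028.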

Precision: The wider the interval, the poorer the precision. Note that the higher the confidence level, the wider the interval (equivalently, the larger its half-width, E) and thus the poorer the precision.

Since the confidence level reflects the success rate of the method we use to get the confidence interval, we would like a narrow interval while keeping the confidence level reasonably high.

For most newspaper and magazine polls, it is understood that the margin of error is calculated for a 95% confidence interval unless stated otherwise. A 3% margin of error is also a popular choice. For instance, you might see a television poll state that the "approval rating of the president is 72%; the margin of error of the poll is plus or minus 3%."

If we want a smaller margin of error (i.e., a narrower interval), we can increase the sample size. Alternatively, if we calculate a 90% confidence interval instead of a 95% confidence interval, the margin of error will also be smaller. However, when reporting it, remember to state that the confidence level is only 90%; otherwise, people will assume 95% confidence.
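To make the two options concrete, here is a small Python sketch that tabulates the margin of error for two sample sizes and two confidence levels. The sample sizes 1000 and 4000 are hypothetical, chosen only to show the pattern; the sample proportion 0.72 is taken from the poll quoted above.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    # z_{alpha/2} for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.72  # sample proportion from the quoted poll
for n in (1000, 4000):  # hypothetical sample sizes
    for confidence in (0.90, 0.95):
        e = margin_of_error(p_hat, n, confidence)
        print(f"n={n:5d}  confidence={confidence:.0%}  E={e:.4f}")
```

Because E is proportional to \(1/\sqrt{n}\), quadrupling the sample size halves the margin of error, while lowering the confidence level shrinks E only through the smaller z-value.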

Determining the Required Sample Size

If the desired margin of error E is specified and the desired confidence level is specified, the required sample size to meet the requirements can be calculated by two methods:

Educated Guess

\(n=\dfrac{z^2_{\alpha/2}\hat{p}_g(1-\hat{p}_g)}{E^2}\)

where \(\hat{p}_g\) is an educated guess for the parameter \(p\).

*The educated guess method is used if it is relatively inexpensive to sample more elements when needed.

Conservative Method

\(n=\dfrac{z^2_{\alpha/2}(\frac{1}{2})^2}{E^2}\)

This formula can be obtained from the educated guess formula using the fact that:

For \(0 \le p \le 1\), \(p(1-p)\) achieves its largest value at \(p=\frac{1}{2}\).
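One way to verify this fact is to complete the square:

\(p(1-p)=p-p^2=\dfrac{1}{4}-\left(p-\dfrac{1}{2}\right)^2 \le \dfrac{1}{4}\)

Equality holds only when \(p=\frac{1}{2}\). Replacing \(\hat{p}_g(1-\hat{p}_g)\) in the educated guess formula with this maximum value, \(\frac{1}{4}=(\frac{1}{2})^2\), gives the conservative formula.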

*The conservative method is used if the start-up cost of sampling is expensive and thus it is not economical to sample more elements later.

The sample size obtained using the educated guess is usually smaller than the one obtained using the conservative method. The smaller sample size carries some risk that the resulting confidence interval will be wider than desired, which happens when the sample proportion turns out to be closer to \(\frac{1}{2}\) than the guess was. The sample size from the conservative method carries no such risk.
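The trade-off between the two methods can be sketched in Python as follows. The function names and the planning values (a guess of 0.30 and a target margin of 0.03) are hypothetical, chosen only for illustration. The z-value here is computed from `statistics.NormalDist` rather than rounded to 1.96.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """z_{alpha/2} for the given confidence level (e.g. 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_educated_guess(p_guess: float, E: float, confidence: float = 0.95) -> int:
    """Sample size from the educated guess formula, rounded up."""
    z = z_value(confidence)
    return ceil(z**2 * p_guess * (1 - p_guess) / E**2)


def n_conservative(E: float, confidence: float = 0.95) -> int:
    """Sample size from the conservative formula (p = 1/2), rounded up."""
    return n_educated_guess(0.5, E, confidence)


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    return z_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical planning values: guess p = 0.30, target E = 0.03, 95% confidence
E = 0.03
n_guess = n_educated_guess(0.30, E)
n_cons = n_conservative(E)
print(n_guess, n_cons)

# Worst case: the sample proportion turns out to be 0.5 rather than 0.30
print(margin_of_error(0.5, n_guess))  # exceeds the 0.03 target
print(margin_of_error(0.5, n_cons))   # stays within the 0.03 target
```

With these assumed inputs, the educated guess saves about 170 respondents, but the achieved margin of error overshoots the target if the guess was too far from \(\frac{1}{2}\).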

Example 5-4

Suppose a television poll states that the "approval rating of the president is 72%." For the next poll of the president's approval rating, we want to get a margin of error of 1% with 95% confidence. How many individuals should we sample?

Answer

Educated Guess:

\(z_{0.025} = 1.96, E = 0.01\)

Therefore,

\(n=\dfrac{(1.96)^2(0.72)(1-0.72)}{(0.01)^2}=7744.67\)

The sample size needed is 7745 people. We always round up to the next integer when the result is not a whole number; the reason is discussed in the cautions that follow this example.

Conservative Method:

\(z_{0.025} = 1.96, E = 0.01\)

Therefore,

\(n=\dfrac{(1.96)^2(0.5)(1-0.5)}{(0.01)^2}=9604\)

The sample size needed is 9604 people.
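The arithmetic in this example can be checked with a short Python sketch. It uses the rounded value z = 1.96 from the example and exact rational arithmetic (`fractions.Fraction`), so that floating-point error cannot affect the rounding-up step. The helper name `required_n` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import ceil

z = Fraction("1.96")  # rounded z_{0.025}, as used in the example
E = Fraction("0.01")  # desired margin of error


def required_n(p: Fraction) -> Fraction:
    """Unrounded sample size z^2 * p * (1 - p) / E^2, computed exactly."""
    return z**2 * p * (1 - p) / E**2


raw_guess = required_n(Fraction("0.72"))  # educated guess from the earlier poll
raw_cons = required_n(Fraction("0.5"))    # conservative method

print(float(raw_guess), ceil(raw_guess))  # 7744.6656 7745
print(float(raw_cons), ceil(raw_cons))    # 9604.0 9604
```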

Cautions About Sample Size Calculations

Why do we need to round up?
Because we are calculating the smallest sample size needed to achieve the desired margin of error. Since we cannot sample a fraction of a subject (e.g., we cannot take 0.67 of a subject), we need to round up to guarantee a large enough sample.
Remember that this is the minimum sample size needed for our study.
If the response rate is less than 100%, contacting only the calculated number of people will leave us with a smaller sample than desired. To counter this, we can adjust the calculated sample size by dividing it by the anticipated response rate. For instance, using the example above, if we expected about 40% of those contacted to participate in our survey (i.e., a 40% response rate), we would need to contact 7745/0.4 = 19,362.5, or 19,363, people. In other words, we would need to contact 19,363 people to end up with about 7745 respondents.
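The response-rate adjustment can be sketched in Python as follows. The function name `contacts_needed` is hypothetical, and passing the rate as a decimal string is a design choice made here so the division is exact before rounding up.

```python
from fractions import Fraction
from math import ceil


def contacts_needed(n_required: int, response_rate: str) -> int:
    """Number of people to contact so that about n_required respond.

    response_rate is a decimal string such as "0.4" so the division is exact.
    """
    rate = Fraction(response_rate)
    if not 0 < rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return ceil(Fraction(n_required) / rate)


print(contacts_needed(7745, "0.4"))  # 19363
```

The anticipated response rate is itself only an estimate, so the adjusted figure is best treated as a planning number rather than a guarantee.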