2.3 - Sample Size Needed for Estimating Proportion

Recall the sample size formula for estimating the mean:

\(n=\dfrac{1}{\dfrac{ d^2}{ z^2_{\alpha/2}\cdot \sigma^2}+\dfrac{1}{N}}\)

For a proportion, \(\sigma^2=\dfrac{N}{N-1}\cdot p \cdot (1-p)\). Substituting this in gives:

\(n=\dfrac{N \cdot p \cdot (1-p)}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+p\cdot(1-p)}\)
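For readers who want the intermediate algebra, the substitution can be written out as follows, using only the symbols already defined:

\begin{align}
n &=\dfrac{1}{\dfrac{(N-1)\cdot d^2}{z^2_{\alpha/2}\cdot N\cdot p\cdot(1-p)}+\dfrac{1}{N}}\\
 &=\dfrac{1}{\dfrac{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+p\cdot(1-p)}{N\cdot p\cdot(1-p)}}\\
 &=\dfrac{N \cdot p \cdot (1-p)}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+p\cdot(1-p)}
\end{align}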

When the finite population correction can be ignored, the formula is:

\(n\approx \dfrac{z^2_{\alpha/2}\cdot p \cdot (1-p)}{d^2}\)

To find the sample size for estimating a proportion, we can either use an educated guess for p or find a conservative sample size, which guarantees that the margin of error is no larger than specified at the chosen \(\alpha\), whatever the true value of p.

  1. Educated guess (estimate p by \(\hat{p}\) ):

    \(n=\dfrac{N\cdot\hat{p}\cdot(1-\hat{p})}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+\hat{p}\cdot(1-\hat{p})}\)

    Note that \(\hat{p}\) may differ from the true proportion. If the true p is closer to 1/2 than \(\hat{p}\), then p(1 - p) is larger than assumed and the sample size will be too small (i.e., the margin of error will not be as small as specified).

  2. Conservative sample size:

    Since p(1 - p) attains its maximum value of 1/4 at p = 1/2, a conservative estimate for the sample size is:

    \(n=\dfrac{N\cdot 1/4}{(N-1)\dfrac{d^2}{z^2_{\alpha/2}}+1/4}\)
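Both approaches can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `sample_size_proportion` and its parameters are hypothetical, and \(z_{\alpha/2}\) is taken from the standard library's `NormalDist` rather than rounded to 1.96.

```python
import math
from statistics import NormalDist


def sample_size_proportion(d, p=0.5, alpha=0.05, N=None):
    """Sample size so that the margin of error for a proportion is at most d.

    d     : desired margin of error (e.g. 0.03)
    p     : guess for the proportion; the default 0.5 is the conservative choice
    alpha : 1 - confidence level
    N     : population size; None means the finite population correction is ignored
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    pq = p * (1 - p)
    if N is None:
        # Formula without the finite population correction
        n = z**2 * pq / d**2
    else:
        # Formula with the finite population correction
        n = N * pq / ((N - 1) * d**2 / z**2 + pq)
    return math.ceil(n)  # always round up to a whole unit


print(sample_size_proportion(0.03, p=0.22))    # educated guess: 733
print(sample_size_proportion(0.03))            # conservative: 1068
print(sample_size_proportion(0.03, N=10_000))  # conservative, with a made-up N
```

The first two calls use the values from Example 2-4 below (d = 0.03, \(\hat{p}\) = 0.22). The population size in the third call is made up purely to show that the finite population correction reduces the required sample size.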

Example 2-4: Presidential Approval Rating - Sample size

To estimate the next president's final approval rating, how many people should be sampled so that the margin of error is 3% (a popular choice) with 95% confidence?

  1. Use an educated guess: \(\hat{p}\) = 0.22, Bush's final approval rating.

    Since N is very large compared to n, the finite population correction is not needed.

    \begin{align}
    n &=\dfrac{\hat{p}\cdot(1-\hat{p})\cdot z^2_{\alpha/2}}{d^2}\\
     &=\dfrac{0.22\cdot0.78\cdot1.96^2}{0.03^2}\\
     &=732.47
    \end{align}

    Round up to 733.

  2. Use the conservative approach, with p = 0.5.

    \begin{align}
    n &=\dfrac{0.5\cdot0.5\cdot1.96^2}{0.03^2}\\
     &=1067.11
    \end{align}

    Round up to 1068.
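The risk of relying on an educated guess can be checked numerically. The sketch below (the helper name `margin_of_error` is hypothetical, and the finite population correction is ignored as in the example) computes the margin of error \(z_{\alpha/2}\sqrt{p(1-p)/n}\) for the two sample sizes above under different values of the true proportion.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p, alpha=0.05):
    """Margin of error z_{alpha/2} * sqrt(p(1-p)/n), ignoring the finite population correction."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(p * (1 - p) / n)


# (sample size, true proportion) pairs to compare
for n, p in [(733, 0.22), (733, 0.5), (1068, 0.5)]:
    print(f"n = {n}, true p = {p}: margin of error = {margin_of_error(n, p):.4f}")
```

With n = 733 the target of 0.03 is met if the true proportion really is 0.22, but the margin of error grows to about 0.036 if the true proportion is 0.5. With n = 1068 the target is met in the worst case.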

Try it!

How do we choose between the educated guess and the conservative approach?

Compare the cost of sampling extra units with the cost of setting up the sampling process a second time. With an educated guess, the sample may turn out to be too small, and the sampling procedure would then have to be set up again to collect more units. If that set-up cost is high compared to the cost of sampling extra units, the conservative approach is preferable.

  1. Find the proportion of CD players in this shipment that have a lifetime longer than 2000 hours. The proportion in the last shipment was 0.9. It is not costly to set up the testing procedure again if needed, whereas sampling each unit is expensive. We want to estimate the proportion to within 0.01 with 95% confidence. Would you use the educated guess or the conservative approach?

    We should use an educated guess because it is not costly to set up the testing procedure again. On the other hand, the cost of sampling extra units is high due to the nature of the test.

  2. Send a ship out to the Bering Sea to sample the proportion of fish whose mercury level is within a specified level. Last year the proportion was 0.9. We want to estimate the proportion to within 0.01 with 95% confidence. Would you use the educated guess or the conservative approach?

    We should use a conservative approach because it is too expensive to send a ship out again if needed.
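To see what is at stake in both questions, the two sample sizes can be worked out for d = 0.01 and 95% confidence. This calculation ignores the finite population correction, which is an assumption here since neither population size is given:

\begin{align}
n_{\text{guess}} &=\dfrac{0.9\cdot0.1\cdot1.96^2}{0.01^2}=3457.44 \quad \text{(round up to 3458)}\\
n_{\text{conservative}} &=\dfrac{0.5\cdot0.5\cdot1.96^2}{0.01^2}=9604
\end{align}

The conservative approach therefore requires 6146 more units than the educated guess. That is a heavy price when each unit is expensive to test, but a modest one compared to sending a ship out a second time.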

Exact intervals for population proportions

Since each \(Y_i\) is defined as 1 or 0 depending on whether the unit has the attribute or not, and the sampling is without replacement, \(\sum Y_i\) has exactly a hypergeometric distribution.

Using this property, one can obtain an exact confidence interval for p. When the total number of successes and the total number of failures are both large (larger than 5), we can use the t-interval instead (or the z-interval if n > 50).
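The notes do not spell out how the exact interval is constructed. One common construction, assumed here, inverts two one-sided hypergeometric tests: keep every possible number of successes M in the population for which the observed count is not in either \(\alpha/2\) tail. The sketch below uses only the standard library; the function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, M, n):
    """P(X = k) when drawing n units without replacement from N units, M of which are successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def exact_ci_proportion(x, n, N, alpha=0.05):
    """Exact interval for p = M/N given x successes in a sample of n from a population of N.

    Assumed construction: M is kept if both P(X >= x | M) and P(X <= x | M)
    exceed alpha/2 (test inversion).
    """
    accepted = []
    # M must be at least x and must leave room for the n - x observed failures
    for M in range(x, N - (n - x) + 1):
        upper_tail = sum(hypergeom_pmf(k, N, M, n) for k in range(x, n + 1))
        lower_tail = sum(hypergeom_pmf(k, N, M, n) for k in range(0, x + 1))
        if upper_tail > alpha / 2 and lower_tail > alpha / 2:
            accepted.append(M)
    return min(accepted) / N, max(accepted) / N


# Made-up illustration: 5 successes in a sample of 20 from a population of 100
low, high = exact_ci_proportion(5, 20, 100)
print(f"95% exact interval for p: ({low:.2f}, {high:.2f})")
```

The loop costs on the order of N times n probability evaluations, so this brute-force version is only practical for small populations.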

Sample size for estimating several proportions simultaneously

It is good to know that there is a solution in the following scenario:

There are a few (maybe unknown) classes, and one wants to collect a sample large enough that the proportion in each class can be estimated to within a prescribed precision. (The details are not needed here; if interested, read Ch. 5.4 and the reference cited there.)