10.2 - Test a Single Proportion

Example

The baseline prevalence of smoking in a particular community is 30%. A clean indoor air policy goes into effect. What is the sample size required to detect a decrease in smoking prevalence of at least 2 percentage points, with an alpha of 0.05 and a power of 90%?

Formula

We are interested in testing the following hypothesis:

\(\begin{array}{l}
\mathrm{H}_{0}\colon \pi=\pi_{0} \\
\mathrm{H}_{1}\colon \pi=\pi_{1}=\pi_{0}+d
\end{array}\)

where \(\pi\) is the true proportion, \(\pi_0\) is the specified value of the proportion we wish to test (30% in our example), and \(\pi_1\) is the alternative value, which differs from \(\pi_0\) by an amount \(d\) (2 percentage points in our example; because the policy is expected to lower prevalence, \(\pi_1\) = 28%).  
The formula needed to calculate the sample size is:

\(\displaystyle{n=\frac{1}{d^{2}}\left[z_{\alpha} \sqrt{\pi_{0}\left(1-\pi_{0}\right)}+z_{\beta} \sqrt{\pi_{1}\left(1-\pi_{1}\right)}\right]^{2}}\)

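where

  • \(\pi_0\) = null hypothesized proportion
  • \(\pi_1\) = \(\pi_0 + d\) = proportion under the alternative hypothesis
  • \(d\) = estimated change in proportion
  • \(z_{\alpha}\) = standard normal value cutting off an upper tail of area \(\alpha\) (the significance level)
  • \(z_{\beta}\) = standard normal value cutting off an upper tail of area \(\beta\), where power = \(1-\beta\)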
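The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `sample_size_one_proportion` and its parameters are hypothetical, the z values come from the standard library's `statistics.NormalDist`, and rounding up to the next whole subject is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_proportion(pi0, d, alpha=0.05, power=0.90, two_sided=False):
    """Sample size for testing H0: pi = pi0 against H1: pi = pi0 + d.

    Hypothetical helper: d may be negative to represent a decrease.
    """
    pi1 = pi0 + d
    if d == 0 or not 0 < pi1 < 1:
        raise ValueError("d must be non-zero and pi0 + d must lie in (0, 1)")
    # Use alpha/2 in place of alpha for a two-sided test.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    root = z_alpha * sqrt(pi0 * (1 - pi0)) + z_beta * sqrt(pi1 * (1 - pi1))
    # Assumption: round up to the next whole subject.
    return ceil(root ** 2 / d ** 2)


# Smoking example: baseline 30%, one-sided 5% test, 90% power.
print(sample_size_one_proportion(0.30, -0.02))  # decrease to 28%
print(sample_size_one_proportion(0.30, 0.02))   # increase to 32%
```

Under these assumptions the two calls print 4417 and 4567. The results differ because \(\pi_1(1-\pi_1)\) is smaller at 0.28 than at 0.32.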

Note that we can replace \(z_{\alpha}\) by \(z_{\alpha / 2}\) for a two-sided test.
The z terms can be found from a standard normal distribution table, and common values are shown below:

Table 8.1 Values of \(z_{\alpha}\) or \(z_{\alpha/2}\) for common values of the significance level, and of \(z_{\beta}\) for common values of power.

Significance level    One-sided (\(z_{\alpha}\))    Two-sided (\(z_{\alpha/2}\))
5%                    1.6449                        1.9600
1%                    2.3263                        2.5758
0.1%                  3.0902                        3.2905

Power    \(z_{\beta}\)
90%      1.2816
95%      1.6449

(Chapter 8.5, p 305, Woodward book)
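As a rough check by hand, the smoking example can be worked with the tabulated values. The sketch assumes a one-sided test at 5% significance (\(z_{\alpha}\) = 1.6449), 90% power (\(z_{\beta}\) = 1.2816), \(\pi_0\) = 0.30 and, because a decrease is expected, \(\pi_1\) = 0.28:

\(\displaystyle{n=\frac{1}{0.02^{2}}\left[1.6449 \sqrt{0.30 \times 0.70}+1.2816 \sqrt{0.28 \times 0.72}\right]^{2} \approx 2500\,(0.7538+0.5754)^{2} \approx 4417}\)

Under these assumptions, roughly 4,400 subjects would be needed to detect a drop from 30% to 28%.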

The table below can also be used to estimate the necessary sample size:

These tables give requirements for a one-sided test directly. For two-sided tests, use the table corresponding to half the required significance level. Note that \(\pi_{0}\) is the hypothesized proportion (under \(H_{0}\)) and \(d\) is the difference to be tested.

(a) 5% significance, 90% power

Rows give the difference \(d\); columns give the hypothesized proportion \(\pi_{0}\). Cells are blank where \(\pi_{0} + d\) would reach or exceed 1.

d       0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    0.95
0.01   1,178   8,001  13,923  18,130  20,625  21,406  20,475  17,830  13,473   7,400   3,717
0.02     366   2,070   3,534   4,567   5,172   5,349   5,097   4,417   3,308   1,769     833
0.03     192     950   1,593   2,045   2,305   2,376   2,255   1,944   1,443     748     322
0.04     123     551     908   1,158   1,300   1,335   1,262   1,083     795     398     148
0.05      88     362     589     746     834     853     804     686     498     239
0.06      67     258     414     521     580     591     555     471     338     155
0.07      54     194     308     385     427     434     405     342     242     104
0.08      44     152     238     296     327     331     308     258     181      71
0.09      38     123     190     235     259     261     242     201     139      48
0.10      32     102     156     191     210     211     195     161     109
0.15      18      49      72      87      93      92      83      66      40
0.20      12      30      42      49      52      50      44      33
0.25       9      20      27      31      33      31      26      18
0.30       7      14      19      22      22      20      16
0.35       5      11      14      16      16      14      10
0.40       4       9      11      12      11      10
0.45       4       7       8       9       8       6
0.50       3       6       7       7       6

(Tables from Woodward, M. Epidemiology: Study Design and Data Analysis. Boca Raton: Chapman and Hall, 2013)
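Individual table entries can be checked against the sample size formula. The sketch below recomputes part of the \(\pi_0\) = 0.30 column of table (a). The helper name `table_entry` is hypothetical, and rounding up is an assumption that happens to match the entries checked here. The table covers positive \(d\); for a decrease, the formula gives the same value as the mirrored entry (for example, \(\pi_0\) = 0.30 with \(d\) = -0.02 matches \(\pi_0\) = 0.70 with \(d\) = 0.02).

```python
from math import ceil, sqrt
from statistics import NormalDist

# One-sided 5% significance and 90% power, as in table (a).
Z_ALPHA = NormalDist().inv_cdf(0.95)
Z_BETA = NormalDist().inv_cdf(0.90)


def table_entry(pi0, d):
    """Hypothetical helper: sample size for one (pi0, d) cell, rounded up."""
    pi1 = pi0 + d
    root = Z_ALPHA * sqrt(pi0 * (1 - pi0)) + Z_BETA * sqrt(pi1 * (1 - pi1))
    return ceil(root ** 2 / d ** 2)


# Part of the pi0 = 0.30 column.
column = [table_entry(0.30, d) for d in (0.01, 0.02, 0.03, 0.04, 0.05)]
print(column)

# A decrease from 0.30 mirrors an increase from 0.70.
print(table_entry(0.30, -0.02), table_entry(0.70, 0.02))
```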

  Stop and Think!


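Looking at the table values:

  1. What happens to the necessary sample size as prevalence (\(\pi_0\)) increases? Does the sample size increase or decrease?
  2. What happens to the sample size as effect size decreases?
  3. What is the minimal detectable difference if you had funds for 1,500 subjects?

Answers:

  1. The sample size increases as baseline prevalence rises toward 0.5 and then decreases; the largest sample sizes occur with baseline prevalence at 0.5.
  2. The smaller the effect size, the larger the sample size.
  3. About a 3.6 percentage point decrease in prevalence (reading the \(\pi_0\) = 0.30 column, 1,500 falls between the 2,045 subjects needed for \(d\) = 0.03 and the 1,158 needed for \(d\) = 0.04).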
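The last question can also be approached by solving the sample size formula for \(d\) numerically instead of reading the table. The sketch below uses bisection; the function names are hypothetical, and it assumes a one-sided 5% test, 90% power, \(\pi_0\) = 0.30, and that the required sample size falls steadily as \(|d|\) grows over the search range.

```python
from math import sqrt
from statistics import NormalDist


def required_n(pi0, d, alpha=0.05, power=0.90):
    """Unrounded one-sided sample size from the formula in this section."""
    pi1 = pi0 + d
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    root = z_alpha * sqrt(pi0 * (1 - pi0)) + z_beta * sqrt(pi1 * (1 - pi1))
    return root ** 2 / d ** 2


def minimal_detectable_difference(pi0, n_available, sign=1, tol=1e-6):
    """Smallest |d| (hypothetical helper) whose required n fits the budget.

    sign=+1 searches for an increase, sign=-1 for a decrease.
    """
    lo = 1e-4
    hi = (1 - pi0 if sign > 0 else pi0) - 1e-4
    # Assumption: required_n decreases as |d| grows, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if required_n(pi0, sign * mid) > n_available:
            lo = mid  # difference too small to detect with this budget
        else:
            hi = mid
    return hi


increase = minimal_detectable_difference(0.30, 1500, sign=1)
decrease = minimal_detectable_difference(0.30, 1500, sign=-1)
print(f"increase: {increase:.4f}, decrease: {decrease:.4f}")
```

Under these assumptions the formula gives a minimal detectable difference of roughly 3.5 percentage points, close to the value read from the table; interpolating between widely spaced table rows is only approximate.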