6.3 - Estimating a Proportion for a Small, Finite Population

The methods of the last page, in which we derived a formula for the sample size necessary for estimating a population proportion $p$, work just fine when the population in question is very large. When we have smaller, finite populations, however, such as the students in a high school or the residents of a small town, the formula we derived previously requires a slight modification. Let's start, as usual, by taking a look at an example.

Example 6-3

A researcher is studying the population of a small town in India of $N=2000$ people. She's interested in estimating $p$ for several yes/no questions on a survey.

How many people $n$ does she have to randomly sample (without replacement) to ensure that her estimates $\hat{p}$ are within $\epsilon=0.04$ of the true proportions $p$?

Answer

We can't even begin to address the answer to this question until we derive a confidence interval for a proportion for a small, finite population!

Theorem

An approximate $(1-\alpha)100\%$ confidence interval for a proportion $p$ of a small population is:

$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$

Proof

We'll use the example above, where possible, to make the proof more concrete. Suppose we take a random sample, $X_1, X_2, \ldots, X_n$, without replacement, of size $n$ from a population of size $N$. In the case of the example, $N=2000$. Suppose also, unknown to us, that for a particular survey question there are $N_1$ respondents who would respond "yes" to the question, and therefore $N-N_1$ respondents who would respond "no." That is, our small finite population consists of $N_1$ yes respondents and $N-N_1$ no respondents.

If that's the case, the true proportion (but unknown to us) of yes respondents is:

$p=P(Yes)=\dfrac{N_1}{N}$

while the true proportion (but unknown to us) of no respondents is:

$1-p=P(No)=1-\dfrac{N_1}{N}=\dfrac{N-N_1}{N}$

Now, let $X$ denote the number of respondents in the sample who say yes, so that:

$X=\sum\limits_{i=1}^n X_i$

where $X_i=1$ if respondent $i$ answers yes, and $X_i=0$ if respondent $i$ answers no. Then, the proportion in the sample who say yes is:

$\hat{p}=\dfrac{\sum\limits_{i=1}^n X_i}{n}$

Then, $X=\sum\limits_{i=1}^n X_i$ is a hypergeometric random variable with mean:

$E(X)=n\dfrac{N_1}{N}=np$

and variance:

$Var(X)=n\dfrac{N_1}{N}\left(1-\dfrac{N_1}{N}\right)\left(\dfrac{N-n}{N-1}\right)=np(1-p)\left(\dfrac{N-n}{N-1}\right)$

It follows that $\hat{p}=X/n$ has mean $E(\hat{p})=p$ and variance:

$Var(\hat{p})=\dfrac{p(1-p)}{n}\left(\dfrac{N-n}{N-1}\right)$
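
As a quick sanity check on these moments, the following Python sketch computes the mean and variance of a hypergeometric count directly from its probability mass function and compares them with the closed forms above. The function name `hypergeom_moments` and the toy values $N=20$, $N_1=8$, $n=5$ are illustrative assumptions, not part of the example.

```python
from fractions import Fraction
from math import comb


def hypergeom_moments(N, N1, n):
    """Exact mean and variance of X, the number of 'yes' respondents in a
    sample of size n drawn without replacement from N people, N1 of whom
    would answer 'yes'. Both moments are computed by summing over the pmf."""
    total = comb(N, n)
    k_min = max(0, n - (N - N1))  # cannot draw more 'no' answers than exist
    k_max = min(n, N1)            # cannot draw more 'yes' answers than exist
    pmf = {
        k: Fraction(comb(N1, k) * comb(N - N1, n - k), total)
        for k in range(k_min, k_max + 1)
    }
    mean = sum(k * pr for k, pr in pmf.items())
    var = sum(k * k * pr for k, pr in pmf.items()) - mean ** 2
    return mean, var


# Toy population (hypothetical values chosen only to keep the sums small).
N, N1, n = 20, 8, 5
p = Fraction(N1, N)

mean, var = hypergeom_moments(N, N1, n)

# Closed forms from the text: E(X) = np and Var(X) = np(1-p)(N-n)/(N-1).
assert mean == n * p
assert var == n * p * (1 - p) * Fraction(N - n, N - 1)

# Moments of p-hat = X / n.
print("E(p-hat)   =", mean / n)
print("Var(p-hat) =", var / n ** 2)
```

Because the arithmetic is done with exact fractions, the assertions check the formulas exactly rather than up to rounding error.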

Then, the Central Limit Theorem tells us that:

$\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n} \left(\dfrac{N-n}{N-1}\right) }}$

follows an approximate standard normal distribution. Now, it's just a matter of doing the typical confidence interval derivation, in which we start with a probability statement, manipulate the quantity inside the parentheses, and substitute sample estimates where necessary. We've done that a number of times now, so skipping all of the details here, we get that an approximate $(1-\alpha)100\%$ confidence interval for $p$ of a small population is:

$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$

By the way, it is worthwhile noting that if the sample size $n$ is much smaller than the population size $N$, that is, if $n \ll N$, then:

$\dfrac{N-n}{N-1}\approx 1$

and the confidence interval for $p$ of a small population becomes quite similar to the confidence interval for $p$ of a large population:

$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
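
To see the correction factor at work, here is a minimal Python sketch of the interval from the theorem. The function name `finite_population_ci` and the survey result used below (180 yes answers in a sample of 400) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def finite_population_ci(p_hat, n, N, confidence=0.95):
    """Approximate confidence interval for p when n people are sampled
    without replacement from a finite population of size N."""
    if N <= 1 or not 0 < n <= N:
        raise ValueError("need N > 1 and 0 < n <= N")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    fpc = (N - n) / (N - 1)                  # finite population correction
    half_width = z * sqrt(p_hat * (1 - p_hat) / n * fpc)
    return p_hat - half_width, p_hat + half_width


# Hypothetical survey result: 180 of 400 sampled residents answer yes.
p_hat = 180 / 400

# Town of N = 2000: the correction narrows the interval noticeably.
print(finite_population_ci(p_hat, n=400, N=2000))

# Very large N: the correction factor is essentially 1, so the interval
# is practically the usual large-population interval.
print(finite_population_ci(p_hat, n=400, N=10**9))
```

Note that when $n=N$ the correction factor is 0 and the interval collapses to the single point $\hat{p}$, as it should when the whole population has been surveyed.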

Example 6-3 (continued)

A researcher is studying the population of a small town in India of $N=2000$ people. She's interested in estimating $p$ for several yes/no questions on a survey.

How many people $n$ does she have to randomly sample (without replacement) to ensure that her estimates $\hat{p}$ are within $\epsilon=0.04$ of the true proportion $p$?

Answer

Now that we know the correct formula for the confidence interval for $p$ of a small population, we can follow the same procedure we did for determining the sample size for estimating a proportion $p$ of a large population. The researcher's goal is to estimate $p$ so that the error is no larger than 0.04. That is, the goal is to calculate a 95% confidence interval of the form:

$\hat{p}\pm \epsilon=\hat{p}\pm 0.04$

Now, we know the formula for an approximate $(1-\alpha)100\%$ confidence interval for a proportion $p$ of a small population is:

$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$

So, again, we should proceed by equating the terms appearing after each of the above $\pm$ signs, and solving for $n$. That is, equate:

$\epsilon=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\cdot \dfrac{N-n}{N-1}}$

and solve for $n$. Doing the algebra yields:

$n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})/\epsilon^2}{\dfrac{N-1}{N}+\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{N\epsilon^2}}$

That looks simply dreadful! Let's make it look a little more friendly to the eyes:

$n=\dfrac{m}{1+\dfrac{m-1}{N}}$

where $m$ is defined as the sample size necessary for estimating the proportion $p$ for a large population, that is, when a correction for the population being small and finite is not made. That is:

$m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}$
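
For readers who want the skipped algebra, here is one way to get from the equated expression to the friendlier form, using $m$ as just defined. Squaring both sides of the equation for $\epsilon$ and dividing by $\epsilon^2$ gives:

$1=\dfrac{m}{n}\cdot\dfrac{N-n}{N-1}$

Multiplying through by $n(N-1)$ and collecting the terms in $n$:

$n(N-1)=m(N-n) \quad\Longrightarrow\quad n(N-1+m)=mN$

Finally, solving for $n$ and then dividing numerator and denominator by $N$:

$n=\dfrac{mN}{N+m-1}=\dfrac{m}{1+\dfrac{m-1}{N}}$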

Now, before we make the calculation for our particular example, let's take a step back and summarize what we have just learned.

Estimating a population proportion $p$ of a small finite population

The sample size necessary for estimating a population proportion $p$ of a small finite population with $(1-\alpha)100\%$ confidence and error no larger than $\epsilon$ is:

$n=\dfrac{m}{1+\dfrac{m-1}{N}}$

where:

$m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}$

is the sample size necessary for estimating the proportion $p$ for a large population.

Example 6-3 (continued)

A researcher is studying the population of a small town in India of $N=2000$ people. She's interested in estimating $p$ for several yes/no questions on a survey.

How many people $n$ does she have to randomly sample (without replacement) to ensure that her estimates $\hat{p}$ are within $\epsilon=0.04$ of the true proportion $p$?

Answer

Okay, once and for all, let's calculate this very patient researcher's sample size! Because the researcher has many different questions on the survey, and because $\hat{p}(1-\hat{p})$ is largest when $\hat{p}=0.5$, it would behoove her to use a sample proportion of 0.50 in her calculations, which yields a sample size large enough for every question. If the maximum error $\epsilon$ is 0.04, the sample proportion is 0.5, and the researcher doesn't make the finite population correction, then she needs:

$m=\dfrac{(1.96^2)(\frac{1}{4})}{0.04^2}=600.25$

or 601 people to estimate $p$ with 95% confidence. But, upon making the correction for the small, finite population, we see that the researcher really only needs:

$n=\dfrac{m}{1+\dfrac{m-1}{N}}=\dfrac{601}{1+\dfrac{601-1}{2000}}=462.3$

or 463 people to estimate $p$ with 95% confidence.
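
The calculation above can be sketched in a few lines of Python. The function name `sample_sizes` is an assumption made for this illustration, and the sketch follows the convention used above of rounding $m$ up to a whole person before applying the correction. Exact fractions are used so that the rounding up is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil


def sample_sizes(N, eps, p_hat="0.5", z="1.96"):
    """Return (m, m rounded up, n, n rounded up) for a population of size N.

    m is the large-population sample size and n is the size after the
    finite population correction. Decimal inputs are passed as strings so
    that Fraction can represent them exactly.
    """
    z, eps, p_hat = Fraction(z), Fraction(eps), Fraction(p_hat)
    m = z ** 2 * p_hat * (1 - p_hat) / eps ** 2
    m_up = ceil(m)  # round up to a whole person, as in the example
    n = Fraction(m_up) / (1 + Fraction(m_up - 1, N))
    return m, m_up, n, ceil(n)


m, m_up, n, n_up = sample_sizes(N=2000, eps="0.04")
print(float(m), m_up)          # 600.25 601
print(round(float(n), 1), n_up)  # 462.3 463
```

If the unrounded $m=600.25$ were plugged into the correction instead, the result would be about 461.9, or 462 people, so the rounding convention can change the answer by one person.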

Effect of Population Size $N$

The following table illustrates how the sample size $n$ that is necessary for estimating a population proportion $p$ (with 95% confidence) is affected by the size of the population $N$. If $\hat{p}=0.5$, then the sample size $n$ is:

$\hat{p}=0.5$	$\epsilon=0.01$	$\epsilon=0.03$	$\epsilon=0.05$
N very large	9604	1068	385
N = 10,000,000	9595	1068	385
N = 1,000,000	9513	1067	385
N = 100,000	8763	1057	384
N = 10,000	4900	966	371
N = 1,000	906	517	279

This table suggests, perhaps not surprisingly, that as the size of the population $N$ decreases, so does the necessary size $n$ of the sample.
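
The table can be regenerated with a short Python sketch. The function names below are assumptions made for this illustration. The sketch also assumes that $z_{0.025}=1.96$ and that $m$ is rounded up before the correction is applied, which is the convention under which the computed values agree with the table.

```python
from fractions import Fraction
from math import ceil

Z = Fraction("1.96")    # z_{0.025} for 95% confidence
P_HAT = Fraction(1, 2)  # conservative choice of p-hat


def large_population_size(eps):
    """Sample size m with no finite population correction, rounded up."""
    eps = Fraction(eps)
    return ceil(Z ** 2 * P_HAT * (1 - P_HAT) / eps ** 2)


def finite_population_size(eps, N):
    """Sample size n = m / (1 + (m - 1)/N) = mN / (N + m - 1), rounded up."""
    m = large_population_size(eps)
    return ceil(Fraction(m * N, N + m - 1))


epsilons = ["0.01", "0.03", "0.05"]
populations = [10_000_000, 1_000_000, 100_000, 10_000, 1_000]

table = {N: [finite_population_size(e, N) for e in epsilons] for N in populations}

print("N very large", [large_population_size(e) for e in epsilons])
for N, row in table.items():
    print(f"N = {N:,}", row)
```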
