# 6.3 - Estimating a Proportion for a Small, Finite Population


The methods of the last page, in which we derived a formula for the sample size necessary for estimating a population proportion $$p$$, work just fine when the population in question is very large. When we have smaller, finite populations, however, such as the students in a high school or the residents of a small town, the formula we derived previously requires a slight modification. Let's start, as usual, by taking a look at an example.

## Example 6-3

A researcher is studying the population of a small town in India of $$N=2000$$ people. She's interested in estimating $$p$$ for several yes/no questions on a survey.

How many people $$n$$ does she have to randomly sample (without replacement) to ensure that her estimates $$\hat{p}$$ are within $$\epsilon=0.04$$ of the true proportions $$p$$?

We can't even begin to address the answer to this question until we derive a confidence interval for a proportion for a small, finite population!

Theorem

An approximate $$(1-\alpha)100\%$$ confidence interval for a proportion $$p$$ of a small population is:

$$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$$

### Proof

We'll use the example above, where possible, to make the proof more concrete. Suppose we take a random sample, $$X_1, X_2, \ldots, X_n$$, without replacement, of size $$n$$ from a population of size $$N$$. In the case of the example, $$N=2000$$. Suppose also, unknown to us, that for a particular survey question there are $$N_1$$ respondents who would respond "yes" to the question, and therefore $$N-N_1$$ respondents who would respond "no."

If that's the case, the true proportion (but unknown to us) of yes respondents is:

$$p=P(Yes)=\dfrac{N_1}{N}$$

while the true proportion (but unknown to us) of no respondents is:

$$1-p=P(No)=1-\dfrac{N_1}{N}=\dfrac{N-N_1}{N}$$

Now, let $$X$$ denote the number of respondents in the sample who say yes, so that:

$$X=\sum\limits_{i=1}^n X_i$$

where $$X_i=1$$ if respondent $$i$$ answers yes, and $$X_i=0$$ if respondent $$i$$ answers no. Then, the proportion in the sample who say yes is:

$$\hat{p}=\dfrac{\sum\limits_{i=1}^n X_i}{n}$$

Then, $$X=\sum\limits_{i=1}^n X_i$$ is a hypergeometric random variable with mean:

$$E(X)=n\dfrac{N_1}{N}=np$$

and variance:

$$Var(X)=n\dfrac{N_1}{N}\left(1-\dfrac{N_1}{N}\right)\left(\dfrac{N-n}{N-1}\right)=np(1-p)\left(\dfrac{N-n}{N-1}\right)$$

It follows that $$\hat{p}=X/n$$ has mean $$E(\hat{p})=p$$ and variance:

$$Var(\hat{p})=\dfrac{p(1-p)}{n}\left(\dfrac{N-n}{N-1}\right)$$
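The hypergeometric mean and variance quoted above can be checked numerically. The following is a minimal sketch, assuming a small made-up population ($$N=20$$, $$N_1=8$$, $$n=5$$; these numbers are illustrative and not from the example) and using exact fractions so that no rounding is involved. The helper name `hypergeom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, N1, n):
    """P(X = x) when drawing n without replacement from N items, N1 of them 'yes'."""
    return Fraction(comb(N1, x) * comb(N - N1, n - x), comb(N, n))

# Illustrative (assumed) values, kept small so the sums are easy to follow.
N, N1, n = 20, 8, 5
p = Fraction(N1, N)

# Possible numbers of 'yes' respondents in the sample.
support = range(max(0, n - (N - N1)), min(n, N1) + 1)

mean = sum(x * hypergeom_pmf(x, N, N1, n) for x in support)
var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, N1, n) for x in support)

# Compare with the closed forms n*p and n*p*(1-p)*(N-n)/(N-1).
print(mean == n * p)
print(var == n * p * (1 - p) * Fraction(N - n, N - 1))
```

Dividing the variance of $$X$$ by $$n^2$$ then gives the variance of $$\hat{p}=X/n$$ stated above.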

Then, the Central Limit Theorem tells us that:

$$\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n} \left(\dfrac{N-n}{N-1}\right) }}$$

follows an approximate standard normal distribution. Now, it's just a matter of doing the typical confidence interval derivation, in which we start with a probability statement, manipulate the quantity inside the parentheses, and substitute sample estimates where necessary. We've done that a number of times now, so skipping all of the details here, we get that an approximate $$(1-\alpha)100\%$$ confidence interval for $$p$$ of a small population is:

$$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$$

By the way, it is worth noting that if the sample size $$n$$ is much smaller than the population size $$N$$, that is, if $$n \ll N$$, then:

$$\dfrac{N-n}{N-1}\approx 1$$

and the confidence interval for $$p$$ of a small population becomes quite similar to the confidence interval for $$p$$ of a large population:

$$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$$
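To see the effect of the correction factor on an interval, here is a minimal sketch of both formulas in Python. The function name `proportion_ci` and the inputs ($$\hat{p}=0.30$$ from a sample of $$n=400$$) are assumptions chosen for illustration; only $$N=2000$$ comes from the example.

```python
from math import sqrt

def proportion_ci(p_hat, n, N=None, z=1.96):
    """Approximate confidence interval for a proportion.

    If N is given, the finite population correction (N - n) / (N - 1)
    is applied; otherwise the large-population formula is used.
    """
    variance = p_hat * (1 - p_hat) / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    half_width = z * sqrt(variance)
    return p_hat - half_width, p_hat + half_width

# Hypothetical survey result: 30% "yes" among 400 sampled residents.
print(proportion_ci(0.30, 400, N=2000))  # with the correction
print(proportion_ci(0.30, 400))          # without the correction
```

With the correction the interval is narrower, because sampling 400 of 2000 people without replacement leaves less of the population unobserved.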

## Example 6-3 (continued)

A researcher is studying the population of a small town in India of $$N=2000$$ people. She's interested in estimating $$p$$ for several yes/no questions on a survey.

How many people $$n$$ does she have to randomly sample (without replacement) to ensure that her estimates $$\hat{p}$$ are within $$\epsilon=0.04$$ of the true proportion $$p$$?

Now that we know the correct formula for the confidence interval for $$p$$ of a small population, we can follow the same procedure we did for determining the sample size for estimating a proportion $$p$$ of a large population. The researcher's goal is to estimate $$p$$ so that the error is no larger than 0.04. That is, the goal is to calculate a 95% confidence interval of the form:

$$\hat{p}\pm \epsilon=\hat{p}\pm 0.04$$

Now, we know the formula for an approximate $$(1-\alpha)100\%$$ confidence interval for a proportion $$p$$ of a small population is:

$$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}$$

So, again, we should proceed by equating the terms appearing after each of the above $$\pm$$ signs, and solving for $$n$$. That is, equate:

$$\epsilon=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\cdot \dfrac{N-n}{N-1}}$$

and solve for $$n$$. Doing the algebra yields:

$$n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})/\epsilon^2}{\dfrac{N-1}{N}+\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{N\epsilon^2}}$$

That looks simply dreadful! Let's make it look a little more friendly to the eyes:

$$n=\dfrac{m}{1+\dfrac{m-1}{N}}$$

where $$m$$ is defined as the sample size necessary for estimating the proportion $$p$$ for a large population, that is, when a correction for the population being small and finite is not made. That is:

$$m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}$$
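For readers who want the skipped algebra, one way to get from the equation for $$\epsilon$$ to the friendlier form is sketched here. Squaring both sides of the equation for $$\epsilon$$, dividing by $$\epsilon^2$$, and using the definition of $$m$$ gives:

$$1=\dfrac{m}{n}\cdot\dfrac{N-n}{N-1}\quad\Longrightarrow\quad n(N-1)=m(N-n)$$

Collecting the terms that involve $$n$$ on the left-hand side:

$$n(N-1+m)=mN$$

Finally, solving for $$n$$ and dividing the numerator and denominator by $$N$$:

$$n=\dfrac{mN}{N+m-1}=\dfrac{m}{1+\dfrac{m-1}{N}}$$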

Now, before we make the calculation for our particular example, let's take a step back and summarize what we have just learned.

Estimating a population proportion $$p$$ of a small finite population

The sample size necessary for estimating a population proportion $$p$$ of a small finite population with $$(1-\alpha)100\%$$ confidence and error no larger than $$\epsilon$$ is:

$$n=\dfrac{m}{1+\dfrac{m-1}{N}}$$

where:

$$m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}$$

is the sample size necessary for estimating the proportion $$p$$ for a large population.
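The two formulas in this summary translate directly into a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `required_sample_sizes` is hypothetical, and it assumes the convention of rounding $$m$$ up to a whole person before applying the finite population correction, then rounding $$n$$ up as well.

```python
from math import ceil

def required_sample_sizes(epsilon, N, p_hat=0.5, z=1.96):
    """Return (m, n), both rounded up to whole people.

    m: sample size for a very large population
    n: sample size after the finite population correction for size N
    """
    m = ceil(z**2 * p_hat * (1 - p_hat) / epsilon**2)
    n = ceil(m / (1 + (m - 1) / N))
    return m, n

# Example 6-3: N = 2000, epsilon = 0.04, 95% confidence, p_hat = 0.5.
m, n = required_sample_sizes(epsilon=0.04, N=2000)
print(m, n)  # 601 463
```

The defaults `p_hat=0.5` and `z=1.96` are assumptions that match the 95% confidence level and the conservative sample proportion used in this example.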

## Example 6-3 (continued)

A researcher is studying the population of a small town in India of $$N=2000$$ people. She's interested in estimating $$p$$ for several yes/no questions on a survey.

How many people $$n$$ does she have to randomly sample (without replacement) to ensure that her estimates $$\hat{p}$$ are within $$\epsilon=0.04$$ of the true proportion $$p$$?

Okay, once and for all, let's calculate this very patient researcher's sample size! Because the researcher has many different questions on the survey, it would behoove her to use a sample proportion of 0.50 in her calculations: the product $$\hat{p}(1-\hat{p})$$ is largest when $$\hat{p}=0.5$$, so this choice gives a sample size that is large enough for every question. If the maximum error $$\epsilon$$ is 0.04, the sample proportion is 0.5, and the researcher doesn't make the finite population correction, then she needs:

$$m=\dfrac{(1.96^2)(\frac{1}{4})}{0.04^2}=600.25$$

or 601 people to estimate $$p$$ with 95% confidence. But, upon making the correction for the small, finite population, we see that the researcher really only needs:

$$n=\dfrac{m}{1+\dfrac{m-1}{N}}=\dfrac{601}{1+\dfrac{601-1}{2000}}=462.3$$

or 463 people to estimate $$p$$ with 95% confidence.
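As a sanity check, we can plug candidate sample sizes back into the margin of error from the finite population confidence interval. This is a minimal sketch; the function name `margin_of_error` is hypothetical, and it assumes $$\hat{p}=0.5$$ and $$z_{0.025}=1.96$$ as in the calculation above.

```python
from math import sqrt

def margin_of_error(n, N, p_hat=0.5, z=1.96):
    """Half-width of the finite population confidence interval for p."""
    return z * sqrt(p_hat * (1 - p_hat) / n * (N - n) / (N - 1))

# Margins of error for sample sizes near the answer, with N = 2000.
for n in (461, 462, 463):
    print(n, margin_of_error(n, N=2000) <= 0.04)
```

Under these assumptions a sample of 463 meets the 0.04 target. So does 462: rounding $$m$$ up to 601 before applying the correction makes the answer slightly conservative, since carrying the unrounded $$m=600.25$$ through the formula gives $$n\approx 461.9$$.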

## Effect of Population Size $$N$$

The following table illustrates how the sample size $$n$$ that is necessary for estimating a population proportion $$p$$ (with 95% confidence) is affected by the size of the population $$N$$. If $$\hat{p}=0.5$$, then the sample size $$n$$ is:

| $$\hat{p} = 0.5$$ | $$\epsilon = 0.01$$ | $$\epsilon = 0.03$$ | $$\epsilon = 0.05$$ |
|---|---|---|---|
| $$N$$ very large | 9604 | 1068 | 385 |
| $$N$$ = 10,000,000 | 9595 | 1068 | 385 |
| $$N$$ = 1,000,000 | 9513 | 1067 | 385 |
| $$N$$ = 100,000 | 8763 | 1057 | 384 |
| $$N$$ = 10,000 | 4900 | 966 | 371 |
| $$N$$ = 1,000 | 906 | 517 | 279 |

This table suggests, perhaps not surprisingly, that as the size of the population $$N$$ decreases, so does the necessary size $$n$$ of the sample.
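The table can be reproduced with a short script. The sketch below is one way to do it, under two assumptions: that $$z_{0.025}=1.96$$ is used, and that $$m$$ is rounded up to a whole number before the correction is applied (the convention used in the example). Exact fractions are used so that borderline cases such as $$m=9604$$ are not disturbed by floating-point rounding. The function name `sample_size` is hypothetical.

```python
from fractions import Fraction
from math import ceil

Z = Fraction("1.96")     # z_{0.025}, as used in the text
P_HAT = Fraction("0.5")  # conservative choice of the sample proportion

def sample_size(epsilon, N=None):
    """Sample size for error epsilon (a decimal string); N=None means very large."""
    m = ceil(Z**2 * P_HAT * (1 - P_HAT) / Fraction(epsilon) ** 2)
    if N is None:
        return m
    return ceil(Fraction(m) / (1 + Fraction(m - 1, N)))

epsilons = ("0.01", "0.03", "0.05")
populations = (None, 10_000_000, 1_000_000, 100_000, 10_000, 1_000)

# One row per population size, one entry per epsilon.
for N in populations:
    label = "very large" if N is None else f"{N:,}"
    print(label, [sample_size(eps, N) for eps in epsilons])
```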
