## Estimating Proportions

Estimating a population proportion can be seen as a special case of estimating a population mean, so many of the results developed for the mean carry over directly to proportions.

We want to estimate the proportion of units in the population having some attribute. For example, a question might be, "What would be the proportion of Penn State students who are smokers?" Another example is, "What would be the proportion of people preferring a type of presentation?"

The Gallup Poll: Most polls are based on telephone interviews, with a significant portion based on in-person interviews conducted during home visits. The sample size is usually at least 1,000 and sometimes as large as 1,500.

Here are a number of interesting websites associated with estimating proportions:

- Gallup Poll
- President Bush's Final Approval Rating (From CBS News)

Let's see in what ways the proportion problem is related to the mean problem...

**Example**: Do you approve of President Bush's job performance?

### Answer

\( \quad y_i =
\begin{cases}
0 & \text{no} \\
1 & \text{yes}
\end{cases} \)

The population units: 1, 2, ..., *N*.

The variable of interest: \(y_1,y_2,...,y_N\).

Population proportion: \(p=\dfrac{1}{N} \sum\limits_{i=1}^N y_i\), which is the population mean, \(\mu\).

If we take a simple random sample of size *n*, then

\(\hat{p}= \sum\limits_{i=1}^n \dfrac{y_i}{n}=\bar{y}\)
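As a minimal sketch of this equivalence, the Python snippet below codes yes/no answers as the indicator \(y_i\) and computes \(\hat{p}\) as the sample mean. The `responses` list is made-up illustration data, not taken from any poll.

```python
from fractions import Fraction

# Hypothetical responses from a simple random sample (illustrative data only).
responses = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]

# Code each response as the indicator y_i: 1 for "yes", 0 for "no".
y = [1 if r == "yes" else 0 for r in responses]

n = len(y)
p_hat = Fraction(sum(y), n)  # sample proportion = sample mean of the indicators

print(f"n = {n}, p_hat = {float(p_hat)}")
```

Using `Fraction` keeps the arithmetic exact; a plain `sum(y) / n` would serve equally well for everyday use.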

Because \(y_i\) takes only the values 0 and 1, its variance is completely determined by its mean.

To find the finite population variance for \(y_1,y_2,...,y_N\), we know that the population mean is:

\(\mu=\dfrac{1}{N} \sum\limits_{i=1}^N y_i =p\)

By definition the variance is then:

\begin{align}
\sigma^2 & = \dfrac{\sum\limits_{i=1}^{N}(y_i-p)^2}{N-1} \\
& = \dfrac{\sum\limits_{i=1}^{N}(y_i^2-2py_i+p^2)}{N-1} \\
& = \dfrac{\sum\limits_{i=1}^{N}y_i^2-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1}
\end{align}

Then, since \(y_i^2 = y_i\) (each \(y_i\) is either 0 or 1) and \(\sum_{i=1}^N y_i = Np\):

\begin{array}{lcl}
& = & \dfrac{\sum\limits_{i=1}^{N}y_i-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1} \\
& = & \dfrac{Np-2p(Np)+Np^2}{N-1} \\
\sigma^2 & = & \dfrac{Np-Np^2}{N-1}=\dfrac{Np(1-p)}{N-1}
\end{array}

This is the population variance; it depends on the unknown \(p\).
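The closed form can be checked numerically. The sketch below uses a hypothetical population of 20 units, 6 of which have the attribute, and compares the variance computed directly from the definition (divisor \(N-1\)) with \(Np(1-p)/(N-1)\). The helper name `finite_population_variance` is an illustrative choice.

```python
from fractions import Fraction

def finite_population_variance(values):
    """Finite-population variance with divisor N - 1."""
    N = len(values)
    mu = Fraction(sum(values), N)
    return sum((v - mu) ** 2 for v in values) / (N - 1)

# Hypothetical population: N = 20 units, 6 of which have the attribute.
population = [1] * 6 + [0] * 14
N = len(population)
p = Fraction(sum(population), N)

direct = finite_population_variance(population)
closed_form = N * p * (1 - p) / (N - 1)

print(direct, closed_form, direct == closed_form)
```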

How will we estimate this? We can use the sample variance:

\(\hat{\sigma}^2=s^2=\dfrac{n}{n-1}\hat{p}\cdot (1-\hat{p})\)
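A quick check of this shortcut, as a sketch on made-up 0/1 data: the usual sample variance (divisor \(n-1\)) should agree with \(\frac{n}{n-1}\hat{p}(1-\hat{p})\) up to floating-point rounding.

```python
import math
import statistics

y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # hypothetical 0/1 sample
n = len(y)
p_hat = sum(y) / n

s2_direct = statistics.variance(y)               # sample variance, divisor n - 1
s2_shortcut = n / (n - 1) * p_hat * (1 - p_hat)  # formula for 0/1 data

print(math.isclose(s2_direct, s2_shortcut))
```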

To see how \(\hat{p}\) behaves, we need to know its distribution. First we find its mean, then its variance.

Since \(\hat{p}\) is \(\bar{y}\), and the sample mean of a simple random sample is unbiased for the population mean, \(E(\hat{p})=\mu=p\). Next, we find its variance.

\begin{align}
Var(\hat{p}) & = \left(1-\dfrac{n}{N}\right)\cdot \dfrac{\sigma^2}{n} \\
& = \left(\dfrac{N-n}{N}\right)\cdot \dfrac{N \cdot p \cdot (1-p)}{(N-1)\cdot n} \\
& = \left(\dfrac{N-n}{N-1}\right)\cdot \dfrac{p \cdot (1-p)}{n}
\end{align}
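For a very small population, both the mean and the variance of \(\hat{p}\) can be verified by listing every possible simple random sample. The following sketch assumes a hypothetical population of \(N=6\) units with 2 having the attribute, and samples of size \(n=3\).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical tiny population: N = 6 units, 2 with the attribute.
population = [1, 1, 0, 0, 0, 0]
N, n = len(population), 3
p = Fraction(sum(population), N)

# p_hat for every possible simple random sample of size n
# (each of the C(N, n) samples is equally likely).
p_hats = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_p_hat = sum(p_hats) / len(p_hats)
var_p_hat = sum((ph - mean_p_hat) ** 2 for ph in p_hats) / len(p_hats)

formula = Fraction(N - n, N - 1) * p * (1 - p) / n

print(mean_p_hat == p)       # expectation of p_hat equals p
print(var_p_hat == formula)  # sampling variance matches the formula
```

Exact fractions are used so that the comparison is not affected by rounding.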

How will we estimate the variance of \(\hat{p}\)? There is more than one way to do this. One method is to use maximum likelihood; another is to find an unbiased estimator.

An unbiased estimator of the variance is:

\(\hat{V}ar(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p} \cdot (1-\hat{p})}{n-1}\)

This is one reasonable answer for determining an estimate of the variance. The answer will not be very different from what one would get using other methods.
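Unbiasedness can also be checked by enumeration. The sketch below reuses a hypothetical population (\(N=6\), two units with the attribute, \(n=3\)) and averages two estimators over all possible samples: the unbiased estimator above, and a "plug-in" alternative that simply substitutes \(\hat{p}\) for \(p\) in the formula for \(Var(\hat{p})\). The plug-in estimator is included only for comparison and is an assumption of this example, not a method prescribed by the text.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 1, 0, 0, 0, 0]  # hypothetical: N = 6, p = 1/3
N, n = len(population), 3
p = Fraction(sum(population), N)
true_var = Fraction(N - n, N - 1) * p * (1 - p) / n

def unbiased_var(p_hat):
    """Unbiased estimator: ((N - n)/N) * p_hat * (1 - p_hat) / (n - 1)."""
    return Fraction(N - n, N) * p_hat * (1 - p_hat) / (n - 1)

def plug_in_var(p_hat):
    """Comparison estimator: p_hat substituted for p in Var(p_hat)."""
    return Fraction(N - n, N - 1) * p_hat * (1 - p_hat) / n

p_hats = [Fraction(sum(s), n) for s in combinations(population, n)]

avg_unbiased = sum(unbiased_var(ph) for ph in p_hats) / len(p_hats)
avg_plug_in = sum(plug_in_var(ph) for ph in p_hats) / len(p_hats)

print(avg_unbiased == true_var)  # average over all samples equals Var(p_hat)
print(avg_plug_in / true_var)    # ratio N(n-1) / ((N-1)n), below 1
```

The two estimators differ by the factor \(\frac{N(n-1)}{(N-1)n}\), which is close to 1 when \(n\) is large, so in practice they give similar answers.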

What about confidence intervals? For this, we need to know the distribution of \(\hat{p}\). When the sample size is large, the Central Limit Theorem tells us that \(\hat{p}\) is approximately normally distributed. Therefore, we can use the *t* interval:

\(\text{Answer:}\quad \hat{p} \pm t_{\alpha/2} \sqrt{\hat{V}ar(\hat{p})}\)

How large is large enough?

\(\text{Answer: if } n \cdot \hat{p}\geq 5,\quad n \cdot (1-\hat{p})\geq 5.\)

We have fairly precise criteria here for whether or not to use *t* when constructing the confidence interval.
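The interval and the rule of thumb can be combined in a small helper. This is a sketch under stated assumptions: the function name `proportion_ci` and the survey figures are hypothetical, and a standard normal quantile stands in for \(t_{\alpha/2}\), which is a reasonable substitute only when \(n\) is large (the Python standard library has no *t* quantile function).

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, N=None, conf=0.95):
    """Large-sample confidence interval for a proportion under SRS.

    Uses the unbiased variance estimator with the finite population
    correction (N - n)/N; if N is None the correction is taken as 1.
    A standard normal quantile replaces the t quantile (large-n assumption).
    """
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("need n*p_hat >= 5 and n*(1 - p_hat) >= 5")
    fpc = 1.0 if N is None else (N - n) / N
    var_hat = fpc * p_hat * (1 - p_hat) / (n - 1)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * math.sqrt(var_hat)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 sampled students, population of 5,000.
low, high = proportion_ci(120 / 400, n=400, N=5000)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

Raising an error when the rule of thumb fails is one possible design choice; returning a warning instead would also be reasonable.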

## Example 2-3: Presidential Approval Rating

Let's revisit the previous example about President Bush's final approval rating.

Source: CBS News (Jan 21, 2009), "President Bush's Final Approval Rating."

President Bush's final approval rating is 22%!

If you read the website you can learn a lot about the specifics of this poll. The poll was conducted by telephone interview with 1,112 adults nationwide.

Using this statistic, provide a 95% CI for the true proportion. The 22% is a sample proportion; what is the true population proportion?

### Answer

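Because the population of adults nationwide is vastly larger than the sample, the finite population correction \((N-n)/N\) is essentially 1:

\(\hat{V}ar(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p}\cdot (1-\hat{p})}{n-1}\approx 1\cdot \dfrac{0.22 \times 0.78}{1112-1}=0.0001545\)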

And a 95% confidence interval for *p* is:

\(0.22 \pm 1.96 \sqrt{0.0001545}\)

\(=0.22 \pm 0.0244\), that is, roughly 19.6% to 24.4%.
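The arithmetic in this example can be reproduced with a few lines of Python. This sketch takes \(n=1112\) and \(\hat{p}=0.22\) from the poll, treats the finite population correction as 1, and uses the multiplier 1.96 from the calculation above.

```python
import math

n = 1112      # adults interviewed
p_hat = 0.22  # reported approval rating

# Large-sample rule of thumb: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
large_enough = n * p_hat >= 5 and n * (1 - p_hat) >= 5

# N is treated as so large that (N - n)/N is taken to be 1.
var_hat = p_hat * (1 - p_hat) / (n - 1)
margin = 1.96 * math.sqrt(var_hat)
lower, upper = p_hat - margin, p_hat + margin

print(f"rule of thumb satisfied: {large_enough}")
print(f"var_hat = {var_hat:.7f}")
print(f"margin  = {margin:.4f}")
print(f"95% CI  = ({lower:.4f}, {upper:.4f})")
```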