## Estimating Proportions

Estimating a population proportion is a special case of estimating a population mean, so many of the results developed for the mean carry over directly to proportions.

We want to estimate the proportion of units in the population that have some attribute. For example, one question might be, "What proportion of Penn State students are smokers?" Another is, "What proportion of people prefer a given type of presentation?"

The Gallup Poll: Most polls are based on telephone interviews, with a significant portion based on in-person interviews conducted during home visits. The sample size is usually at least 1000, and sometimes as large as 1500.

Here are a number of interesting web sites associated with estimating proportions:

- Gallup Poll (http://www.gallup.com)
- President Bush's Final Approval Rating (from http://www.cbsnews.com)

Let's see in what ways the proportion problem is related to the mean problem...

**Example**: Do you approve of President Bush's job performance?

#### Answer

\( \quad y_i =
\begin{cases}
0 & \text{no} \\
1 & \text{yes}
\end{cases} \)

The population units are: 1, 2, ... , *N*.

The variable of interest: \(y_1,y_2,...,y_N\).

Population proportion: \(p=\dfrac{1}{N} \sum\limits_{i=1}^N y_i\), which is the population mean, \(\mu\).

If we take a simple random sample of size *n*, then

\(\hat{p}= \sum\limits_{i=1}^n \dfrac{y_i}{n}=\bar{y}\)

This specific definition of \(y_i\) gives it a variance that is determined by its mean.

To find the finite population variance of \(y_1,y_2,...,y_N\), recall that the population mean is:

\(\mu=\dfrac{1}{N} \sum\limits_{i=1}^N y_i =p\)

By definition the variance is then:

\begin{align}
\sigma^2 & = \dfrac{\sum\limits_{i=1}^{N}(y_i-p)^2}{N-1} \\
& = \dfrac{\sum\limits_{i=1}^{N}(y_i^2-2py_i+p^2)}{N-1} \\
& = \dfrac{\sum\limits_{i=1}^{N}y_i^2-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1}
\end{align}

Then, since \(y_i^2 = y_i\):

\begin{array}{lcl}
\sigma^2 & = & \dfrac{\sum\limits_{i=1}^{N}y_i-2p\sum\limits_{i=1}^N y_i+Np^2}{N-1} \\
& = & \dfrac{Np-2p(Np)+Np^2}{N-1} \\
& = & \dfrac{Np-Np^2}{N-1}=\dfrac{Np(1-p)}{N-1}
\end{array}
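The closed form can be checked numerically. The following is a minimal sketch, assuming a small made-up 0/1 population (the values in `population` are hypothetical and not taken from any poll); it compares the variance computed directly with the \(N-1\) divisor against \(Np(1-p)/(N-1)\).

```python
from statistics import variance

# Hypothetical 0/1 population: 1 = "yes", 0 = "no" (illustrative values only).
population = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
N = len(population)
p = sum(population) / N  # population proportion = population mean

# statistics.variance divides by N - 1, matching the definition used in the text.
sigma2_direct = variance(population)

# Closed form derived above for a 0/1 variable.
sigma2_formula = N * p * (1 - p) / (N - 1)

print(sigma2_direct, sigma2_formula)
```

The two values agree up to floating-point rounding, which is what the derivation predicts for any 0/1 population.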

This is the population variance, which depends on the unknown \(p\).

How do we estimate it? We can use the sample variance, which simplifies in the same way:

\(\hat{\sigma}^2=s^2=\dfrac{n}{n-1}\hat{p}\cdot (1-\hat{p})\)
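The step from the sample variance to this expression is not spelled out; a short sketch of it, using the same fact \(y_i^2 = y_i\) and \(\sum_{i=1}^n y_i = n\hat{p}\), is:

\begin{align}
s^2 & = \dfrac{\sum\limits_{i=1}^{n}(y_i-\hat{p})^2}{n-1} \\
& = \dfrac{\sum\limits_{i=1}^{n}y_i^2-2\hat{p}\sum\limits_{i=1}^{n}y_i+n\hat{p}^2}{n-1} \\
& = \dfrac{n\hat{p}-2n\hat{p}^2+n\hat{p}^2}{n-1} \\
& = \dfrac{n}{n-1}\hat{p}\cdot (1-\hat{p})
\end{align}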

To see how \(\hat{p}\) behaves, we need to know its distribution. First we find its mean, then its variance.

Since \(\hat{p}\) is \(\bar{y}\), and the sample mean is unbiased under simple random sampling, \(E(\hat{p})=\mu=p\). Next, we find its variance.

\begin{align}
Var(\hat{p}) & = \left(1-\dfrac{n}{N}\right)\cdot \dfrac{\sigma^2}{n} \\
& = \left(\dfrac{N-n}{N}\right)\cdot \dfrac{N \cdot p \cdot (1-p)}{(N-1)\cdot n} \\
& = \left(\dfrac{N-n}{N-1}\right)\cdot \dfrac{p \cdot (1-p)}{n}
\end{align}

How will we estimate the variance of \(\hat{p}\)? There is more than one way to do this. One method would be to use maximum likelihood; another would be to use an unbiased estimator.

An unbiased estimator of the variance is:

\(\hat{V}ar(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p} \cdot (1-\hat{p})}{n-1}\)
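Unbiasedness can be illustrated by brute force on a tiny population. The sketch below assumes a hypothetical population of six 0/1 values and samples of size 3; the helper name `var_hat` is introduced here for illustration only. It enumerates every possible simple random sample, averages the estimator over them using exact fractions, and compares the result with the true \(Var(\hat{p})\).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 0/1 population (illustrative values only).
population = [1, 1, 0, 0, 0, 1]
N, n = len(population), 3
p = Fraction(sum(population), N)


def var_hat(sample, N):
    """Estimated variance of p-hat: ((N - n) / N) * p-hat * (1 - p-hat) / (n - 1)."""
    n = len(sample)
    p_hat = Fraction(sum(sample), n)
    return Fraction(N - n, N) * p_hat * (1 - p_hat) / (n - 1)


# combinations() picks units by position, so this lists all C(N, n) equally
# likely simple random samples drawn without replacement.
samples = list(combinations(population, n))

# Expected values over the sampling distribution (exact arithmetic).
expected_p_hat = sum(Fraction(sum(s), n) for s in samples) / len(samples)
expected_var_hat = sum(var_hat(s, N) for s in samples) / len(samples)

# True variance of p-hat from the formula derived in the text.
true_var = Fraction(N - n, N - 1) * p * (1 - p) / n

print(expected_p_hat, p)           # both 1/2
print(expected_var_hat, true_var)  # both 1/20
```

Averaged over all 20 samples, the estimator equals the true variance exactly, and \(\hat{p}\) averages to \(p\).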

This is one reasonable answer for determining an estimate of the variance. The answer will not be very different from what one would get using other methods.

What about confidence intervals? For these we need to know the distribution of \(\hat{p}\). When the sample size is large, \(\hat{p}\) is approximately normally distributed by the Central Limit Theorem. Therefore, we can use the *t* interval:

\(\text{Answer:}\quad \hat{p} \pm t_{\alpha/2} \sqrt{\hat{V}ar(\hat{p})}\)

How large is large enough?

\(\text{Answer: if } n \cdot \hat{p}\geq 5,\quad n \cdot (1-\hat{p})\geq 5.\)

We have fairly precise criteria here for whether or not to use *t* when constructing the confidence interval.
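The rule of thumb above is easy to encode as a check. This is a minimal sketch; the function name `large_sample_ok` and its `threshold` parameter are hypothetical names chosen for illustration.

```python
def large_sample_ok(p_hat, n, threshold=5):
    """Return True when both n * p_hat and n * (1 - p_hat) reach the threshold."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold


# A sample of 100 with p_hat = 0.01 gives only 1 expected "yes", so the check fails.
print(large_sample_ok(0.01, 100))   # False
# A sample of 1112 with p_hat = 0.22 passes comfortably.
print(large_sample_ok(0.22, 1112))  # True
```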

## Example 2-3: Presidential Approval Rating

Let's revisit the previous example about President Bush's final approval rating.

From CBS News (Jan 21, 2009), on the web page "President Bush's Final Approval Rating":

President Bush's final approval rating is 22%!

If you read the web site, you can learn a lot about the specifics of this poll. The poll was conducted by telephone interviews with 1,112 adults nationwide.

After looking at this statistic, provide a 95% CI for the true proportion. The 22% is a sample proportion - what is the true population proportion?

#### Answer

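Because the population of adults nationwide is very large relative to \(n=1112\), the finite population correction \((N-n)/N\) is essentially 1:

\(\hat{V}ar(\hat{p})=\left(\dfrac{N-n}{N}\right) \cdot \dfrac{\hat{p}\cdot (1-\hat{p})}{n-1}\approx 1\cdot \dfrac{0.22 \times 0.78}{1112-1}=0.0001545\)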

And a 95% confidence interval for *p* is:

\(0.22 \pm 1.96 \sqrt{0.0001545}\)

\(=0.22 \pm 0.0244\)
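The arithmetic in this example can be reproduced with a few lines of code. This is a sketch under two assumptions stated in the comments: the finite population correction is taken as 1, and the standard normal quantile is used in place of the *t* quantile, which is nearly identical at 1111 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.22  # sample proportion reported by the poll
n = 1112      # number of adults interviewed

# Assumption: N is so large relative to n that (N - n) / N is treated as 1.
fpc = 1.0
var_hat = fpc * p_hat * (1 - p_hat) / (n - 1)

# Assumption: normal quantile (about 1.96) stands in for the t quantile.
z = NormalDist().inv_cdf(0.975)
margin = z * sqrt(var_hat)
lower, upper = p_hat - margin, p_hat + margin

print(round(var_hat, 7), round(margin, 4))
print(round(lower, 4), round(upper, 4))
```

The interval runs from about 0.196 to 0.244, so the poll is consistent with a true approval rating of roughly 20% to 24%.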