4.5 - Inference for the Population Proportion

Earlier in the lesson, we talked about two types of estimation: point and interval. Let's now apply them to estimate a population proportion from sample data.

Point Estimate for the Population Proportion

The point estimate of the population proportion, $$p$$, is:

$$\hat{p}=\dfrac{\text{number of successes in the sample}}{n}$$

where $$n$$ is the sample size.

Confidence Interval for the Population Proportion

Recall that:

If $$np$$ and $$n(1-p)$$ are both greater than five, then $$\hat{p}$$ is approximately normal with mean $$p$$ and standard error $$\sqrt{\frac{p(1-p)}{n}}$$.

Because the sampling distribution of the sample proportion, $$\hat{p}$$, is approximately normal under these conditions, the multiplier used in the confidence interval comes from the standard normal distribution.

4.5.1 - Construct and Interpret the CI

To construct a confidence interval, we use the following three steps:

Step 1: Check Conditions

Check all conditions before using the sampling distribution of the sample proportion.

We previously used $$np$$ and $$n(1-p)$$. But $$p$$ is not known. Therefore, for the confidence interval, we will use:

• $$n\hat{p}>5$$ and
• $$n(1-\hat{p})>5$$

What can one do if the conditions are NOT satisfied?

For a confidence interval for a proportion, there are techniques called exact methods, which can be used if the software offers them. Exact methods are more complicated and are based on the relationship between the binomial distribution and another distribution we will learn about later, the F-distribution. The z-method is much simpler and fairly easy to compute. In fact, if you ever come across a published random survey (e.g., a Gallup poll), you can use the methods in this lesson to construct a reliable confidence interval for the proportion rather quickly.

Step 2: Construct the General Form

The general form of the confidence interval is $$\text{point estimate }\pm M\times \hat{SE}(\text{estimate})$$, where $$M$$ is the multiplier. The point estimate is the sample proportion, $$\hat{p}$$, and the estimated standard error is $$\hat{SE}(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$. If the conditions are satisfied, then the sampling distribution is approximately normal. Therefore, the multiplier comes from the normal distribution. This interval is also known as the one-sample z-interval for $$p$$, or the Normal Approximation confidence interval for $$p$$.

$$\boldsymbol{\left(1-\alpha \right) 100\%}$$ confidence interval for the population proportion, $$\boldsymbol{p}$$

$$\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$$

where $$z_{\alpha/2}$$ represents a z-value with $$\alpha/2$$ area to the right of it.

General notes about the confidence interval:
• The $$\pm$$ in the formula above means "plus or minus". It is a shorthand way of writing the interval $$\left(\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$.
• It is centered at the point estimate, $$\hat{p}$$.
• The width of the interval is determined by the margin of error, $$z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$: the width is twice the margin of error.
• You must determine the multiplier, $$z_{\alpha/2}$$, from the chosen confidence level.

Step 3: Interpret the Confidence Interval

Applying the template from earlier in the lesson we can say we are $$(1-\alpha)100\%$$ confident that the population proportion is between $$\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$ and $$\hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$. The examples will go into more detail regarding the interpretation of the confidence interval.
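The three steps above can be sketched in a few lines of Python. This is only an illustration, not part of the lesson's Minitab workflow: the function name `proportion_ci` and the sample of 60 successes in 100 trials are hypothetical, and the multiplier is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (z) confidence interval for a proportion."""
    p_hat = successes / n
    # Step 1: check the conditions n*p_hat > 5 and n*(1 - p_hat) > 5.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("conditions for the normal approximation are not met")
    # Step 2: point estimate +/- multiplier * estimated standard error.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample (an assumption for illustration): 60 successes in 100 trials.
lower, upper = proportion_ci(60, 100)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

For this hypothetical sample the interval is roughly 0.504 to 0.696, so Step 3 would read: we are 95% confident that the population proportion is between about 0.504 and 0.696.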

What terms in the margin of error would change the width of the confidence interval? Do the changes make it narrower or wider?

Construct a CI using Minitab

To construct a 1-proportion confidence interval...

1. In Minitab choose Stat > Basic Statistics > 1 Proportion.
2. From the drop-down box select the Summarized data option. (If you have the raw data, you would use the default drop-down option of One or more samples, each in a column.)
3. Enter the number of successes in the Number of Events text box, and the sample size in the Number of Trials text box.
4. Choose the Options button. The default confidence level is 95. If you desire another confidence level, edit it appropriately.
5. To use the z-interval method, choose Normal Approximation from the Method text box. The exact interval is always appropriate and is the default. Under the conditions $$n\hat{p} \ge 5$$ and $$n(1-\hat{p}) \ge 5$$, one can also use the z-interval to approximate the answers. The exact interval and the z-interval should be very similar when the conditions are satisfied.
6. Choose OK and OK again.

4.5.2 - Derivation of the Confidence Interval

To calculate the confidence interval, we need to know how to find the z-multiplier. So where does this $$z_{\alpha/2}$$ come from?

The confidence interval can be derived from the following fact:

\begin{align} P\left(\left|\frac{\hat{p}-p}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\right|\le z_{\alpha/2}\right)=1-\alpha \\ P\left(-z_{\alpha/2}\le \dfrac{\hat{p}-p}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\le z_{\alpha/2}\right)=1-\alpha \\ P\left(\hat{p}-z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\le p \le \hat{p}+z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right)=1-\alpha  \end{align}

The figure shows the general confidence interval on the normal curve.

How to find the multiplier using the Standard Normal Distribution

$$z_a$$ is the z-value having a tail area of $$a$$ to its right. Because the Standard Normal Cumulative Probability Table gives areas to the left of a z-value, one finds $$z_a$$ by looking up the z-value whose cumulative probability is $$1-a$$.

Commonly Used Alpha Levels

The table lists frequently used alphas and their $$z_{\alpha/2}$$ multipliers.

Confidence level and corresponding multiplier

| Confidence Level | $$\boldsymbol{\alpha}$$ | $$\boldsymbol{z_{\alpha/2}}$$ | Multiplier |
|---|---|---|---|
| 90% | .10 | $$z_{0.05}$$ | 1.645 |
| 95% | .05 | $$z_{0.025}$$ | 1.960 |
| 98% | .02 | $$z_{0.01}$$ | 2.326 |
| 99% | .01 | $$z_{0.005}$$ | 2.576 |

The value of the multiplier increases as the confidence level increases. This leads to wider intervals for higher confidence levels. We are more confident of catching the population value when we use a wider interval.
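As a cross-check on the table above, the multipliers can also be computed in software rather than read from a printed table. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_multiplier` is made up for this illustration.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Return z_{alpha/2}: the z-value with area alpha/2 to its right."""
    alpha = 1 - confidence
    # An area of alpha/2 to the right means a cumulative area of 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_multiplier(level):.3f}")
```

Rounded to three decimals, the four values agree with the multipliers listed in the table.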

4.5.3 - Interpreting the CI

In the graph below, we show 10 replications (for each replication, we sample 30 students and ask them whether they are Democrats) and compute an 80% confidence interval each time. We are lucky in this set of 10 replications and get exactly 8 out of 10 intervals that contain the parameter. With so few replications, it is quite possible to get 9 out of 10 or 7 out of 10 intervals that contain the true parameter. On the other hand, if we repeat the process a large number of times, say 10,000, the percentage of intervals that contain the true proportion will be very close to 80%.

If we repeatedly draw random samples of size n from the population where the proportion of success in the population is $$p$$ and calculate the confidence interval each time, we would expect that $$100(1 - \alpha)\%$$ of the intervals would contain the true parameter, $$p$$.
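The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is an assumption-laden example, not the procedure used to produce the lesson's graph: the true proportion of 0.5, the seed, and the function name `coverage` are all chosen only for illustration, while the sample size of 30 and the 80% level match the description above.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p, n, confidence, replications, seed=0):
    """Share of z-intervals that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(replications):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / replications


# Assumed true proportion p = 0.5 (hypothetical), n = 30, 80% confidence.
rate = coverage(p=0.5, n=30, confidence=0.80, replications=10_000)
print(f"Share of intervals containing p: {rate:.3f}")
```

With 10,000 replications the printed share should land near 0.80, whereas a run with only 10 replications can easily give 0.7 or 0.9.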

4.5.4 - Sample Size Computation

Sample Size Computation for the Population Proportion Confidence Interval

An important part of obtaining desired results is to get a large enough sample size. We can use what we know about the margin of error and the desired level of confidence to determine an appropriate sample size.

Recall that the margin of error, E, is half of the width of the confidence interval. Therefore for a one-sample proportion,

$$E=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$$

Precision
The wider the interval, the poorer the precision. Note that the higher the confidence level, the wider the interval (or equivalently, the larger its half-width) and thus the poorer the precision.

Since the confidence level reflects the success rate of the method we use to get the confidence interval, we would like a narrow interval while keeping the confidence level reasonably high.

For most newspaper and magazine polls, it is understood that the margin of error is calculated for a 95% confidence interval (if not stated otherwise). A 3% margin of error is also a popular choice. For instance, you might see a television poll state that the "approval rating of the president is 72%; the margin of error of the poll is plus or minus 3%."

If we want a smaller margin of error (i.e., a narrower interval), we can increase the sample size. Alternatively, if we calculate a 90% confidence interval instead of a 95% confidence interval, the margin of error will also be smaller. However, when reporting it, remember to state that the confidence level is only 90%, because otherwise people will assume 95% confidence.

If the desired margin of error E is specified and the desired confidence level is specified, the required sample size to meet the requirements can be calculated by two methods:

Educated Guess

$$n=\dfrac{z^2_{\alpha/2}\hat{p}_g(1-\hat{p}_g)}{E^2}$$

Where $$\hat{p}_g$$ is an educated guess for the parameter $$p$$.

*The educated guess method is used if it is relatively inexpensive to sample more elements when needed.
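As a brief sketch of where the educated guess formula comes from, one can start from the margin of error with $$\hat{p}_g$$ in place of $$\hat{p}$$, square both sides, and solve for $$n$$:

$$E=z_{\alpha/2}\sqrt{\dfrac{\hat{p}_g(1-\hat{p}_g)}{n}} \;\Rightarrow\; E^2=\dfrac{z^2_{\alpha/2}\hat{p}_g(1-\hat{p}_g)}{n} \;\Rightarrow\; n=\dfrac{z^2_{\alpha/2}\hat{p}_g(1-\hat{p}_g)}{E^2}$$

In practice the result is rounded up to the next whole number, since rounding down would give a margin of error slightly larger than $$E$$.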

Conservative Method

$$n=\dfrac{z^2_{\alpha/2}(\frac{1}{2})^2}{E^2}$$

This formula can be obtained from the educated guess formula using the fact that, for $$0 \le p \le 1$$, $$p(1-p)$$ achieves its largest value, $$\frac{1}{4}$$, at $$p=\frac{1}{2}$$.

*The conservative method is used if the start-up cost of sampling is expensive and thus it is not economical to sample more elements later.

The sample size obtained from using the educated guess is usually smaller than the one obtained using the conservative method. This smaller sample size means there is some risk that the resulting confidence interval may be wider than desired. Using the sample size by the conservative method has no such risk.
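The two methods can be compared numerically with a short Python sketch. The function name `sample_size`, the 3% margin of error, and the educated guess of 0.3 are assumptions chosen only for illustration; the result is rounded up to the next whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Required n for a given margin of error E.

    p_guess = 0.5 gives the conservative method; any other value
    is treated as an educated guess for p.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


# Hypothetical requirement: E = 0.03 at 95% confidence.
n_conservative = sample_size(0.03)         # conservative method
n_guess = sample_size(0.03, p_guess=0.3)   # educated guess of 0.3 (assumed)
print(n_conservative, n_guess)
```

For these assumed inputs the conservative method asks for 1068 observations and the educated guess for 897, which illustrates why the educated guess usually yields the smaller sample size.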