4.5  Inference for the Population Proportion
Earlier in the lesson, we talked about two types of estimation: point and interval. Let's now apply them to estimate a population proportion from sample data.
 Point Estimate for the Population Proportion

The point estimate of the population proportion, \(p\), is:
\(\hat{p}=\dfrac{\text{number of successes in the sample}}{n}\), where \(n\) is the sample size.
Confidence Interval for the Population Proportion
Recall that:
If \(np\) and \(n(1-p)\) are greater than five, then \(\hat{p}\) is approximately normal with mean \(p\) and standard error \(\sqrt{\frac{p(1-p)}{n}}\).
Under these conditions, the sampling distribution of the sample proportion, \(\hat{p}\), is approximately Normal. The multiplier used in the confidence interval will come from the Standard Normal distribution.
4.5.1  Construct and Interpret the CI
To construct a confidence interval, we're going to use the following 3 steps:
 Step 1: Check Condition
Check all conditions before using the sampling distribution of the sample proportion.
We previously used \(np\) and \(n(1-p)\). But \(p\) is not known. Therefore, for the confidence interval, we will use:
 \(n\hat{p}>5\) and
 \(n(1-\hat{p})>5\)
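The check in Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `normal_approx_ok` and the sample counts are made up for the example, and the threshold of 5 is the one used in this lesson.

```python
def normal_approx_ok(successes: int, n: int, threshold: float = 5.0) -> bool:
    """Return True when n*p_hat and n*(1 - p_hat) both exceed the threshold."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # n * p_hat is just the number of successes in the sample,
    # and n * (1 - p_hat) is the number of failures.
    failures = n - successes
    return successes > threshold and failures > threshold


# Hypothetical samples of size 30.
print(normal_approx_ok(12, 30))  # 12 successes and 18 failures
print(normal_approx_ok(3, 30))   # only 3 successes
```

Working with the counts directly avoids floating-point rounding in \(n\hat{p}\).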

What can one do if the conditions are NOT satisfied?
For a confidence interval for a proportion, there are techniques called exact methods, which can be used if the software offers them. These exact methods are more complicated and are based on the relationship between the binomial and another distribution we will learn about later, the F-distribution. The Z-method is much simpler and fairly easy to compute. In fact, if you ever come across a published random survey (e.g. a Gallup poll) you can use the methods in this lesson to construct a reliable proportion confidence interval rather quickly.
 Step 2: Construct the General Form
The general form of the confidence interval is '\(\text{point estimate }\pm M\times \hat{SE}(\text{estimate})\).' The point estimate is the sample proportion, \(\hat{p}\), and the estimated standard error is \(\hat{SE}(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). If the conditions are satisfied, then the sampling distribution is approximately normal. Therefore, the multiplier comes from the normal distribution. This interval is also known as the one-sample z-interval for \(p\), or the Normal Approximation confidence interval for \(p\).
\(\boldsymbol{\left(1-\alpha \right) 100\%}\) confidence interval for the population proportion, \(\boldsymbol{p}\)
\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
where \(z_{\alpha/2}\) represents a z-value with \(\alpha/2\) area to the right of it.
General notes about the confidence interval...
 The \(\pm\) in the formula above means "plus or minus". It is a shorthand way of writing
 \(\left(\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)\)
 It is centered at the point estimate, \(\hat{p}\).
 The width of the interval is determined by the margin of error.
 You must determine the multiplier.
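As a rough sketch of Step 2, the interval can be computed with Python's standard library. The function name `proportion_z_interval` and the data (120 successes out of 300) are hypothetical, and the sketch assumes the Step 1 conditions have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_interval(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation (one-sample z) interval for a proportion."""
    p_hat = successes / n                      # point estimate
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # multiplier z_{alpha/2}
    se_hat = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
    margin = z * se_hat                        # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes in a sample of 300.
lower, upper = proportion_z_interval(120, 300, confidence=0.95)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

With these made-up numbers the interval is centered at \(\hat{p}=0.4\) and extends one margin of error on each side.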
 Step 3: Interpret the Confidence Interval
Applying the template from earlier in the lesson, we can say we are \((1-\alpha)100\%\) confident that the population proportion is between \(\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) and \(\hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). The examples will go into more detail regarding the interpretation of the confidence interval.
Minitab®
Construct a CI using Minitab
To construct a 1-proportion confidence interval...
 In Minitab choose Stat > Basic Statistics > 1 Proportion.
 From the drop down box select the Summarized data option button. (If you have the raw data you would use the default drop down of One or more samples, each in a column.)
 Enter the number of successes in the Number of Events text box, and the sample size in the Number of Trials text box.
 Choose the Options button. The default confidence level is 95. If you desire another confidence level, edit appropriately.
 To use the z-interval method, choose Normal Approximation from the Method text box. The exact interval is always appropriate and is the default. Under the conditions that \(n\hat{p} \ge 5\) and \(n(1-\hat{p}) \ge 5\), one can also use the z-interval to approximate the answers. The exact interval and the z-interval should be very similar when the conditions are satisfied.
 Choose OK and OK again.
4.5.2  Derivation of the Confidence Interval
To calculate the confidence interval, we need to know how to find the z-multiplier. So where does this \(z_{\alpha/2}\) come from?
The confidence interval can be derived from the following fact:
\begin{align}
P\left(\left|\frac{\hat{p}-p}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\right|\le z_{\alpha/2}\right)&=1-\alpha \\
P\left(-z_{\alpha/2}\le \dfrac{\hat{p}-p}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\le z_{\alpha/2}\right)&=1-\alpha \\
P\left(\hat{p}-z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\le p \le \hat{p}+z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right)&=1-\alpha
\end{align}
The figure shows the general confidence interval on the normal curve.
How to find the multiplier using the Standard Normal Distribution
\(z_a\) is the z-value having a tail area of \(a\) to its right. With some calculation, one can use the Standard Normal Cumulative Probability Table to find the value.
Commonly Used Alpha Levels
The table is a list of frequently used alphas and their \(z_{\alpha/2}\) multipliers.
Confidence level and corresponding multiplier

Confidence Level | \(\boldsymbol{\alpha}\) | \(\boldsymbol{z_{\alpha/2}}\) | \(\boldsymbol{z_{\alpha/2}}\) Multiplier
90% | 0.10 | \(z_{0.05}\) | 1.645
95% | 0.05 | \(z_{0.025}\) | 1.960
98% | 0.02 | \(z_{0.01}\) | 2.326
99% | 0.01 | \(z_{0.005}\) | 2.576
The value of the multiplier increases as the confidence level increases. This leads to wider intervals for higher confidence levels. We are more confident of catching the population value when we use a wider interval.
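If software is available, the multipliers do not have to be read from a printed table. The sketch below uses `NormalDist.inv_cdf` from Python's standard library; the helper name `z_multiplier` is made up for the example.

```python
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Return z_{alpha/2} for the given confidence level."""
    alpha = 1 - confidence
    # z_{alpha/2} has area alpha/2 to its right, so area 1 - alpha/2 lies to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.98, 0.99):
    print(f"{confidence:.0%}  z = {z_multiplier(confidence):.3f}")
```

Rounded to three decimals, the printed values should agree with the commonly used multipliers 1.645, 1.960, 2.326, and 2.576.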
4.5.3  Interpreting the CI
In the graph below, we show 10 replications (for each replication, we sample 30 students and ask them whether they are Democrats) and compute an 80% Confidence Interval each time. We are lucky in this set of 10 replications and get exactly 8 out of 10 intervals that contain the parameter. Due to the small number of replications (only 10), it is quite possible that we get 9 out of 10 or 7 out of 10 that contain the true parameter. On the other hand, if we try it 10,000 (a large number of) times, the percentage that contains the true proportion will be very close to 80%.
If we repeatedly draw random samples of size \(n\) from the population where the proportion of success in the population is \(p\) and calculate the confidence interval each time, we would expect that \(100(1-\alpha)\%\) of the intervals would contain the true parameter, \(p\).
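The repeated-sampling idea can be tried out with a small simulation. This is a sketch under stated assumptions: the true proportion `p_true = 0.4` is a made-up value (the lesson does not give one), the function name `coverage_rate` is hypothetical, and the sample size of 30 and the 80% level follow the description above.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(p_true, n, confidence, replications, seed=0):
    """Fraction of simulated z-intervals that contain p_true."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(replications):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / replications


# p_true = 0.4 is an assumed value chosen only for illustration.
print(coverage_rate(0.4, 30, 0.80, replications=10))
print(coverage_rate(0.4, 30, 0.80, replications=10_000))
```

With only 10 replications the fraction can easily land on 0.7 or 0.9, while with 10,000 it settles near the nominal level. Because the z-interval is an approximation, the long-run fraction for a sample size as small as 30 may differ slightly from exactly 80%.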
4.5.4  Sample Size Computation
Sample Size Computation for the Population Proportion Confidence Interval
An important part of obtaining desired results is to get a large enough sample size. We can use what we know about the margin of error and the desired level of confidence to determine an appropriate sample size.
Recall that the margin of error, E, is half of the width of the confidence interval. Therefore, for a one-sample proportion,
\(E=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
 Precision
 The wider the interval, the poorer the precision. Note that the higher the confidence level, the greater the width (or equivalently, half-width) of the interval and thus the poorer the precision.
Since the confidence level reflects the success rate of the method we use to get the confidence interval, we like to have a narrower interval while keeping the confidence level reasonably high.
For most newspaper and magazine polls, it is understood that the margin of error is calculated for a 95% confidence interval (if not stated otherwise). A 3% margin of error is also a popular choice. For instance, you might see a television poll state that the "approval rating of the president is 72%; the margin of error of the poll is plus or minus 3%."
If we want the margin of error smaller (i.e., narrower intervals), we can increase the sample size. Or, if you calculate a 90% confidence interval instead of a 95% confidence interval, the margin of error will also be smaller. However, when one reports it, remember to state that the confidence interval is only 90% because otherwise, people will assume 95% confidence.
If the desired margin of error E is specified and the desired confidence level is specified, the required sample size to meet the requirements can be calculated by two methods:
 Educated Guess

\(n=\dfrac{z^2_{\alpha/2}\hat{p}_g(1-\hat{p}_g)}{E^2}\)
Where \(\hat{p}_g\) is an educated guess for the parameter \(p\).
*The educated guess method is used if it is relatively inexpensive to sample more elements when needed.
 Conservative Method

\(n=\dfrac{z^2_{\alpha/2}(\frac{1}{2})^2}{E^2}\)
This formula can be obtained from the educated guess formula using the fact that:
For \(0 \le p \le 1\), \(p(1-p)\) achieves its largest value at \(p=\frac{1}{2}\).
*The conservative method is used if the startup cost of sampling is expensive and thus it is not economical to sample more elements later.
The sample size obtained from using the educated guess is usually smaller than the one obtained using the conservative method. This smaller sample size means there is some risk that the resulting confidence interval may be wider than desired. Using the sample size by the conservative method has no such risk.
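Both formulas can be sketched with one small function, since the conservative method is the educated guess formula with the guess set to 0.5. The function name `sample_size` and the inputs (a 3% margin of error, 95% confidence, a guess of 0.72) are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n meeting the margin of error; p_guess=0.5 is the conservative choice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z**2 * p_guess * (1 - p_guess) / margin**2
    return ceil(n)  # round up: a fraction of a subject cannot be sampled


# Hypothetical inputs: 3% margin of error at 95% confidence.
print(sample_size(0.03))                # conservative method -> 1068
print(sample_size(0.03, p_guess=0.72))  # educated guess of 0.72 -> 861
```

The educated guess gives the smaller sample here because \(0.72(1-0.72)\) is less than \(0.25\).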
Cautions About Sample Size Calculations
 Why do we need to round up?
Because we are estimating the smallest sample size needed to produce the desired margin of error. Since we cannot sample a portion of a subject (e.g. we cannot take 0.66 of a subject), we need to round up to guarantee a large enough sample.
 Remember that this is the minimum sample size needed for our study.
If the response rate is less than 100% and we contact only the calculated number of subjects, we will end up with a smaller sample than desired. To counter this, we can adjust the calculated sample size by dividing by an anticipated response rate. For instance, using the above example, if we expected about 40% of those contacted to actually participate in our survey (i.e. a 40% response rate), then we would need to sample 7745/0.4=19,362.5, or 19,363. In other words, our actual sample size would need to be 19,363 given the 40% response rate.
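The response-rate adjustment is a single division followed by rounding up. A minimal sketch, using the figures quoted above (a required sample of 7745 and a 40% response rate); the function name `contacts_needed` is made up for the example.

```python
from math import ceil


def contacts_needed(required_n: int, response_rate: float) -> int:
    """Number of people to contact so that about required_n respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return ceil(required_n / response_rate)


print(contacts_needed(7745, 0.40))  # 7745 / 0.4 = 19362.5, rounded up to 19363
```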