7.2.1 - Confidence Intervals

In this section, we begin by defining the point estimate and developing the confidence interval based on what we have learned so far.

Point Estimate: The point estimate for the difference between the two population proportions, $p_1-p_2$, is the difference between the two sample proportions written as $\hat{p}_1-\hat{p}_2$.

We know that a point estimate alone is unlikely to equal the actual population value exactly, and it gives no indication of how far off it might be. By adding and subtracting a margin of error around this point estimate, we can create a confidence interval as we did with one-sample parameters.

Derivation of the Confidence Interval

Consider two populations and label them as population 1 and population 2. Take a random sample of size $n_1$ from population 1 and take a random sample of size $n_2$ from population 2. We first consider the two samples separately.

Proportion from Sample 1:

If $n_1p_1\ge 5$ and $n_1(1-p_1)\ge 5$, then $\hat{p}_1$ will approximately follow a normal distribution with...

\begin{array}{rcc} \text{Mean:}&&p_1 \\ \text{ Standard Error:}&& \sqrt{\dfrac{p_1(1-p_1)}{n_1}} \\ \text{Estimated Standard Error:}&& \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}} \end{array}

Proportion from Sample 2:

If $n_2p_2\ge 5$ and $n_2(1-p_2)\ge 5$, then $\hat{p}_2$ will approximately follow a normal distribution with...

\begin{array}{rcc} \text{Mean:}&&p_2 \\ \text{ Standard Error:}&& \sqrt{\dfrac{p_2(1-p_2)}{n_2}} \\ \text{Estimated Standard Error:}&& \sqrt{\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \end{array}

Sample Proportion 1 - Sample Proportion 2:

Using the theory introduced previously, if $n_1p_1$, $n_1(1-p_1)$, $n_2p_2$, and $n_2(1-p_2)$ are all at least five and we have independent samples, then the sampling distribution of $\hat{p}_1-\hat{p}_2$ is approximately normal with...

\begin{array}{rcc} \text{Mean:}&&p_1-p_2 \\ \text{ Standard Error:}&& \sqrt{\dfrac{p_1(1-p_1)}{n_1}+\dfrac{p_2(1-p_2)}{n_2}} \\ \text{Estimated Standard Error:}&& \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \end{array}
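The standard error formula above can be checked informally by simulation. The following is a minimal sketch, not part of the original lesson: the population proportions, sample sizes, number of repetitions, and function names are all hypothetical choices made for illustration. It draws many pairs of independent samples and compares the empirical standard deviation of $\hat{p}_1-\hat{p}_2$ with the formula.

```python
import math
import random


def simulate_difference(p1, n1, p2, n2, reps=4000, seed=1):
    """Return the empirical mean and standard deviation of p1_hat - p2_hat
    over `reps` simulated pairs of independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count the "successes" in each independent sample.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    mean = sum(diffs) / reps
    variance = sum((d - mean) ** 2 for d in diffs) / (reps - 1)
    return mean, math.sqrt(variance)


def theoretical_se(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat for independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values, chosen only for illustration.
p1, n1, p2, n2 = 0.6, 80, 0.5, 120
sim_mean, sim_sd = simulate_difference(p1, n1, p2, n2)
se = theoretical_se(p1, n1, p2, n2)
print(f"simulated mean of difference: {sim_mean:.4f} (theory: {p1 - p2:.4f})")
print(f"simulated SD of difference:   {sim_sd:.4f} (theory: {se:.4f})")
```

With these assumed values, the simulated mean and standard deviation should land close to $p_1-p_2$ and the standard error formula, though the exact figures depend on the seed and the number of repetitions.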

Putting these pieces together, we can construct the confidence interval for $p_1-p_2$. Since we do not know $p_1$ and $p_2$, we need to check the conditions using $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$. If these conditions are satisfied, then the confidence interval can be constructed for two independent proportions.

Confidence interval for two independent proportions

The $(1-\alpha)100\%$ confidence interval of $p_1-p_2$ is given by:

$\hat{p}_1-\hat{p}_2\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
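As a rough illustration of how this formula might be computed, here is a minimal Python sketch. The function name `two_proportion_ci` and its parameters are hypothetical and not from the lesson, and the counts in the usage lines are made up. The critical value $z_{\alpha/2}$ is obtained from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples.

    x1, x2 are the counts of "successes"; n1, n2 are the sample sizes.
    Assumes the normal-approximation conditions have already been checked.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    alpha = 1 - confidence
    # z_{alpha/2}: the point with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard error of p1_hat - p2_hat.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z * se
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Hypothetical counts, for illustration only.
lower, upper = two_proportion_ci(40, 100, 30, 100)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

Because the function uses the exact critical value rather than the rounded 1.96, results may differ from hand calculations in the last decimal place.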

Example 7-1: Received $100 by Mistake

Males and females were asked about what they would do if they received a $100 bill by mail, addressed to their neighbor, but wrongly delivered to them. Would they return it to their neighbor? Of the 69 males sampled, 52 said "yes" and of the 131 females sampled, 120 said "yes."

Find a 95% confidence interval for the difference in proportions for males and females who said "yes."

Answer

Let sample one be males and sample two be females. Then we have:

Males: $n_1=69$, $\hat{p}_1=\dfrac{52}{69}$
Females: $n_2=131$, $\hat{p}_2=\dfrac{120}{131}$

Checking conditions, we see that $n_1\hat{p}_1=52$, $n_1(1-\hat{p}_1)=17$, $n_2\hat{p}_2=120$, and $n_2(1-\hat{p}_2)=11$ are all at least five, so our conditions are satisfied.
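The condition check amounts to confirming that each sample has at least five "yes" and at least five "no" responses, since $n\hat{p}$ is simply the number of successes and $n(1-\hat{p})$ the number of failures. A small sketch of that check follows; the helper name `normal_approximation_ok` and the `threshold` parameter are hypothetical.

```python
def normal_approximation_ok(successes, n, threshold=5):
    """Return True when both n*p_hat (the successes) and n*(1 - p_hat)
    (the failures) are at least `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Counts from the example: 52 of 69 males and 120 of 131 females said "yes".
males_ok = normal_approximation_ok(52, 69)       # 52 successes, 17 failures
females_ok = normal_approximation_ok(120, 131)   # 120 successes, 11 failures
print(males_ok and females_ok)
```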

Using the formula above, we get:

\begin{array}{rcl} \hat{p}_1-\hat{p}_2 &\pm &z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\\ \dfrac{52}{69}-\dfrac{120}{131}&\pm &1.96\sqrt{\dfrac{\frac{52}{69}\left(1-\frac{52}{69}\right)}{69}+\dfrac{\frac{120}{131}\left(1-\frac{120}{131}\right)}{131}}\\ -0.1624 &\pm &1.96 \left(0.05725\right)\\ -0.1624 &\pm &0.1122\ \text{ or }\ (-0.2746, -0.0502)\\ \end{array}

We are 95% confident that the difference between the population proportion of males who would say "yes" and the population proportion of females who would say "yes" (males minus females) is between -0.2746 and -0.0502.

Based on both ends of the interval being negative, it seems like the proportion of females who would return it is higher than the proportion of males who would return it.

We will discuss how to find the confidence interval using Minitab after we examine the hypothesis test for two proportions. Minitab calculates the test and the confidence interval at the same time.

Caution! What happens if we had defined $\hat{p}_1$ to be the proportion of females and $\hat{p}_2$ to be the proportion of males? If you follow through the calculations, you will find that the confidence interval differs only in sign. In other words, if females were sample one, the interval would be 0.0502 to 0.2746. It still shows that the proportion of females is higher than the proportion of males.
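The effect of swapping the labels can be seen numerically with a short sketch using the counts from Example 7-1. The function name `ci_difference` is hypothetical, and the critical value is fixed at the rounded 1.96 used in the hand calculation.

```python
import math


def ci_difference(x1, n1, x2, n2, z=1.96):
    """Interval for p1 - p2 using the rounded critical value z = 1.96."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Sample one = males, sample two = females (as in the example).
males_first = ci_difference(52, 69, 120, 131)
# Sample one = females, sample two = males (labels swapped).
females_first = ci_difference(120, 131, 52, 69)

print("males - females:", tuple(round(v, 4) for v in males_first))
print("females - males:", tuple(round(v, 4) for v in females_first))
```

The standard error is unchanged by the swap because the two terms under the square root are simply added in the other order; only the sign of $\hat{p}_1-\hat{p}_2$ flips, so the endpoints trade places and change sign.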