9: Confidence Intervals
Lesson Overview
Let us revisit the Statistical Paradigm in Figure 9.1 below. Lesson 7 and Lesson 8 focused on probability, which tells us what kinds of samples might arise from a known population. However, in most real-world problems, reasoning from the population to the sample is not the goal. Usually, we have data from a sample and want to know what can be inferred about the population. Lesson 9 concentrates on one statistical method for reasoning from the sample to the population (inference): Confidence Intervals.
Figure 9.1: Key Components of the Statistical Paradigm
We find that 55% of a random sample of adult Pennsylvanians support a plan to publicly fund the first two years of a college education. What can be said about the proportion of all Pennsylvania adults who support the plan?
We find that the average of 36 independent measurements of the mass of the asteroid Ceres comes out to \(9.46 \times 10^{20}\) kilograms with a standard deviation of \(10^{19}\) kilograms. What can be said about the true mass of Ceres?
We find that human breast tumors implanted in mice shrink by an average of 0.6 \(\text{cm}^{3}\) with a standard deviation of 0.5 \(\text{cm}^{3}\) when treated with a proposed new cancer treatment, although 30% of the mice had tumors that grew under the new treatment. What can be said about the percentage of all mice that would be helped by the new treatment, and how well would the treatment shrink tumors on average in the population of mice?
This lesson describes how to use Confidence Intervals to examine these types of scientific questions.
- A population mean is the numerical average of a variable in the entire population of interest. One example would be the average amount spent on dairy products by adult Americans in the previous year. The actual numerical value of a population mean would rarely be known.
- A sample mean is the numerical average of the data for a variable in a sample. One example would be the average amount spent on dairy products in the previous year by the respondents to a sample survey. The value of the sample mean might be used to estimate an unknown population mean.
- The Standard Error of a sample Mean (often abbreviated S.E.M.) is the standard deviation of the sampling distribution of a sample mean. In a random sample, it is estimated by \(s/\sqrt{n}\) (the sample standard deviation divided by the square root of the sample size).
- A population mean difference is the difference between numerical averages of a variable for two different groups in the entire population of interest. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women of New York City. The actual numerical value of a population mean difference would rarely be known.
- A sample mean difference is the difference between numerical averages of a variable for two different groups in a sample. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women in a sample of 100 New Yorkers. The value of the sample mean difference might be used to estimate an unknown population mean difference.
Objectives
- Interpret confidence intervals for population values.
- Find confidence intervals for population proportions and means using random samples.
- Understand the key principles of estimation:
  - Confidence intervals are random quantities, varying from sample to sample. Sometimes these random intervals cover the true population parameter and sometimes they don't. The coverage probability (the chance that the interval covers the parameter) is called the confidence level.
  - There is a trade-off between confidence and precision. In order to achieve a higher level of confidence, you must be willing to accept a larger margin of error (a wider interval) or pay the price of a larger sample size.
  - The variability of a sample statistic is inversely proportional to the square root of the sample size. For example, when the sample size is four times as large, the margin of error will be cut in half.
  - Formulas for making confidence intervals are based on the probabilities associated with the randomization used to collect the data.
- Apply appropriate decision rules to determine whether or not there is a statistically significant difference between two population values.
9.1 - Confidence Intervals for a Population Proportion
A random sample is gathered to estimate the percentage of American adults who believe that parents should be required to vaccinate their children for diseases like measles, mumps, and rubella. We know that estimates arising from surveys like that are random quantities that vary from sample-to-sample. In Lesson 8 we learned what probability has to say about how close a sample proportion will be to the true population proportion.
In an unbiased random survey
sample proportion = population proportion + random error.
The Normal Approximation tells us that the distribution of these random errors over all possible samples follows the normal curve with a standard deviation of
\[\sqrt{\frac{\text{population proportion}(1-\text{population proportion})}{n}} =\sqrt{\frac{p(1-p)}{n}}\]
The random error is just how much the sample estimate differs from the true population value. The fact that random errors follow the normal curve also holds for many other summaries like sample averages or differences between two sample proportions or averages - you just need a different formula for the standard deviation in each case (see sections 9.3 and 9.4 below).
Notice how the formula for the standard deviation of the sample proportion depends on the true population proportion p. When we do probability calculations we know the value of p, so we can just plug that in to get the standard deviation. But when the population value is unknown, we won't know the standard deviation exactly. However, we can get a very good approximation by plugging in the sample proportion. We call this estimate the standard error of the sample proportion.
Standard Error of Sample Proportion = estimated standard deviation of the sample proportion =
\[\sqrt{\frac{\text{sample proportion}(1-\text{sample proportion})}{n}}\]
Example 9.1
The EPA considers indoor radon levels above 4 picocuries per liter (pCi/L) of air to be high enough to warrant amelioration efforts. Tests in a sample of 200 Centre County Pennsylvania homes found 127 (63.5%) of these sampled households to have indoor radon levels above 4 pCi/L. What is the population value being estimated by this sample percentage? What is the standard error of the corresponding sample proportion?
Recap: the estimated percent of Centre County households that don't meet the EPA guidelines is 63.5% with a standard error of 3.4%. The Normal approximation tells us that
- for 68% of all possible samples, the sample proportion will be within one standard error of the true population proportion and
- for 95% of all possible samples, the sample proportion will be within two standard errors of the true population proportion.
Thus, a 68% confidence interval for the percent of all Centre County households that don't meet the EPA guidelines is given by
63.5% ± 3.4%
A 95% confidence interval for the percent of all Centre County households that don't meet the EPA guidelines is given by
63.5% ± 6.8%
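The arithmetic behind these two intervals can be reproduced with a short Python sketch. It only restates the Example 9.1 calculation using the counts given above; the variable names are illustrative choices, not part of the lesson.

```python
import math

# Data from Example 9.1: 127 of 200 sampled homes exceeded 4 pCi/L
n = 200
high_radon = 127

p_hat = high_radon / n                            # sample proportion (0.635)
std_error = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the sample proportion

# One standard error gives the 68% interval, two give the 95% interval
for z, level in [(1, "68%"), (2, "95%")]:
    margin = z * std_error
    print(f"{level} interval: {p_hat:.3f} +/- {margin:.3f} "
          f"({p_hat - margin:.3f} to {p_hat + margin:.3f})")
```

The lesson rounds the standard error to 3.4% before doubling it, so the last digit printed here may differ slightly from the figures in the text.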
Confidence Intervals for a proportion:
For large random samples a confidence interval for a population proportion is given by
\[\text{sample proportion} \pm z* \sqrt{\frac{\text{sample proportion}(1-\text{sample proportion})}{n}}\]
where z* is a multiplier number that comes from the normal curve and determines the level of confidence (see Table 9.1 for some common multiplier numbers).
Table 9.1. Common multiplier numbers (z*) and their levels of confidence
Multiplier Number (z*) | Level of Confidence |
---|---|
3.0 | 99.7% |
2.58 (2.576) | 99% |
2.0 (more precisely 1.96) | 95% |
1.645 | 90% |
1.282 | 80% |
1.15 | 75% |
1.0 | 68% |
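The multipliers in the table come from the standard normal curve: z* is the point that leaves the chosen share of the curve in the middle. A minimal sketch of that lookup, assuming Python 3.8 or later (for `statistics.NormalDist`); the helper name `z_multiplier` is hypothetical:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Return z* such that the central `confidence` share of the
    standard normal curve lies between -z* and +z*."""
    tail = (1 - confidence) / 2              # area left in each tail
    return NormalDist().inv_cdf(1 - tail)    # point with that much area above it

# Levels of confidence listed in the table
for level in (0.997, 0.99, 0.95, 0.90, 0.80, 0.75, 0.68):
    print(f"{level:.1%} confidence: z* = {z_multiplier(level):.3f}")
```

Several table entries are rounded for convenience (for example 2.0 for 95% and 1.0 for 68%), so the computed values will not match them digit for digit.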
Interpreting Confidence Intervals
To interpret a confidence interval remember that the sample information is random - but there is a pattern to its behavior if we look at all possible samples. Each possible sample gives us a different sample proportion and a different interval. But, even though the results vary from sample-to-sample, we are "confident" because the margin-of-error would be satisfied for 95% of all samples (with z*=2).
The margin-of-error being satisfied means that the interval includes the true population value.
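The "all possible samples" idea can be illustrated by simulation. The sketch below draws many random samples from a population whose true proportion is known, builds an interval from each with z* = 2, and counts how often the interval includes the true value. The true proportion (0.6), sample size (500), number of samples, and seed are all assumptions chosen for illustration.

```python
import math
import random

def coverage_rate(p_true, n, z=2.0, num_samples=2000, seed=1):
    """Share of simulated samples whose confidence interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Draw one random sample of size n and count the "successes"
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # The margin of error is "satisfied" when the interval includes p_true
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / num_samples

rate = coverage_rate(p_true=0.6, n=500)
print(f"Intervals covering the true proportion: {rate:.1%}")
```

The printed share should land near 95%, though it varies a little with the seed and the number of simulated samples.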
Properties of Confidence Intervals
- There is a trade-off between the level of confidence and the precision of the interval. If you want more confidence, you will have to settle for a wider interval (bigger z*).
- Our formula for the confidence interval depends on the normal approximation, so you must check that you have independent trials and a large enough sample to be sure that the normal approximation is appropriate.
- The standard error calculation involves estimating the true standard deviation by substituting the sample proportion for the population proportion in the formula. Luckily, this works well in situations where the normal curve is appropriate [i.e. when np and n(1-p) are both bigger than 5].
- A confidence interval accounts only for sampling variability. The probability that your interval captures the true population value could be much lower if your survey is biased (e.g. bad question wording, low response rate, etc.).
Example 9.2
We take a random sample of 50 households in order to estimate the percentage of all homes in the United States that have a refrigerator. It turns out that 49 of the 50 homes in our sample have a refrigerator. Can we use the formulas above to make a confidence interval in this situation?
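One way to approach this question is to apply the rule of thumb from the properties list above (np and n(1-p) both bigger than 5). The sketch below checks it using the observed counts in place of the unknown p, which is an assumption made here for illustration; the function name `normal_approx_ok` is hypothetical.

```python
def normal_approx_ok(successes, n, threshold=5):
    """Rule of thumb from this lesson: np and n(1-p) should both exceed 5.
    The observed counts stand in for np and n(1-p) (an assumption)."""
    failures = n - successes
    return successes > threshold and failures > threshold

print(normal_approx_ok(49, 50))     # refrigerator sample: only 1 home without one
print(normal_approx_ok(127, 200))   # radon sample from Example 9.1
```

With only one household lacking a refrigerator, the check fails for the refrigerator sample, which suggests the normal-approximation formula should not be trusted there.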
9.2 - Confidence Intervals for a Population Mean
Example 9.3
Over the three-day period from April 1 to April 3, 2015, a national poll surveyed 1500 American households to gauge their levels of discretionary spending. The question asked was how much the respondent spent the day before; not counting the purchase of a home, motor vehicle, or normal household bills. For these sampled households, the average amount spent was \(\bar x\) = \$95 with a standard deviation of s = \$185.
How close will the sample average come to the population mean?
Let's follow the same reasoning as developed in section 9.1 for proportions. We have:
\[\text{Sample average} = \text{population mean} + \text{random error}\]
The Normal Approximation tells us that the distribution of these random errors over all possible samples follows the normal curve with a standard deviation of \(\frac{\sigma}{\sqrt{n}}\). Notice how the formula for the standard deviation of the average depends on the true population standard deviation \(\sigma\). When the population standard deviation is unknown, like in this example, we can still get a good approximation by plugging in the sample standard deviation (s). We call the resulting estimate the Standard Error of the Mean (SEM).
Standard Error of the Mean (SEM) = estimated standard deviation of the sample average =
\[\frac{\text{standard deviation of the sample}}{\sqrt{n}} = \frac{s}{\sqrt{n}}\]
In the example, we have s = \$185 so the Standard Error of the Mean =
\[\frac{\text{\$185}}{\sqrt{1500}} = \$4.78\]
Recap: the estimated daily amount of discretionary spending amongst American households at the beginning of April 2015 was \$95 with a standard error of \$4.78
The Normal Approximation tells us, for example, that for 95% of all large samples, the sample average will be within two SEM of the true population average.
Thus, a 95% confidence interval for the true daily discretionary spending would be \$95 ± 2(\$4.78) or \$95 ± \$9.56.
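The same calculation can be written as a few lines of Python. This is a sketch of the Example 9.3 arithmetic only; the variable names are illustrative.

```python
import math

# Summary statistics from Example 9.3
n = 1500
sample_mean = 95.0    # dollars
sample_sd = 185.0     # dollars

sem = sample_sd / math.sqrt(n)    # Standard Error of the Mean
margin = 2 * sem                  # multiplier of 2 for roughly 95% confidence

print(f"SEM = ${sem:.2f}")
print(f"95% interval: ${sample_mean - margin:.2f} to ${sample_mean + margin:.2f}")
```

Because the text rounds the SEM to \$4.78 before doubling it, the unrounded margin computed here is a fraction of a cent smaller than \$9.56.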
Of course, other levels of confidence are possible. When the sample size is large, s will be a good estimate of \(\sigma\) and you can use multiplier numbers from the normal curve. When the sample size is smaller (say n < 30), s will be fairly different from \(\sigma\) for some samples, and that means we need a bigger multiplier number to account for it (see the optional material on "t-multipliers" in chapter 21).
Confidence Intervals for a population mean (n \(\ge\) 30)
For large random samples, a confidence interval for a population mean is given by
\[\text{sample mean} \pm z^* \frac{s}{\sqrt{n}}\]
where z* is a multiplier number that comes from the normal curve and determines the level of confidence (see Table 9.1 in section 9.1).
Example 9.4
The equatorial radius of the planet Jupiter is measured 40 times independently by a process that is practically free of bias. These measurements average \(\bar x\) = 71492 kilometers with a standard deviation of s = 28 kilometers. Find a 90% confidence interval for the equatorial radius of Jupiter.
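As a sketch of how the large-sample formula applies here (n = 40, so the condition n \(\ge\) 30 holds), take z* = 1.645 for 90% confidence from Table 9.1:

\[71492 \pm 1.645 \times \frac{28}{\sqrt{40}} \approx 71492 \pm 1.645(4.43) \approx 71492 \pm 7.3 \text{ km}\]

Under these assumptions the interval runs from about 71484.7 km to 71499.3 km.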
Example 9.5
How much credit card debt do students typically have when they graduate from Penn State University? A sample of 15 recent Penn State graduates is obtained. Each of these recent graduates is asked to indicate the amount of credit card debt they had at the time of graduation. It turns out that the sample mean was \(\bar x\) = \$2430 with a sample standard deviation of s = \$2300. Would it be appropriate to use the method above to find a 99% confidence interval for the average credit card debt for all recent Penn State graduates?
9.3 - Confidence Intervals for the Difference Between Two Population Proportions or Means
When a sample survey produces a proportion or a mean as a response, we can use the methods in section 9.1 and section 9.2 to find a confidence interval for the true population values. In this section, we discuss confidence intervals for comparative studies. How do we assess the difference between two proportions or means when they come from a comparative observational study or experiment? To address this question, we first need a new rule.
Standard Error of a Difference
When two samples are independent of each other,
Standard Error for a Difference between two sample summaries =
\[\sqrt{(\text{standard error in first sample})^{2} + (\text{standard error in second sample})^{2}}\]
Example 9.6
A medical researcher conjectures that smoking can result in wrinkled skin around the eyes. The researcher recruited 150 smokers and 250 nonsmokers to take part in an observational study and found that 95 of the smokers and 105 of the nonsmokers had prominent wrinkles around the eyes (based on a standardized wrinkle score administered by a person who did not know whether the subject smoked). Some results from the study are found in Table 9.2.
Table 9.2. Results of the smoking and wrinkles study (Example 9.6)
| | Smokers | Nonsmokers |
---|---|---|
Sample Size | 150 | 250 |
Sample Proportion with Prominent Wrinkles | 95/150 = 0.63 | 105/250 = 0.42 |
Standard Error for Proportion | \(\sqrt{\frac{0.63(0.37)}{150}} = 0.0394\) | \(\sqrt{\frac{0.42(0.58)}{250}} = 0.0312\) |
How do the smokers compare to the non-smokers? The difference between the two sample proportions is 0.63 - 0.42 = 0.21. We would like to make a CI for the true difference that would exist between these two groups in the population. So we compute
\[\text{Standard Error for Difference} = \sqrt{0.0394^{2}+0.0312^{2}} \approx 0.05\]
If we think about all possible ways to draw a sample of 150 smokers and 250 non-smokers then the differences we'd see between sample proportions would approximately follow the normal curve. Thus, a 95% Confidence Interval for the differences between these two proportions in the population is given by:
\[\text{Difference Between the Sample Proportions} \pm z^*(\text{Standard Error for Difference})\]
or
\[0.21 \pm 2(0.05)\;\; \text{or}\;\; 0.21 \pm 0.1\]
Notice that this 95% confidence interval goes from 0.11 to 0.31. Since the interval does not contain 0, we see that the difference seen in this study was "significant."
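The steps of Example 9.6 can be sketched in Python as follows. The helper name `proportion_se` is hypothetical, and the code works from the raw counts rather than the rounded proportions, so later decimal places may differ slightly from the text.

```python
import math

def proportion_se(successes, n):
    """Return the sample proportion and its standard error."""
    p_hat = successes / n
    return p_hat, math.sqrt(p_hat * (1 - p_hat) / n)

# Counts from Example 9.6
p_smokers, se_smokers = proportion_se(95, 150)
p_non, se_non = proportion_se(105, 250)

# Standard error of a difference between two independent samples
diff = p_smokers - p_non
se_diff = math.sqrt(se_smokers**2 + se_non**2)

z = 2    # multiplier for roughly 95% confidence
lower, upper = diff - z * se_diff, diff + z * se_diff
significant = not (lower <= 0 <= upper)    # interval excludes zero?

print(f"difference = {diff:.2f}, SE = {se_diff:.3f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}; significant: {significant}")
```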
Another way to think about whether the smokers and non-smokers have significantly different proportions with wrinkles is to calculate a 95% Confidence Interval for each group separately. For the smokers, we have a confidence interval of 0.63 ± 2(0.0394) or 0.63 ± 0.0788. The interval for smokers goes from about 0.55 up to 0.71. For the non-smokers, we have a confidence interval of 0.42 ± 2(0.0312) or 0.42 ± 0.0624. The interval for non-smokers goes from about 0.36 up to 0.48. The interval for the smokers (which starts at 0.55) and the interval for the non-smokers (which ends at 0.48) do not overlap - that is another sign that the differences seen in this study were "significant."
Statistical Significance and Confidence Intervals
- If the two confidence intervals do not overlap, we can conclude that there is a statistically significant difference in the two population values at the given level of confidence; or alternatively
- If the confidence interval for the difference does not contain zero, we can conclude that there is a statistically significant difference in the two population values at the given level of confidence.
The first rule is the "more conservative" one since there are some circumstances when the interval for the difference does not contain zero but there is some overlap in the individual confidence intervals.
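A small numerical illustration shows how the two rules can disagree. The proportions and standard errors below are hypothetical values chosen only to make the point; they do not come from any study in this lesson.

```python
import math

def interval(estimate, se, z=2.0):
    """Confidence interval as a (lower, upper) pair."""
    return estimate - z * se, estimate + z * se

# Hypothetical sample proportions and standard errors (assumed for illustration)
p1, se1 = 0.50, 0.03
p2, se2 = 0.60, 0.03

# Rule 1: do the two individual intervals overlap?
ci1 = interval(p1, se1)
ci2 = interval(p2, se2)
overlap = ci1[1] >= ci2[0] and ci2[1] >= ci1[0]

# Rule 2: does the interval for the difference contain zero?
se_diff = math.sqrt(se1**2 + se2**2)
ci_diff = interval(p2 - p1, se_diff)
contains_zero = ci_diff[0] <= 0 <= ci_diff[1]

print(f"individual intervals overlap: {overlap}")
print(f"difference interval contains zero: {contains_zero}")
```

Here the individual intervals overlap, yet the interval for the difference excludes zero, because the standard error of the difference is smaller than the sum of the two separate standard errors.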
Importantly, the formula for the standard error of a difference applies only to two independent samples. It would not apply to dependent samples like those gathered in a matched pairs study.
Example 9.7
A general rule used clinically to judge normal levels of strength is that a person's dominant hand should have about 10% higher grip strength than their non-dominant hand. The idea is that the preferential use of your dominant hand in everyday activities might act as a form of endurance training for the muscles of the hand resulting in the strength differential. If this theory about the underlying reason for the strength differential is true then there should be less of a difference in young children than in adults. Data from a study of 60 right-handed boys under 10 years old and 60 right-handed men aged 30-39 are shown in Table 9.3.
Table 9.3. Grip Strength (kilograms): Average and Standard Deviation by Hand and Age
| | Boys < 10 years old (n=60) | Men 30-39 years old (n=60) |
---|---|---|
Right Hand | \(\bar x\) = 6.2 kg, s = 2.1 kg | \(\bar x\) = 40.3 kg, s = 9.3 kg |
Left Hand | \(\bar x\) = 5.9 kg, s = 2.2 kg | \(\bar x\) = 35.6 kg, s = 8.8 kg |
Difference | \(\bar x\) = 0.3 kg, s = 0.8 kg | \(\bar x\) = 4.7 kg, s = 3.6 kg |
Is the grip strength in the right hand higher than the grip strength in the left hand for boys under 10 years old? We cannot compare the left-hand results and the right-hand results as if they were separate independent samples. This is a matched pairs situation since the results are highly correlated. Some boys will be stronger than others in both hands. Thus, the proper way to examine the disparity between right-hand strength and left-hand strength is to look at the differences between the two hands in each boy and then analyze the resulting data as a single sample (as discussed in section 9.2). Looking at these differences we see their average is 0.3 kg with a standard deviation of 0.8 kg.
Thus the SEM for these differences is \(\frac{0.8}{\sqrt{60}}=0.103\) and a 95% Confidence Interval for the average right-hand versus left hand strength differential in the population of boys is 0.3 kg ± 2(0.103) kg or 0.3 kg ± 0.206 kg. The interval goes from about 0.09 kg up to 0.51 kg.
Similarly, for the men in the study the SEM for the right-left strength differential is \(\frac{3.6}{\sqrt{60}}=0.465\) and a 95% Confidence Interval for the average strength differential in the population of men is 4.7 kg ± 2(0.465) kg or 4.7 kg ± 0.93 kg. The interval goes from 3.77 kg up to 5.63 kg.
Finally, we want to examine the idea that the right-left strength differential will be different between the 30-39 year old men and the boys < 10. That comparison involves two independent samples of 60 people each. To find a confidence interval for the average difference between these two populations we compute
\[\text{Standard Error for Difference} = \sqrt{0.103^{2}+0.465^{2}} \approx 0.476\]
If we think about all possible ways to draw a sample of 60 boys under 10 and 60 men from 30-39, then the differences we'd see between sample means would approximately follow the normal curve. Thus a 95% Confidence Interval for the differences between these two means in the population is given by
\[\text{Difference Between the Sample Means} \pm z^*(\text{Standard Error for Difference})\]
or
\[(4.7 - 0.3) \text{ kg} \pm 2(0.476) \text{ kg} \;\; \text{or}\;\; 4.4 \text{ kg} \pm 0.95 \text{ kg}\]
Notice that this 95% confidence interval goes from 3.45 kg up to 5.35 kg. Since the interval does not contain 0, we see that the difference between the adults and children seen in this study was "significant."
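The full chain of calculations in Example 9.7 (matched-pairs SEM within each age group, then the standard error of the difference between the two independent groups) can be sketched in Python. The variable names are illustrative, and the inputs are the "Difference" row of Table 9.3.

```python
import math

n = 60    # subjects in each age group

# Right-minus-left differences from Table 9.3 (kg)
boys_mean_diff, boys_sd_diff = 0.3, 0.8
men_mean_diff, men_sd_diff = 4.7, 3.6

# Matched pairs: each group's differences are analyzed as a single sample
sem_boys = boys_sd_diff / math.sqrt(n)
sem_men = men_sd_diff / math.sqrt(n)

# Boys and men are independent samples, so their standard errors combine
se_diff = math.sqrt(sem_boys**2 + sem_men**2)
diff = men_mean_diff - boys_mean_diff

lower, upper = diff - 2 * se_diff, diff + 2 * se_diff
print(f"SEM boys = {sem_boys:.3f}, SEM men = {sem_men:.3f}")
print(f"SE of difference = {se_diff:.3f}")
print(f"95% interval: {lower:.2f} kg to {upper:.2f} kg")
```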
9.4 - Test Yourself!
9.5 - Have Fun With It!
J.B. Landers ©
True Value
Lyrics © by Alan Reifman.
May sing to the tune of "Moon Shadow" by Cat Stevens
Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value,
You get a sample statistic, a sample r, or sample M,
You then take plus-or-minus two (it’s really 1.96…), standard errors beyond your stat,
And within this new interval, we can be, so confident,
That the true value, mu or rho, will be somewhere… inside…, our confidence interval,
Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value...