Lesson 9: Confidence Intervals


Lesson Overview

Let us revisit the Statistical Paradigm in Figure 9.1 below. Lessons 7 and 8 focused on probability, which tells us what kinds of samples might arise from a known population. However, in most real-world problems, reasoning from the population to the sample is not the goal. Usually, we have data from a sample and want to know what can be inferred about the population. Lesson 9 concentrates on one statistical method for reasoning from the sample to the population (inference): Confidence Intervals.

[Figure 9.1: Key Components of the Statistical Paradigm. The diagram links Population, Parameters, Samples, and Statistical Summaries and Pictures through four steps:]

  • Probability: The rules of probability can tell us the likelihood of different types of samples that might arise from a particular population.
  • Inference: We want to infer what parameter values are most consistent with the sample statistic at hand.
  • Conclude: What does our knowledge of the parameter values tell us about the population?
  • Describe and Compare: Data is collected from the samples and, with sample data in hand, we attempt to create statistical summaries and pictures that give the salient features of the data collected.

We find that 55% of a random sample of adult Pennsylvanians support a plan to publicly fund the first two years of a college education. What can be said about the proportion of all Pennsylvania adults who support the plan?

We find that the average of 36 independent measurements of the mass of the asteroid Ceres comes out to \(9.46 \times 10^{20}\) kilograms with a standard deviation of \(10^{19}\) kilograms. What can be said about the true mass of Ceres?

We find that human breast tumors implanted in mice shrink by an average of 0.6 cm³ (with a standard deviation of 0.5 cm³) when treated with a proposed new cancer treatment, although 30% of the mice had tumors that grew with the new treatment. What can be said about the percentage of all mice that would be helped by the new treatment, and about how much the treatment would shrink tumors on average in the population of mice?

This lesson describes how to use Confidence Intervals to examine these types of scientific questions.

  • A population mean is the numerical average of a variable in the entire population of interest. One example would be the average amount spent on dairy products by adult Americans in the previous year. The actual numerical value of a population mean would rarely be known.
  • A sample mean is the numerical average of the data for a variable in a sample. One example would be the average amount spent on dairy products in the previous year by the respondents to a sample survey. The value of the sample mean might be used to estimate an unknown population mean.
  • The Standard Error of a sample Mean (often abbreviated S.E.M.) is the standard deviation of the sampling distribution of a sample mean. In a random sample, it is estimated by \(s/\sqrt{n}\) (the sample standard deviation divided by the square root of the sample size).
  • A population mean difference is the difference between numerical averages of a variable for two different groups in the entire population of interest. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women of New York City. The actual numerical value of a population mean difference would rarely be known.
  • A sample mean difference is the difference between numerical averages of a variable for two different groups in a sample. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women in a sample of 100 New Yorkers. The value of the sample mean difference might be used to estimate an unknown population mean difference.

Objectives

After successfully completing this lesson, you should be able to:

  • Interpret confidence intervals for population values.
  • Find confidence intervals for population proportions and means using random samples.
  • Understand the key principles of estimation:
    • Confidence intervals are random quantities, varying from sample to sample. Sometimes these random intervals cover the true population parameter and sometimes they don't. The coverage probability (the chance that the interval covers the parameter) is called the confidence level.
    • There is a trade-off between confidence and precision. In order to achieve a higher level of confidence, you must be willing to accept a larger margin of error (a wider interval) or pay the price of a larger sample size.
    • The variability of a sample statistic decreases in proportion to the square root of the sample size (the standard error is divided by \(\sqrt{n}\)). For example, when the sample size is four times as large, the margin of error will be cut in half.
    • Formulas for making confidence intervals are based on the probabilities associated with the randomization used to collect the data.
  • Apply appropriate decision rules to determine whether or not there is a statistically significant difference between two population values.
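The square-root principle in the objectives above can be checked numerically. The sketch below assumes the proportion margin-of-error formula introduced in section 9.1 and a hypothetical sample proportion of 0.5; the function name `margin_of_error` is illustrative only.

```python
import math

def margin_of_error(p_hat: float, n: int, z_star: float = 2.0) -> float:
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n) for a sample proportion."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion, held fixed while the sample size changes.
p_hat = 0.5
for n in (100, 400, 1600):
    # Each step multiplies n by 4, which should halve the margin of error.
    print(n, round(margin_of_error(p_hat, n), 4))
```

This prints margins of 0.1, 0.05 and 0.025: every fourfold increase in n cuts the margin of error in half.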

9.1 - Confidence Intervals for a Population Proportion


A random sample is gathered to estimate the percentage of American adults who believe that parents should be required to vaccinate their children for diseases like measles, mumps, and rubella. We know that estimates arising from surveys like that are random quantities that vary from sample to sample. In Lesson 8, we learned what probability has to say about how close a sample proportion will be to the true population proportion.

In an unbiased random survey,

\[\text{sample proportion} = \text{population proportion} + \text{random error}.\]

The Normal Approximation tells us that the distribution of these random errors over all possible samples follows the normal curve with a standard deviation of

\[\sqrt{\frac{\text{population proportion}(1-\text{population proportion})}{n}} = \sqrt{\frac{p(1-p)}{n}}\]

The random error is just how much the sample estimate differs from the true population value. The fact that random errors follow the normal curve also holds for many other summaries, like sample averages or differences between two sample proportions or averages; you just need a different formula for the standard deviation in each case (see sections 9.2 and 9.3 below).

Notice how the formula for the standard deviation of the sample proportion depends on the true population proportion p. When we do probability calculations, we know the value of p, so we can just plug that in to get the standard deviation. But when the population value is unknown, we won't know the standard deviation exactly. However, we can get a very good approximation by plugging in the sample proportion. We call this estimate the standard error of the sample proportion:

Standard Error of Sample Proportion = estimated standard deviation of the sample proportion =

\[\sqrt{\frac{\text{sample proportion}(1-\text{sample proportion})}{n}}\]

Example 9.1


The EPA considers indoor radon levels above 4 picocuries per liter (pCi/L) of air to be high enough to warrant amelioration efforts. Tests in a sample of 200 Centre County Pennsylvania homes found 127 (63.5%) of these sampled households to have indoor radon levels above 4 pCi/L. What is the population value being estimated by this sample percentage? What is the standard error of the corresponding sample proportion?

Solution: The population value is the percentage of all Centre County homes with indoor radon levels above 4 pCi/L. The standard error of the sample proportion = \[\sqrt{\frac{0.635(1-0.635)}{200}} = 0.034\]

Recap: the estimated percent of Centre County households that don't meet the EPA guidelines is 63.5% with a standard error of 3.4%. The Normal approximation tells us that

  • for 68% of all possible samples, the sample proportion will be within one standard error of the true population proportion and
  • for 95% of all possible samples, the sample proportion will be within two standard errors of the true population proportion.

Thus, a 68% confidence interval for the percent of all Centre County households that don't meet the EPA guidelines is given by

63.5% ± 3.4%

A 95% confidence interval for the percent of all Centre County households that don't meet the EPA guidelines is given by

63.5% ± 6.8%
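The arithmetic of Example 9.1 can be reproduced in a few lines. This is a minimal sketch using the counts from the example (127 of 200 homes); `se_proportion` is an illustrative helper name, not part of any library.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

n = 200
p_hat = 127 / n               # sample proportion, 0.635
se = se_proportion(p_hat, n)
print(f"SE = {se:.3f}")

# One standard error gives about 68% confidence, two give about 95%.
for z_star, level in ((1.0, "68%"), (2.0, "95%")):
    lo, hi = p_hat - z_star * se, p_hat + z_star * se
    print(f"{level} CI: {lo:.3f} to {hi:.3f}")
```

It prints SE = 0.034, a 68% interval of 0.601 to 0.669, and a 95% interval of 0.567 to 0.703, matching 63.5% ± 3.4% and 63.5% ± 6.8%.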

Note! When you see a margin of error in a news report, it almost always refers to a 95% confidence interval. But other levels of confidence are possible.

Confidence Intervals for a proportion:

For large random samples a confidence interval for a population proportion is given by

\[\text{sample proportion} \pm z^* \sqrt{\frac{\text{sample proportion}(1-\text{sample proportion})}{n}}\]

where z* is a multiplier number that comes from the normal curve and determines the level of confidence (see Table 9.1 for some common multiplier numbers).

Table 9.1. Commonly Used Multipliers

Multiplier Number (z*) | Level of Confidence
3.0 | 99.7%
2.58 (more precisely 2.576) | 99%
2.0 (more precisely 1.96) | 95%
1.645 | 90%
1.282 | 80%
1.15 | 75%
1.0 | 68%
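The multipliers in Table 9.1 come from the normal curve: z* is the value that leaves the chosen confidence level of area between -z* and z*. As a sketch, assuming Python's standard-library `statistics.NormalDist`, they can be recomputed and plugged into the proportion formula above; `z_star` and `proportion_ci` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Two-sided normal multiplier: `confidence` of the area lies between -z and z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star(confidence) * se
    return p_hat - margin, p_hat + margin

for level in (0.997, 0.99, 0.95, 0.90, 0.80, 0.75, 0.68):
    print(f"{level:.1%}: z* = {z_star(level):.3f}")

# Example 9.1 data at 90% confidence.
print(proportion_ci(127, 200, confidence=0.90))
```

The printed multipliers (about 2.968, 2.576, 1.960, 1.645, 1.282, 1.150 and 0.994) show that the 3.0 and 1.0 entries in the table are rounded. The last line gives a 90% interval of roughly 0.58 to 0.69 for the radon data.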


Interpreting Confidence Intervals

To interpret a confidence interval, remember that the sample information is random, but there is a pattern to its behavior if we look at all possible samples. Each possible sample gives us a different sample proportion and a different interval. But even though the results vary from sample to sample, we are "confident" because the margin of error would be satisfied for 95% of all samples (with z* = 2).

The margin of error being satisfied means that the interval includes the true population value.
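The "95% of all samples" interpretation can be seen by simulation: pretend the population proportion is known, draw many random samples, build an interval from each, and count how often the interval covers the truth. This is a sketch only; the true proportion of 0.6, the sample size of 200, and the function name `simulate_coverage` are all hypothetical choices.

```python
import math
import random

def simulate_coverage(p_true: float, n: int, z_star: float = 2.0,
                      reps: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated random samples whose interval covers p_true."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        # One random sample of n independent yes/no responses.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # Did this sample's interval capture the true population value?
        if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
            covered += 1
    return covered / reps

print(simulate_coverage(p_true=0.6, n=200))
```

The printed fraction should come out close to 0.95; rerunning with `z_star=1.0` should give a value near 0.68. Any single interval either covers 0.6 or it does not; the confidence level describes the long-run success rate of the method.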

Properties of Confidence Intervals

  • There is a trade-off between the level of confidence and the precision of the interval. If you want more confidence, you will have to settle for a wider interval (bigger z*).
  • Our formula for the confidence interval depends on the normal approximation, so you must check that you have independent trials and a large enough sample to be sure that the normal approximation is appropriate.
  • The standard error calculation involves estimating the true standard deviation by substituting the sample proportion for the population proportion in the formula. Luckily, this works well in situations where the normal curve is appropriate [i.e. when np and n(1-p) are both bigger than 5].
  • A confidence interval accounts only for sampling variability. The probability that your interval captures the true population value could be much lower if your survey is biased (e.g., bad question wording, low response rate, etc.).

Example 9.2

We take a random sample of 50 households in order to estimate the percentage of all homes in the United States that have a refrigerator. It turns out that 49 of the 50 homes in our sample have a refrigerator. Can we use the formulas above to make a confidence interval in this situation?

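Solution: No. In such a skewed situation, with only 1 home that does not have a refrigerator, the normal curve would be a very poor approximation to the distribution of sample proportions. Plugging in the sample proportion 49/50 = 0.98 gives n(1 - p) = 50(0.02) = 1, far below the value of 5 required by the rule of thumb.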
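The rule of thumb from the properties list (np and n(1 - p) both bigger than 5) can be checked mechanically. This sketch assumes, as the standard-error calculation does, that the sample proportion stands in for the unknown p; `normal_approx_ok` is a hypothetical name.

```python
def normal_approx_ok(successes: int, n: int, threshold: float = 5) -> bool:
    """Rule of thumb: n*p and n*(1 - p) both bigger than `threshold`,
    using the sample proportion in place of the unknown p."""
    p_hat = successes / n
    return n * p_hat > threshold and n * (1 - p_hat) > threshold

print(normal_approx_ok(49, 50))    # Example 9.2: only 1 "failure", so False
print(normal_approx_ok(127, 200))  # Example 9.1: 127 and 73, so True
```

With the sample proportion plugged in, np and n(1 - p) are simply the observed counts of "yes" and "no" responses, so the check amounts to requiring more than 5 of each.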

9.2 - Confidence Intervals for a Population Mean


Example 9.3


Over the three-day period from April 1 to April 3, 2015, a national poll surveyed 1500 American households to gauge their levels of discretionary spending. The question asked how much the respondent had spent the day before, not counting the purchase of a home, motor vehicle, or normal household bills. For these sampled households, the average amount spent was \(\bar x\) = \$95 with a standard deviation of s = \$185.

How close will the sample average come to the population mean?

Let's follow the same reasoning as developed in section 9.1 for proportions. We have:

\[\text{Sample average} = \text{population mean} + \text{random error}\]

The Normal Approximation tells us that the distribution of these random errors over all possible samples follows the normal curve with a standard deviation of \(\frac{\sigma}{\sqrt{n}}\). Notice how the formula for the standard deviation of the average depends on the true population standard deviation \(\sigma\). When the population standard deviation is unknown, as in this example, we can still get a good approximation by plugging in the sample standard deviation (s). We call the resulting estimate the Standard Error of the Mean (SEM).

Standard Error of the Mean (SEM) = estimated standard deviation of the sample average =

\[\frac{\text{standard deviation of the sample}}{\sqrt{n}} = \frac{s}{\sqrt{n}}\]

In the example, we have s = \$185 so the Standard Error of the Mean =

\[\frac{\text{\$185}}{\sqrt{1500}} = \$4.78\]

Recap: the estimated daily amount of discretionary spending amongst American households at the beginning of April 2015 was \$95 with a standard error of \$4.78.

The Normal Approximation tells us, for example, that for 95% of all large samples, the sample average will be within two SEM of the true population average.

Thus, a 95% confidence interval for the true daily discretionary spending would be \$95 ± 2(\$4.78), or \$95 ± \$9.56.
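The same steps for a mean (SEM = s/√n, then the sample mean ± z* SEM) can be sketched as follows, using the summary numbers from Example 9.3. The function name `mean_ci` is hypothetical.

```python
import math

def mean_ci(x_bar: float, s: float, n: int, z_star: float = 2.0):
    """Return (SEM, lower, upper) for the large-sample interval x_bar +/- z* * s / sqrt(n)."""
    sem = s / math.sqrt(n)
    return sem, x_bar - z_star * sem, x_bar + z_star * sem

sem, lo, hi = mean_ci(x_bar=95, s=185, n=1500)
print(f"SEM = ${sem:.2f}; 95% CI: ${lo:.2f} to ${hi:.2f}")
```

This prints an SEM of \$4.78 and an interval of about \$85.45 to \$104.55. The text's ± \$9.56 comes from doubling the already-rounded SEM, while the code doubles the unrounded value (about \$9.55).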

Of course, other levels of confidence are possible. When the sample size is large, s will be a good estimate of \(\sigma\) and you can use multiplier numbers from the normal curve. When the sample size is smaller (say n < 30), s will be fairly different from \(\sigma\) for some samples, which means that we need a bigger multiplier number to account for that (see the optional material on "t-multipliers" in chapter 21).

Confidence Intervals for a population mean (n \(\ge\) 30)

For large random samples, a confidence interval for a population mean is given by

\[\text{sample mean} \pm z^* \frac{s}{\sqrt{n}}\]

where z* is a multiplier number that comes from the normal curve and determines the level of confidence (see Table 9.1 in section 9.1).

Example 9.4

Planet Jupiter

The equatorial radius of the planet Jupiter is measured 40 times independently by a process that is practically free of bias. These measurements average \(\bar x\) = 71492 kilometers with a standard deviation of s = 28 kilometers. Find a 90% confidence interval for the equatorial radius of Jupiter.

Note! The equatorial radius of the planet is a fixed number (Jupiter is not changing in size), but measurements are random quantities that might come out different when repeated independently. If the measurement process is unbiased, then repeating the process many times and taking the average gives a better estimate of the true value.

Solution: Since s = 28 km, the SEM = \(\frac{28}{\sqrt{40}} \approx 4.4\) km. With n = 40, using the multiplier number from the normal curve for 90% confidence (z* = 1.645) will work pretty well, so our confidence interval would be 71492 km ± 1.645(4.4 km), or 71492 km ± 7.3 km.
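A short check of the Example 9.4 solution, taking z* = 1.645 from Table 9.1 and the summary numbers given in the example:

```python
import math

x_bar, s, n = 71492, 28, 40   # km, from Example 9.4
z_star = 1.645                # 90% multiplier from Table 9.1

sem = s / math.sqrt(n)        # standard error of the mean
margin = z_star * sem
print(f"SEM = {sem:.1f} km; 90% CI: {x_bar - margin:.1f} km to {x_bar + margin:.1f} km")
```

It prints an SEM of 4.4 km and an interval from about 71484.7 km to 71499.3 km, that is, 71492 km ± 7.3 km.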

Example 9.5

Credit Cards

How much credit card debt do students typically have when they graduate from Penn State University? A sample of 15 recent Penn State graduates is obtained. Each of these recent graduates is asked to indicate the amount of credit card debt they had at the time of graduation. It turns out that the sample mean was \(\bar x\) = \$2430 with a sample standard deviation of s = \$2300. Would it be appropriate to use the method above to find a 99% confidence interval for the average credit card debt for all recent Penn State graduates?

Solution: No, with n = 15, using s as an estimate of \(\sigma\) would add quite a bit of extra variability; so it would not be appropriate to use the normal curve multiplier associated with 99% confidence (z* = 2.576). Also, we can tell from the large value of s relative to the sample average that the data here are quite skewed and so the normal curve would not be a good approximation to the sampling distribution regardless.

9.3 - Confidence Intervals for the Difference Between Two Population Proportions or Means


When a sample survey produces a proportion or a mean as a response, we can use the methods in section 9.1 and section 9.2 to find a confidence interval for the true population values. In this section, we discuss confidence intervals for comparative studies. How do we assess the difference between two proportions or means when they come from a comparative observational study or experiment? To address this question, we first need a new rule.

Standard Error of a Difference

When two samples are independent of each other,

Standard Error for a Difference between two sample summaries =

\[\sqrt{(\text{standard error in first sample})^{2} + (\text{standard error in second sample})^{2}}\]

Example 9.6


A medical researcher conjectures that smoking can result in wrinkled skin around the eyes. The researcher recruited 150 smokers and 250 nonsmokers to take part in an observational study and found that 95 of the smokers and 105 of the nonsmokers were seen to have prominent wrinkles around the eyes (based on a standardized wrinkle score administered by a person who did not know if the subject smoked or not). Some results from the study are found in Table 9.2.

Table 9.2. Results of the Smoking and Wrinkles Study (Example 9.6)

Quantity | Smokers | Nonsmokers
Sample Size | 150 | 250
Sample Proportion with Prominent Wrinkles | 95/150 = 0.63 | 105/250 = 0.42
Standard Error for Proportion | \(\sqrt{\frac{0.63(0.37)}{150}} = 0.0394\) | \(\sqrt{\frac{0.42(0.58)}{250}} = 0.0312\)

How do the smokers compare to the non-smokers? The difference between the two sample proportions is 0.63 - 0.42 = 0.21. We would like to make a CI for the true difference that would exist between these two groups in the population. So we compute

\[\text{Standard Error for Difference} = \sqrt{0.0394^{2}+0.0312^{2}} \approx 0.05\]

If we think about all possible ways to draw a sample of 150 smokers and 250 non-smokers, then the differences we'd see between sample proportions would approximately follow the normal curve. Thus, a 95% Confidence Interval for the difference between these two proportions in the population is given by:

\[\text{Difference Between the Sample Proportions} \pm z^*(\text{Standard Error for Difference})\]

or

\[0.21 \pm 2(0.05)\;\; \text{or}\;\; 0.21 \pm 0.1\]

Notice that this 95% confidence interval goes from 0.11 to 0.31. Since the interval does not contain 0, we see that the difference seen in this study was "significant."

Another way to think about whether the smokers and non-smokers have significantly different proportions with wrinkles is to calculate a 95% Confidence Interval for each group separately. For the smokers, we have a confidence interval of 0.63 ± 2(0.0394) or 0.63 ± 0.0788. The interval for smokers goes from about 0.55 up to 0.71. For the non-smokers, we have a confidence interval of 0.42 ± 2(0.0312) or 0.42 ± 0.0624. The interval for non-smokers goes from about 0.36 up to 0.48. The interval for the smokers (which starts at 0.55) and the interval for the non-smokers (which ends at 0.48) do not overlap - that is another sign that the differences seen in this study were "significant."
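Both checks from Example 9.6 can be sketched in code: the interval for the difference (using the square-root-of-summed-squares rule for independent samples) and the two separate intervals. The helper names `prop_and_se` and `interval` are hypothetical, and the code uses unrounded proportions (95/150 ≈ 0.633 rather than 0.63), so its figures differ slightly from the text.

```python
import math

def prop_and_se(successes: int, n: int):
    """Sample proportion and its standard error."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

def interval(est: float, se: float, z_star: float = 2.0):
    return est - z_star * se, est + z_star * se

p1, se1 = prop_and_se(95, 150)    # smokers
p2, se2 = prop_and_se(105, 250)   # nonsmokers

diff = p1 - p2
se_diff = math.sqrt(se1**2 + se2**2)   # valid only for independent samples
lo, hi = interval(diff, se_diff)
print(f"difference = {diff:.3f}, SE = {se_diff:.3f}, 95% CI: {lo:.3f} to {hi:.3f}")
print("difference CI excludes 0:", lo > 0 or hi < 0)

smk = interval(p1, se1)
non = interval(p2, se2)
print("individual CIs overlap:", smk[0] <= non[1] and non[0] <= smk[1])
```

The difference interval runs from about 0.113 to 0.314 (excluding 0), and the two individual intervals do not overlap, so both rules point to a significant difference.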

Statistical Significance and Confidence Intervals

  • If the two confidence intervals do not overlap, we can conclude that there is a statistically significant difference in the two population values at the given level of confidence; or alternatively
  • If the confidence interval for the difference does not contain zero, we can conclude that there is a statistically significant difference in the two population values at the given level of confidence.

The first rule is the "more conservative" one since there are some circumstances when the interval for the difference does not contain zero but there is some overlap in the individual confidence intervals.
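To see why the overlap rule is the more conservative one, consider two hypothetical independent estimates, 0.50 and 0.60, each with a standard error of 0.03 (these numbers are invented for illustration, not taken from a study). The sketch below applies both rules with z* = 2.

```python
import math

def interval(est: float, se: float, z_star: float = 2.0):
    return est - z_star * se, est + z_star * se

# Hypothetical independent estimates, each with standard error 0.03.
est_a, se_a = 0.50, 0.03
est_b, se_b = 0.60, 0.03

a = interval(est_a, se_a)          # about (0.44, 0.56)
b = interval(est_b, se_b)          # about (0.54, 0.66)
overlap = a[0] <= b[1] and b[0] <= a[1]

se_diff = math.sqrt(se_a**2 + se_b**2)           # about 0.042, less than 0.03 + 0.03
d_lo, d_hi = interval(est_b - est_a, se_diff)    # about (0.015, 0.185)
print("individual intervals overlap:", overlap)
print("difference interval excludes 0:", d_lo > 0 or d_hi < 0)
```

Both lines print True. Because the standard error of a difference combines the two standard errors as a square root of summed squares, it is smaller than their sum, so the difference interval can exclude 0 even when the individual intervals overlap.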

Importantly, the formula for the standard error of a difference is for two independent samples. It would not apply to dependent samples like those gathered in a matched pairs study.

Example 9.7


A general rule used clinically to judge normal levels of strength is that a person's dominant hand should have about 10% higher grip strength than their non-dominant hand. The idea is that the preferential use of your dominant hand in everyday activities might act as a form of endurance training for the muscles of the hand resulting in the strength differential. If this theory about the underlying reason for the strength differential is true then there should be less of a difference in young children than in adults. Data from a study of 60 right-handed boys under 10 years old and 60 right-handed men aged 30-39 are shown in Table 9.3.

Table 9.3. Grip Strength (kilograms): Average and Standard Deviation by Hand and Age

Hand | Boys < 10 years old (n = 60) | Men 30-39 years old (n = 60)
Right Hand | \(\bar x\) = 6.2 kg, s = 2.1 kg | \(\bar x\) = 40.3 kg, s = 9.3 kg
Left Hand | \(\bar x\) = 5.9 kg, s = 2.2 kg | \(\bar x\) = 35.6 kg, s = 8.8 kg
Difference (Right - Left) | \(\bar x\) = 0.3 kg, s = 0.8 kg | \(\bar x\) = 4.7 kg, s = 3.6 kg


Is the grip strength in the right hand higher than the grip strength in the left hand for boys under 10 years old? We cannot compare the left-hand results and the right-hand results as if they were separate independent samples. This is a matched pairs situation since the results are highly correlated: some boys will be stronger than others in both hands. Thus, the proper way to examine the disparity between right-hand strength and left-hand strength is to look at the differences between the two hands in each boy and then analyze the resulting data as a single sample (as discussed in section 9.2). Looking at these differences, we see their average is 0.3 kg with a standard deviation of 0.8 kg.

Thus the SEM for these differences is \(\frac{0.8}{\sqrt{60}}=0.103\) and a 95% Confidence Interval for the average right-hand versus left hand strength differential in the population of boys is 0.3 kg ± 2(0.103) kg or 0.3 kg ± 0.206 kg. The interval goes from about 0.09 kg up to 0.51 kg.

Similarly, for the men in the study the SEM for the right-left strength differential is \(\frac{3.6}{\sqrt{60}}=0.465\) and a 95% Confidence Interval for the average strength differential in the population of men is 4.7 kg ± 2(0.465) kg or 4.7 kg ± 0.93 kg. The interval goes from 3.77 kg up to 5.63 kg.

Finally, we want to examine the idea that the right-left strength differential will be different between the 30-39 year old men and the boys < 10. That comparison involves two independent samples of 60 people each. To find a confidence interval for the average difference between these two populations we compute

\[\text{Standard Error for Difference} = \sqrt{0.103^{2}+0.465^{2}} \approx 0.476\]


If we think about all possible ways to draw a sample of 60 boys under 10 and 60 men from 30-39, then the differences we'd see between sample means would approximately follow the normal curve. Thus, a 95% Confidence Interval for the difference between these two means in the population is given by

\[\text{Difference Between the Sample Means} \pm z^*(\text{Standard Error for Difference})\]

or

\[(4.7 - 0.3)\,\text{kg} \pm 2(0.476)\,\text{kg} \;\; \text{or}\;\; 4.4\,\text{kg} \pm 0.95\,\text{kg}\]


Notice that this 95% confidence interval goes from 3.45 kg up to 5.35 kg. Since the interval does not contain 0, we see that the difference between the adults and children seen in this study was "significant."
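The two-stage logic of Example 9.7 (paired differences analyzed as a single sample within each group, then an independent-samples comparison between groups) can be sketched from the summary statistics in Table 9.3:

```python
import math

n = 60
z_star = 2.0
# Summary statistics for the right-minus-left differences (Table 9.3).
boys_mean, boys_sd = 0.3, 0.8
men_mean, men_sd = 4.7, 3.6

# Matched pairs: each group's differences are treated as a single sample.
sem_boys = boys_sd / math.sqrt(n)
sem_men = men_sd / math.sqrt(n)

# Boys and men are independent samples, so their SEMs combine as
# the square root of the sum of squares.
se_diff = math.sqrt(sem_boys**2 + sem_men**2)
diff = men_mean - boys_mean
lo, hi = diff - z_star * se_diff, diff + z_star * se_diff
print(f"SEM boys = {sem_boys:.3f}, SEM men = {sem_men:.3f}, SE diff = {se_diff:.3f}")
print(f"95% CI: {lo:.2f} kg to {hi:.2f} kg")
```

It reproduces the SEMs of 0.103 and 0.465, a standard error for the difference of 0.476, and the interval from 3.45 kg to 5.35 kg.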


9.4 - Test Yourself!




9.5 - Have Fun With It!


Cartoon about confidence intervals: "I got the instructions from my statistics professor. He was 80% confident that the true location of the restaurant was in this neighborhood."

J.B. Landers ©

True Value

Lyrics © by Alan Reifman.
May sing to the tune of "Moon Shadow" by Cat Stevens

Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value,

You get a sample statistic, a sample r, or sample M,
You then take plus-or-minus two (it’s really 1.96…), standard errors beyond your stat,
And within this new interval, we can be, so confident,
That the true value, mu or rho, will be somewhere… inside…, our confidence interval,

Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value...

