Lesson 9: Confidence Intervals

Lesson Overview

Let us revisit the Statistical Paradigm in Figure 9.1 below. Lessons 7 and 8 focused on probability, which tells us what kinds of samples might arise from a known population. However, in most real-world problems, reasoning from the population to the sample is not the goal. Usually, we have data from a sample and want to know what can be inferred about the population. Lesson 9 concentrates on one statistical method for reasoning from the sample to the population (inference): Confidence Intervals.

Statistical Paradigm

  • Probability: The rules of probability can tell us the likelihood of different types of samples that might arise from a particular population.
  • Inference: We want to infer what parameter values are most consistent with the sample statistic at hand.
  • Conclude: What does our knowledge of the parameter values tell us about the population?
  • Describe and Compare: Data is collected from the samples and, with sample data in hand, we attempt to create statistical summaries and pictures that give the salient features of the data collected.

Components shown in the figure: Population, Parameters, Samples, and Statistical Summaries and Pictures.

Figure 9.1: Key Components of the Statistical Paradigm

We find that 55% of a random sample of adult Pennsylvanians support a plan to publicly fund the first two years of a college education. What can be said about the proportion of all Pennsylvania adults who support the plan?

We find that the average of 36 independent measurements of the mass of the asteroid Ceres comes out to \(9.46 \times 10^{20}\) kilograms with a standard deviation of \(10^{19}\) kilograms. What can be said about the true mass of Ceres?

We find that human breast tumors implanted in mice shrink by an average of \(0.6\ \text{cm}^3\), with a standard deviation of \(0.5\ \text{cm}^3\), when treated with a proposed new cancer treatment, although 30% of the mice had tumors that grew with the new treatment. What can be said about the percentage of all mice that would be helped by the new treatment and how well the treatment would shrink tumors on average in the population of mice?

This lesson describes how to use Confidence Intervals to examine these types of scientific questions.
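As a preview, the Ceres question above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's official method: it reads the stated values as a sample mean of \(9.46 \times 10^{20}\) kg and a standard deviation of \(10^{19}\) kg from 36 measurements, and it assumes a normal-distribution multiplier for a 95% confidence level (about 1.96). The function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Assumption: a normal multiplier is adequate for this sample size.
    """
    # Standard error of the mean: s / sqrt(n)
    sem = sample_sd / sqrt(n)
    # Multiplier that leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return sample_mean - margin, sample_mean + margin


# Values taken from the Ceres example: mean, standard deviation, sample size
ceres_low, ceres_high = mean_confidence_interval(9.46e20, 1e19, 36)
print(f"95% interval: {ceres_low:.4e} kg to {ceres_high:.4e} kg")
```

Under these assumptions the interval runs from roughly \(9.43 \times 10^{20}\) kg to \(9.49 \times 10^{20}\) kg. A different multiplier (for example, one based on the t distribution) would give a slightly wider interval.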

  • A population mean is the numerical average of a variable in the entire population of interest. One example would be the average amount spent on dairy products by adult Americans in the previous year. The actual numerical value of a population mean would rarely be known.
  • A sample mean is the numerical average of the data for a variable in a sample. One example would be the average amount spent on dairy products in the previous year by the respondents to a sample survey. The value of the sample mean might be used to estimate an unknown population mean.
  • The Standard Error of a sample Mean (often abbreviated S.E.M.) is the standard deviation of the sampling distribution of a sample mean. In a random sample, it is estimated by \(s/\sqrt{n}\) (the sample standard deviation divided by the square root of the sample size).
  • A population mean difference is the difference between numerical averages of a variable for two different groups in the entire population of interest. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women of New York City. The actual numerical value of a population mean difference would rarely be known.
  • A sample mean difference is the difference between numerical averages of a variable for two different groups in a sample. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women in a sample of 100 New Yorkers. The value of the sample mean difference might be used to estimate an unknown population mean difference.
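The sample quantities defined above can be computed directly from data. The sketch below uses invented subway mileage figures purely for illustration (they are not from the lesson), and the helper name `standard_error_of_mean` is hypothetical. It computes two sample means, their difference, and the estimated standard error \(s/\sqrt{n}\) for one group.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_mean(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    return stdev(values) / sqrt(len(values))


# Hypothetical data: miles ridden on the subway last week (made-up values)
men_miles = [12.0, 8.5, 15.0, 9.0, 11.5, 7.0]
women_miles = [10.0, 9.5, 13.0, 6.0, 8.5, 7.0]

sample_mean_men = mean(men_miles)
sample_mean_women = mean(women_miles)

# Sample mean difference: an estimate of the unknown population mean difference
sample_mean_difference = sample_mean_men - sample_mean_women

sem_men = standard_error_of_mean(men_miles)

print(f"Sample means: {sample_mean_men} (men), {sample_mean_women} (women)")
print(f"Sample mean difference: {sample_mean_difference}")
print(f"Estimated S.E.M. for men: {sem_men:.3f}")
```

The population mean and population mean difference do not appear in the code because, as noted above, their values are rarely known; the sample versions serve as estimates.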

Objectives

After successfully completing this lesson, you should be able to:

  • Interpret confidence intervals for population values.
  • Find confidence intervals for population proportions and means using random samples.
  • Understand the key principles of estimation:
    • Confidence intervals are random quantities, varying from sample to sample.  Sometimes these random intervals cover the true population parameter and sometimes they don't.  The coverage probability (the chance that the interval covers the parameter) is called the confidence level.
    • There is a trade-off between confidence and precision.  In order to achieve a higher level of confidence, you must be willing to accept a larger margin of error (a wider interval) or pay the price of a larger sample size.
    • The variability of a sample statistic is inversely proportional to the square root of the sample size.  For example, when the sample size is four times as large, the margin of error will be cut in half.
    • Formulas for making confidence intervals are based on the probabilities associated with the randomization used to collect the data.
  • Apply appropriate decision rules to determine whether or not there is a statistically significant difference between two population values.
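The estimation principles listed above can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the lesson's prescribed procedure: it uses the common large-sample interval for a proportion, \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\), a hypothetical true proportion of 0.55, and hypothetical sample sizes. The function names are invented for this example.

```python
import random
from math import sqrt
from statistics import NormalDist


def proportion_margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion (large-sample approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


def estimate_coverage(true_p, n, confidence=0.95, repetitions=2000, seed=1):
    """Fraction of simulated random samples whose interval covers true_p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and count the "yes" responses
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = proportion_margin_of_error(p_hat, n, confidence)
        # The interval is random: it changes with every sample
        if p_hat - margin <= true_p <= p_hat + margin:
            covered += 1
    return covered / repetitions


# Hypothetical setting: true proportion 0.55, samples of 400 people
coverage_95 = estimate_coverage(true_p=0.55, n=400)

# Same sample proportion, different sample sizes and confidence levels
margin_n = proportion_margin_of_error(0.55, 400)
margin_4n = proportion_margin_of_error(0.55, 1600)
margin_99 = proportion_margin_of_error(0.55, 400, confidence=0.99)

print(f"Simulated coverage of 95% intervals: {coverage_95:.3f}")
print(f"Margin of error, n = 400:  {margin_n:.4f}")
print(f"Margin of error, n = 1600: {margin_4n:.4f}")
print(f"Margin of error, n = 400 at 99% confidence: {margin_99:.4f}")
```

The simulated coverage should land close to the 95% confidence level, the margin of error at four times the sample size is half as large, and raising the confidence level to 99% widens the interval for the same sample size.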