## Lesson Overview

Let us revisit the Statistical Paradigm in Figure 9.1 below. Lessons 7 and 8 focused on **probability**, which tells us what kinds of samples might arise from a known population. In most real-world problems, however, reasoning from the population to the sample is not the goal. Usually we have data from a sample and want to know what can be inferred about the population. Lesson 9 concentrates on one statistical method for reasoning from the sample to the population (**inference**): Confidence Intervals.

We find that 55% of a random sample of adult Pennsylvanians support a plan to publicly fund the first two years of a college education. What can be said about the proportion of all Pennsylvania adults who support the plan?

We find that the average of 36 independent measurements of the mass of the asteroid Ceres comes out to \(9.46 \times 10^{20}\) kilograms with a standard deviation of \(10^{19}\) kilograms. What can be said about the true mass of Ceres?

We find that human breast tumors implanted in mice shrink by an average of \(0.6\ \text{cm}^{3}\) with a standard deviation of \(0.5\ \text{cm}^{3}\) when treated with a proposed new cancer treatment, although 30% of the mice had tumors that grew under the new treatment. What can be said about the percentage of all mice that would be helped by the new treatment, and about how much the treatment would shrink tumors on average in the population of mice?

This lesson describes how to use Confidence Intervals to examine these types of scientific questions.

- A **population mean** is the numerical average of a variable in the entire population of interest. One example would be the average amount spent on dairy products by adult Americans in the previous year. The actual numerical value of a **population mean** would rarely be known.
- A **sample mean** is the numerical average of the data for a variable in a sample. One example would be the average amount spent on dairy products in the previous year by the respondents to a sample survey. The value of the **sample mean** might be used to estimate an unknown population mean.
- The **Standard Error of a sample Mean** (often abbreviated **S.E.M.**) is the standard deviation of the sampling distribution of a sample mean. In a random sample, it is estimated by \(s/\sqrt{n}\) (the sample standard deviation divided by the square root of the sample size).
- A **population mean difference** is the difference between numerical averages of a variable for two different groups in the entire population of interest. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women of New York City. The actual numerical value of a **population mean difference** would rarely be known.
- A **sample mean difference** is the difference between numerical averages of a variable for two different groups in a sample. One example would be the difference between the average number of miles ridden on the subway last week for the men versus the women in a sample of 100 New Yorkers. The value of the **sample mean difference** might be used to estimate an unknown population mean difference.
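
As a rough illustration of the standard error formula \(s/\sqrt{n}\) defined above, the following Python sketch computes it once from summary statistics (the Ceres measurements quoted earlier) and once from raw data. The function name `standard_error_of_mean` and the dairy-spending figures are hypothetical, made up for illustration only; they do not come from the lesson.

```python
import math
import statistics


def standard_error_of_mean(sample_sd, n):
    """Estimate the standard error of a sample mean as s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


# From summary statistics: 36 measurements of the mass of Ceres
# with a sample standard deviation of 1e19 kg (figures quoted in the lesson).
sem_ceres = standard_error_of_mean(sample_sd=1e19, n=36)

# From raw data: hypothetical yearly dairy spending (in dollars) for five
# survey respondents. These numbers are invented for this example.
spending = [212.0, 185.5, 240.0, 198.25, 221.75]
sample_mean = statistics.mean(spending)
sample_sd = statistics.stdev(spending)  # uses the n - 1 denominator
sem_spending = standard_error_of_mean(sample_sd, len(spending))

print(f"Ceres S.E.M.: {sem_ceres:.3e} kg")
print(f"Spending: mean = {sample_mean:.2f}, S.E.M. = {sem_spending:.2f}")
```

Because \(\sqrt{36} = 6\), the standard error for the Ceres example is one sixth of the standard deviation of the individual measurements.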

## Objectives

- Interpret confidence intervals for population values.
- Find confidence intervals for population proportions and means using random samples.
- Understand the key principles of estimation:
  - Confidence intervals are random quantities, varying from sample to sample. Sometimes these random intervals cover the true population parameter and sometimes they don't. The coverage probability (the chance that the interval covers the parameter) is called the confidence level.
  - There is a trade-off between confidence and precision. In order to achieve a higher level of confidence, you must be willing to accept a larger margin of error (a wider interval) or pay the price of a larger sample size.
  - The variability of a sample statistic is inversely proportional to the square root of the sample size. For example, when the sample size is four times as large, the margin of error will be cut in half.
  - Formulas for making confidence intervals are based on the probabilities associated with the randomization used to collect the data.

- Apply appropriate decision rules to determine whether or not there is a statistically significant difference between two population values.
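
The key principles of estimation listed above can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it uses a normal-approximation interval of the form "sample mean ± multiplier × \(s/\sqrt{n}\)" with a multiplier of 1.96 for roughly 95% confidence, which may differ from the multiplier the lesson itself adopts. The function names, the simulated population (mean 50, standard deviation 10), and the random seed are all hypothetical choices.

```python
import math
import random
import statistics


def margin_of_error(sample_sd, n, multiplier=1.96):
    """Margin of error for a mean: multiplier * s / sqrt(n).

    The default multiplier of 1.96 is an assumed normal-approximation
    value for about 95% confidence.
    """
    return multiplier * sample_sd / math.sqrt(n)


def coverage_rate(true_mean, true_sd, n, trials, multiplier=1.96, seed=0):
    """Fraction of simulated intervals that cover the true population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh random sample, so the interval changes every time.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        center = statistics.mean(sample)
        moe = margin_of_error(statistics.stdev(sample), n, multiplier)
        if center - moe <= true_mean <= center + moe:
            hits += 1
    return hits / trials


# Quadrupling the sample size cuts the margin of error in half.
moe_n100 = margin_of_error(sample_sd=10.0, n=100)
moe_n400 = margin_of_error(sample_sd=10.0, n=400)

# A higher confidence level (larger multiplier) gives a wider interval.
moe_99 = margin_of_error(sample_sd=10.0, n=100, multiplier=2.576)

# Intervals are random: some cover the true mean and some do not.
coverage = coverage_rate(true_mean=50.0, true_sd=10.0, n=100, trials=2000)

print(f"margin of error, n=100: {moe_n100:.3f}")
print(f"margin of error, n=400: {moe_n400:.3f}")
print(f"margin of error, n=100, ~99% confidence: {moe_99:.3f}")
print(f"simulated coverage: {coverage:.3f}")
```

In this simulation the share of intervals that cover the true mean should land close to the nominal 95%, though the exact figure depends on the seed and the number of trials.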