4.2 - Introduction to Confidence Intervals

In Lesson 4.1 we learned how to construct sampling distributions when population values were known. In real life, we don't typically have access to the whole population. In these cases, we can use the sample data that we do have to construct a confidence interval, which estimates the population parameter with a stated level of confidence. This is one type of statistical inference.

Confidence Interval: A range computed using sample statistics to estimate an unknown population parameter with a stated level of confidence

Example: Statistics Anxiety

The statistics professors at a university want to estimate the average statistics anxiety score for all of their undergraduate students. It would be too time-consuming and costly to give every undergraduate student at the university their statistics anxiety survey. Instead, they take a random sample of 50 undergraduate students at the university and administer their survey.

Using the data collected from the sample, they construct a 95% confidence interval for the mean statistics anxiety score in the population of all university undergraduate students. They are using \(\bar{x}\) to estimate \(\mu\). If the 95% confidence interval for \(\mu\) is 26 to 32, then we could say, “we are 95% confident that the mean statistics anxiety score of all undergraduate students at this university is between 26 and 32.” In other words, we are 95% confident that \(26 \leq \mu \leq 32\). This may also be written as \(\left [ 26,32 \right ]\).

At the center of a confidence interval is the sample statistic, such as a sample mean or sample proportion. This is known as the point estimate. The width of the confidence interval is determined by the margin of error. The margin of error is the amount that is subtracted from and added to the point estimate to construct the confidence interval.

Point Estimate: Sample statistic that serves as the best estimate for a population parameter

Margin of Error: Half of the width of a confidence interval; equal to the multiplier times the standard error

General Form of Confidence Interval: \(sample\ statistic \pm margin\ of\ error\); \(margin\ of\ error=multiplier(standard\ error)\)

The margin of error will depend on two factors:

The level of confidence, which determines the multiplier
The value of the standard error

In Lesson 2 you first learned about the Empirical Rule, which states that approximately 95% of observations in a normal distribution fall within two standard deviations of the mean. Thus, when constructing a 95% confidence interval, we can use a multiplier of 2.

General Form of 95% Confidence Interval: Given a normal distribution, a 95% CI can be found using \(sample\ statistic\pm2\ (standard\ error)\)
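
The multiplier of 2 comes from the Empirical Rule and is a rounded value. As a quick check, the sketch below uses `NormalDist` from Python's standard `statistics` module to compute the proportion of a normal distribution within two standard deviations of the mean, and the multiplier that captures the middle 95%. The variable names are illustrative and are not part of the lesson.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within 2 standard deviations of the mean.
within_two = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Multiplier that leaves 2.5% in each tail, i.e. captures the middle 95%.
exact_multiplier = standard_normal.inv_cdf(0.975)

print(f"Proportion within 2 SD: {within_two:.4f}")          # 0.9545
print(f"Multiplier for middle 95%: {exact_multiplier:.3f}")  # 1.960
```

Rounding 1.960 up to 2 makes the interval very slightly wider, which is why 2 works well as a simple multiplier for hand calculations.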

Example: Proportion of Dog Owners

At the beginning of the Spring 2017 semester, a representative sample of 501 STAT 200 students was surveyed and asked if they owned a dog. The sample proportion was 0.559. Bootstrapping methods, which we will learn later in this lesson, were used to compute a standard error of 0.022. Assume the bootstrap distribution is normally distributed. We can use this information to construct a 95% confidence interval for the proportion of all STAT 200 students who own a dog.

\(sample\ statistic\pm2\ (standard\ error)\)

0.559 ± 2(0.022)
0.559 ± 0.044
[0.515, 0.603]

I am 95% confident that the proportion of all STAT 200 students in Spring 2017 who own a dog is between 0.515 and 0.603.
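
The arithmetic in this example can be reproduced with a short Python sketch. The function name `confidence_interval` and its parameters are hypothetical, chosen here for illustration. The inputs are the sample proportion and bootstrap standard error given above.

```python
def confidence_interval(statistic, standard_error, multiplier=2.0):
    """Return (lower, upper) for statistic ± multiplier * standard_error."""
    margin_of_error = multiplier * standard_error
    return statistic - margin_of_error, statistic + margin_of_error


# Values from the dog ownership example.
sample_proportion = 0.559
standard_error = 0.022

lower, upper = confidence_interval(sample_proportion, standard_error)
print(f"[{lower:.3f}, {upper:.3f}]")  # [0.515, 0.603]
```

Keeping the multiplier as a parameter makes it possible to reuse the same function for other confidence levels once their multipliers are known.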

Example: Mean Height

In a random sample of 525 Penn State World Campus students the mean height was 67.009 inches with a standard deviation of 4.462 inches. The standard error was computed to be 0.195. Construct a 95% confidence interval for the mean height of all Penn State World Campus students. Assume the bootstrap distribution is normally distributed.

\(sample\ statistic\pm2\ (standard\ error)\)

67.009 ± 2(0.195)
67.009 ± 0.390
[66.619, 67.399]

I am 95% confident that the mean height of all Penn State World Campus students is between 66.619 inches and 67.399 inches.
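
The example does not say how the standard error of 0.195 was obtained. As an assumption for illustration only, the sketch below compares it with the common formula for the standard error of a mean, \(s/\sqrt{n}\), and then rebuilds the interval from the reported value. All variable names are hypothetical.

```python
import math

# Values from the mean height example.
n = 525
sample_mean = 67.009   # inches
sample_sd = 4.462      # inches

# Assumption: standard error of the mean estimated as s / sqrt(n).
se_formula = sample_sd / math.sqrt(n)
print(f"s / sqrt(n) = {se_formula:.3f}")  # 0.195

# Build the interval from the standard error reported in the example.
reported_se = 0.195
margin_of_error = 2 * reported_se
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"[{lower:.3f}, {upper:.3f}]")  # [66.619, 67.399]
```

The formula-based value agrees with the reported standard error to three decimal places, so either approach leads to essentially the same interval here.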

4.2.1 - Interpreting Confidence Intervals

Confidence intervals are often misinterpreted, and the logic behind them can be confusing. Remember that when we construct a confidence interval, we are estimating a population parameter using only data from a sample. We don't know if our sample statistic is less than, greater than, or approximately equal to the population parameter. We also don't know for sure whether our confidence interval contains the population parameter.

The correct interpretation of a 95% confidence interval, [L, U], is that "we are 95% confident that the [population parameter] is between [L] and [U]."

Fill in the population parameter with the specific language from the problem. The L represents the 'lower endpoint' of the CI and the U represents the 'upper endpoint.'
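
One way to see what the 95% refers to is a simulation in which the population mean is known, so that each interval can be checked against it. The sketch below is an illustration under stated assumptions: the population is normal with a made-up mean of 29 and standard deviation of 10, and the standard error is estimated as \(s/\sqrt{n}\) rather than by bootstrapping. None of these values come from the lesson.

```python
import random
import statistics

random.seed(200)  # fixed seed so the run is repeatable

TRUE_MEAN = 29.0     # hypothetical population mean (unknown in practice)
TRUE_SD = 10.0       # hypothetical population standard deviation
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

captured = 0
for _ in range(NUM_SAMPLES):
    # Draw one random sample and build its 95% interval: x-bar ± 2(SE).
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    x_bar = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    lower = x_bar - 2 * standard_error
    upper = x_bar + 2 * standard_error
    if lower <= TRUE_MEAN <= upper:
        captured += 1

coverage = captured / NUM_SAMPLES
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The printed proportion should be close to 0.95. Any single interval either contains the population mean or it does not. The confidence level describes how often the method succeeds across many samples.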

Example: Correlation Between Height and Weight

At the beginning of the Spring 2017 semester, a sample of World Campus students was surveyed and asked for their height and weight. In the sample, Pearson's r = 0.487. The 95% confidence interval was computed to be [0.410, 0.559].

Interpretation:

The correct interpretation of this confidence interval is that we are 95% confident that the correlation between height and weight in the population of all World Campus students is between 0.410 and 0.559.

Example: Seatbelt Usage

A sample of 12th grade females was surveyed about their seatbelt usage. A 95% confidence interval for the proportion of all 12th grade females who always wear their seatbelt was computed to be [0.612, 0.668].

Interpretation:

The correct interpretation of this confidence interval is that we are 95% confident that the proportion of all 12th grade females who always wear their seatbelt is between 0.612 and 0.668.

Example: IQ Scores

A random sample of 50 students at one school was obtained and each selected student was given an IQ test. These data were used to construct a 95% confidence interval of [96.656, 106.422].

Interpretation:

The correct interpretation of this confidence interval is that we are 95% confident that the mean IQ score in the population of all students at this school is between 96.656 and 106.422.

4.2.2 - Applying Confidence Intervals

A confidence interval contains a range of acceptable estimates of the population parameter. Values in a confidence interval are reasonable estimates for the true population value. Values not in the confidence interval are not reasonable estimates for the population value.

Example: Correlation Between Height and Weight

Research question: Is there convincing evidence of a positive correlation between height and weight in the population of all World Campus students?

The entire 95% confidence interval, [0.410, 0.559], is greater than zero, which means that all reasonable estimates of the population correlation are positive. Yes, there is convincing evidence of a positive correlation between height and weight in the population of all World Campus students.

Example: Seatbelt Usage

Research question: Is there convincing evidence that the proportion of all 12th grade females who always wear their seatbelt is different from 0.65?

The value of 0.65 is contained within our 95% confidence interval of [0.612, 0.668]. This means that 0.65 is a reasonable value of the population proportion. We cannot conclude that the population proportion is different from 0.65.

Example: IQ Scores

A random sample of 50 students at one school was obtained and each selected student was given an IQ test. These data were used to construct a 95% confidence interval of [96.656, 106.422].

Research question: Is there convincing evidence that the mean IQ score at this school is different from the known national average of 100?

The 95% confidence interval contains 100. This means that 100 is a reasonable estimate for the mean IQ score of students at this school. There is not enough evidence that the mean at this school is different from 100.
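
The reasoning in these three examples comes down to comparing a value of interest with the endpoints of the interval. A minimal sketch, using the intervals reported in the examples and a hypothetical helper named `contains`:

```python
def contains(interval, value):
    """Return True if value lies within the closed interval (lower, upper)."""
    lower, upper = interval
    return lower <= value <= upper


# 95% confidence intervals reported in the examples.
correlation_ci = (0.410, 0.559)
seatbelt_ci = (0.612, 0.668)
iq_ci = (96.656, 106.422)

# Positive correlation: is the entire interval above zero?
print(correlation_ci[0] > 0)        # True  -> evidence of a positive correlation

# Different from 0.65: is 0.65 a reasonable value?
print(contains(seatbelt_ci, 0.65))  # True  -> cannot conclude a difference

# Different from 100: is 100 a reasonable value?
print(contains(iq_ci, 100))         # True  -> not enough evidence of a difference
```

When the value of interest falls inside the interval, it remains a reasonable estimate, so the data do not provide convincing evidence against it.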
