6.6 - Confidence Intervals & Hypothesis Testing

Confidence intervals and hypothesis tests are similar in that they are both inferential methods that rely on an approximated sampling distribution. Confidence intervals use data from a sample to estimate a population parameter. Hypothesis tests use data from a sample to test a specified hypothesis. Hypothesis testing requires that we have a hypothesized parameter. 

The simulation methods used to construct bootstrap distributions and randomization distributions are similar. One primary difference is that a bootstrap distribution is centered on the observed sample statistic, while a randomization distribution is centered on the value in the null hypothesis.
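As a rough illustration of this difference, the following Python sketch builds both distributions for a single mean. It assumes one common way of constructing the randomization distribution (shifting the sample so that its mean equals the null value, then resampling with replacement). The data are synthetic and the function names are hypothetical, so this is not a reproduction of how StatKey works internally.

```python
import random
import statistics

def bootstrap_means(data, n_resamples, rng):
    """Resample the data with replacement and record each resample mean."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def randomization_means(data, null_mean, n_resamples, rng):
    """Shift the data so its mean equals the null value, then resample.

    Assumption: this shift-and-resample approach is one common way to
    simulate a null distribution for a single mean.
    """
    shift = null_mean - statistics.fmean(data)
    shifted = [x + shift for x in data]
    return bootstrap_means(shifted, n_resamples, rng)

rng = random.Random(0)
# Synthetic data for illustration only (not the StatKey dataset).
sample = [rng.gauss(98.3, 0.7) for _ in range(50)]
null_mean = 98.6

boot = bootstrap_means(sample, 2000, rng)
rand = randomization_means(sample, null_mean, 2000, rng)

print(f"observed sample mean:  {statistics.fmean(sample):.3f}")
print(f"bootstrap center:      {statistics.fmean(boot):.3f}")
print(f"randomization center:  {statistics.fmean(rand):.3f}")
```

The bootstrap center should land close to the observed sample mean, and the randomization center should land close to the null value of 98.6.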

In Lesson 4, we learned that confidence intervals contain a range of reasonable estimates of the population parameter. All of the confidence intervals we constructed in this course were two-tailed. These two-tailed confidence intervals go hand-in-hand with the two-tailed hypothesis tests we learned in Lesson 5. The conclusion drawn from a two-tailed confidence interval is usually the same as the conclusion drawn from a two-tailed hypothesis test. In other words, if the 95% confidence interval contains the hypothesized parameter, then a hypothesis test at the 0.05 \(\alpha\) level will almost always fail to reject the null hypothesis. If the 95% confidence interval does not contain the hypothesized parameter, then a hypothesis test at the 0.05 \(\alpha\) level will almost always reject the null hypothesis.

Example: Mean

This example uses the Body Temperature dataset built into StatKey to construct a bootstrap confidence interval and conduct a randomization test.

Let's start by constructing a 95% confidence interval using the percentile method in StatKey:

[Figure: StatKey bootstrap dotplot of the mean; samples = 6000, mean = 98.261, std. error = 0.108. Two-tail selection: 0.025 below 98.044, 0.950 in the middle, 0.025 above 98.474.]

The 95% confidence interval for the mean body temperature in the population is [98.044, 98.474].

Now, what if we want to know if there is enough evidence that the mean body temperature is different from 98.6 degrees? We can conduct a hypothesis test. Because 98.6 is not contained within the 95% confidence interval, it is not a reasonable estimate of the population mean. We should expect to have a p value less than 0.05 and to reject the null hypothesis.

\(H_0: \mu=98.6\)

\(H_a: \mu \ne 98.6\)

[Figure: StatKey randomization dotplot of the mean, null hypothesis \(\mu = 98.6\); samples = 5000, mean = 98.601, std. error = 0.106. Two-tail selection: 0.00080 below 98.260, 0.998 in the middle, 0.00080 above 98.941.]

\(p = 2 \times 0.00080 = 0.00160\)

\(p \leq 0.05\), reject the null hypothesis

There is evidence that the population mean is different from 98.6 degrees. 
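The two steps in this example can be sketched in Python to show how the interval and the test tend to agree. This is a minimal sketch under stated assumptions: the data below are synthetic stand-ins (evenly spaced normal quantiles centered at 98.25), not the StatKey Body Temperature dataset; the helper names are hypothetical; and the randomization distribution is built by shifting the sample to the null mean before resampling, which is assumed here rather than taken from the example.

```python
import random
import statistics

def resample_means(data, n_resamples, rng):
    """Means of resamples drawn with replacement, each the size of the data."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, confidence=0.95):
    """Percentile method: cut (1 - confidence) / 2 of the values from each tail."""
    ordered = sorted(values)
    tail = (1 - confidence) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper

def two_tailed_p_value(null_means, observed_mean, null_mean):
    """Double the proportion of null resamples at least as extreme as observed."""
    if observed_mean <= null_mean:
        extreme = sum(m <= observed_mean for m in null_means)
    else:
        extreme = sum(m >= observed_mean for m in null_means)
    return min(1.0, 2 * extreme / len(null_means))

# Synthetic, deterministic stand-in data (assumption, for illustration only).
n = 50
standard_normal = statistics.NormalDist()
sample = [98.25 + 0.7 * standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
observed_mean = statistics.fmean(sample)
null_mean = 98.6

rng = random.Random(1)

# Bootstrap distribution: resample the data as observed.
lower, upper = percentile_interval(resample_means(sample, 5000, rng))

# Randomization distribution: shift the data so its mean equals the null value.
shifted = [x + (null_mean - observed_mean) for x in sample]
p_value = two_tailed_p_value(resample_means(shifted, 5000, rng), observed_mean, null_mean)

print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"98.6 inside interval: {lower <= null_mean <= upper}")
print(f"two-tailed p-value: {p_value:.5f}")
```

With these stand-in data the interval excludes 98.6 and the p-value falls below 0.05, which mirrors the agreement between the two procedures described above.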

Selecting the Appropriate Procedure

The decision of whether to use a confidence interval or a hypothesis test depends on the research question. If we want to estimate a population parameter, we use a confidence interval. If we are given a specific population parameter (i.e., hypothesized value), and want to determine the likelihood that a population with that parameter would produce a sample as different as our sample, we use a hypothesis test. Below are a few examples of selecting the appropriate procedure. 

Example: Cheese Consumption

Research question: How much cheese (in pounds) does an average American adult consume annually? 

What is the appropriate inferential procedure? 

Cheese consumption, in pounds, is a quantitative variable. We have one group: American adults. We are not given a specific value to test, so the appropriate procedure here is a confidence interval for a single mean.

Example: Age

Research question: Is the average age in the population of all STAT 200 students greater than 30 years?

What is the appropriate inferential procedure? 

There is one group: STAT 200 students. The variable of interest is age in years, which is quantitative. The research question includes a specific population parameter to test: 30 years. The appropriate procedure is a hypothesis test for a single mean.

Try it!

For each research question, identify the variables and the parameter of interest, and decide on the appropriate inferential procedure.

  1. Research question: How strong is the correlation between height (in inches) and weight (in pounds) in American teenagers?

    There are two variables of interest: (1) height in inches and (2) weight in pounds. Both are quantitative variables. The parameter of interest is the correlation between these two variables.

    We are not given a specific correlation to test. We are being asked to estimate the strength of the correlation. The appropriate procedure here is a confidence interval for a correlation.

  2. Research question: Are the majority of registered voters planning to vote in the next presidential election?

    The parameter that is being tested here is a single proportion. We have one group: registered voters. "The majority" would be more than 50%, or \(p>0.50\). This is a specific parameter that we are testing. The appropriate procedure here is a hypothesis test for a single proportion.

  3. Research question: On average, are STAT 200 students younger than STAT 500 students?

    We have two independent groups: STAT 200 students and STAT 500 students. We are comparing them in terms of average (i.e., mean) age.

    If STAT 200 students are younger than STAT 500 students, that translates to \(\mu_{200}<\mu_{500}\), which is an alternative hypothesis. This could also be written as \(\mu_{200}-\mu_{500}<0\), where 0 is a specific population parameter that we are testing.

    The appropriate procedure here is a hypothesis test for the difference in two means.

  4. Research question: On average, how much taller are adult male giraffes compared to adult female giraffes?

    There are two groups: males and females. The response variable is height, which is quantitative. We are not given a specific parameter to test; instead, we are asked to estimate "how much" taller males are than females. The appropriate procedure is a confidence interval for the difference in two means.

  5. Research question: Are STAT 500 students more likely than STAT 200 students to be employed full-time?

    There are two independent groups: STAT 500 students and STAT 200 students. The response variable is full-time employment status, which is categorical with two levels: yes/no.

    If STAT 500 students are more likely than STAT 200 students to be employed full-time, that translates to \(p_{500}>p_{200}\), which is an alternative hypothesis. This could also be written as \(p_{500}-p_{200}>0\), where 0 is a specific parameter that we are testing. The appropriate procedure is a hypothesis test for the difference in two proportions.

  6. Research question: Is there a relationship between outdoor temperature (in Fahrenheit) and coffee sales (in cups per day)?

    There are two variables here: (1) temperature in Fahrenheit and (2) cups of coffee sold in a day. Both variables are quantitative. The parameter of interest is the correlation between these two variables.

    If there is a relationship between the variables, that means that the correlation is different from zero. This is a specific parameter that we are testing. The appropriate procedure is a hypothesis test for a correlation.
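The reasoning used across these examples can be summarized in a short Python sketch. The function name `choose_procedure` and its arguments are hypothetical, introduced only to illustrate the decision rule: once the parameter of interest is identified, the procedure depends on whether the research question supplies a specific value to test.

```python
def choose_procedure(parameter, hypothesized_value=None):
    """Suggest an inferential procedure for a research question.

    parameter: description of the parameter of interest, e.g. "a single mean".
    hypothesized_value: the specific value being tested, or None when the
        question only asks for an estimate.
    """
    # Compare against None explicitly: 0 is a valid hypothesized value
    # (for example, a difference of 0 or a correlation of 0).
    if hypothesized_value is None:
        return f"confidence interval for {parameter}"
    return f"hypothesis test for {parameter}"

# Research questions from this section, paired with their parameters.
questions = [
    ("Average annual cheese consumption", "a single mean", None),
    ("Is the average age greater than 30?", "a single mean", 30),
    ("Strength of height/weight correlation", "a correlation", None),
    ("Are the majority planning to vote?", "a single proportion", 0.50),
    ("Are STAT 200 students younger?", "the difference in two means", 0),
    ("How much taller are male giraffes?", "the difference in two means", None),
    ("More likely to be employed full-time?", "the difference in two proportions", 0),
    ("Temperature related to coffee sales?", "a correlation", 0),
]

for question, parameter, value in questions:
    print(f"{question} -> {choose_procedure(parameter, value)}")
```

Identifying the parameter itself (number of groups, and whether each variable is quantitative or categorical) still has to be done by reading the research question; the sketch covers only the final choice between estimating and testing.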