5: Hypothesis Testing, Part 1

Objectives

Upon completion of this lesson, you should be able to:

  • Identify and write null and alternative hypotheses
  • Describe randomization procedures
  • Determine p-values using randomization methods in StatKey and Minitab
  • Interpret p-values
  • Make conclusions on the basis of a p-value

In Lesson 4 we used data from samples to construct confidence intervals for population parameters. When constructing confidence intervals the population parameters were unknown and we were estimating them. In this lesson we will continue to study statistical inference, but here we will be focusing on testing specific hypotheses. Now, we have a hypothesized population parameter to test. This changes how we construct our sampling distribution. Instead of having a distribution centered on the observed sample statistic, we will construct a distribution centered on the hypothesized population parameter. 

This lesson corresponds to Sections 4.1, 4.2, and 4.3 in the Lock5 textbook.

Hypothesis tests use data from a sample to make an inference about the value of a population parameter. In this lesson we will be conducting hypothesis tests with the following parameters:

|  | Population Parameter | Sample Statistic |
|---|---|---|
| Mean | \(\mu\) | \(\overline x\) |
| Difference in two means | \(\mu_1 - \mu_2\) | \(\overline x_1 - \overline x_2\) |
| Proportion | \(p\) | \(\widehat p\) |
| Difference in two proportions | \(p_1 - p_2\) | \(\widehat p_1 - \widehat p_2\) |
| Correlation | \(\rho\) | \(r\) |
| Slope (simple linear regression) | \(\beta\) | \(b\) |

 

We can also conduct hypothesis tests with paired means. If data are paired, and the response variable is quantitative, then the outcome of interest is the mean difference. In a population this is \(\mu_d\) and in a sample \(\overline x_d\). We would first compute the differences for each case, then treat those differences as if they are the variable of interest and conduct a single sample mean test.
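
As a minimal sketch of that first step, the Python snippet below computes one difference per case and then the sample mean difference \(\overline x_d\). The before and after values are made-up numbers used only for illustration; they do not come from any dataset in this course.

```python
from statistics import mean

# Hypothetical paired measurements for five cases (illustrative values only).
before = [200.0, 185.0, 172.0, 210.0, 195.0]
after = [192.0, 186.0, 165.0, 201.0, 190.0]

# Compute one difference per case (before - after).
differences = [b - a for b, a in zip(before, after)]

# The sample statistic is the mean of those differences.
x_bar_d = mean(differences)

print(differences)
print(x_bar_d)
```

From here on, the list of differences is treated as a single quantitative variable, so any single sample mean procedure can be applied to it.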


5.1 - Introduction to Hypothesis Testing

Previously we used confidence intervals to estimate unknown population parameters. We compared confidence intervals to specified parameter values and when the specific value was contained in the interval, we concluded that there was not sufficient evidence of a difference between the population parameter and the specified value. In other words, any values within the confidence intervals were reasonable estimates of the population parameter and any values outside of the confidence intervals were not reasonable estimates. Here, we are going to look at a more formal method for testing whether a given value is a reasonable value of a population parameter. To do this we need to have a hypothesized value of the population parameter. 

In this lesson we will compare data from a sample to a hypothesized parameter. In each case, we will compute the probability that a population with the specified parameter would produce a sample statistic as extreme as or more extreme than the one we observed in our sample. This probability is known as the p-value and it is used to evaluate statistical significance.

p-value
Given that the null hypothesis is true, the probability of obtaining a sample statistic as extreme as or more extreme than the one in the observed sample, in the direction of the alternative hypothesis

A test is considered to be statistically significant when the p-value is less than or equal to the level of significance, also known as the alpha (\(\alpha\)) level. For this class, unless otherwise specified, \(\alpha=0.05\); this is the most frequently used alpha level in many fields. 

Sample statistics vary from the population parameter randomly. When results are statistically significant, we are concluding that the difference observed between our sample statistic and the hypothesized parameter is unlikely to be due to random sampling variation alone.


5.2 - Writing Hypotheses

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).

Null Hypothesis
The statement that there is not a difference in the population(s), denoted as \(H_0\)
Alternative Hypothesis
The statement that there is some difference in the population(s), denoted as \(H_a\) or \(H_1\)

When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing, (2) the direction of the test (non-directional, right-tailed, or left-tailed), and (3) the value of the hypothesized parameter.

  1. At this point we can write hypotheses for a single mean (\(\mu\)), paired means (\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)). 
  2. The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
  3. The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.

Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)).  The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).

One Group Mean

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Is the population mean different from \(\mu_{0}\)? | \(\mu=\mu_{0}\) | \(\mu\neq \mu_{0}\) | Two-tailed, non-directional |
| Is the population mean greater than \(\mu_{0}\)? | \(\mu=\mu_{0}\) | \(\mu> \mu_{0}\) | Right-tailed, directional |
| Is the population mean less than \(\mu_{0}\)? | \(\mu=\mu_{0}\) | \(\mu<\mu_{0}\) | Left-tailed, directional |

\(\mu_{0}\) is the hypothesized population mean.

Paired Means

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Is there a difference in the population? | \(\mu_d=0\) | \(\mu_d \neq 0\) | Two-tailed, non-directional |
| Is there a mean increase in the population? | \(\mu_d=0\) | \(\mu_d> 0\) | Right-tailed, directional |
| Is there a mean decrease in the population? | \(\mu_d=0\) | \(\mu_d<0\) | Left-tailed, directional |

A paired means test is comparable to conducting a one group mean test on the differences.

One Group Proportion

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Is the population proportion different from \(p_0\)? | \(p=p_0\) | \(p\neq p_0\) | Two-tailed, non-directional |
| Is the population proportion greater than \(p_0\)? | \(p=p_0\) | \(p> p_0\) | Right-tailed, directional |
| Is the population proportion less than \(p_0\)? | \(p=p_0\) | \(p< p_0\) | Left-tailed, directional |

\(p_{0}\) is the hypothesized population proportion.

Difference between Two Independent Means

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Are the population means different? | \(\mu_1=\mu_2\) | \(\mu_1 \ne \mu_2\) | Two-tailed, non-directional |
| Is the population mean in group 1 greater than the population mean in group 2? | \(\mu_1=\mu_2\) | \(\mu_1 \gt \mu_2\) | Right-tailed, directional |
| Is the population mean in group 1 less than the population mean in group 2? | \(\mu_1=\mu_2\) | \(\mu_1 \lt \mu_2\) | Left-tailed, directional |

Note: \(\mu_1 = \mu_2\) is equivalent to \(\mu_1-\mu_2=0\)

Difference between Two Proportions

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Are the population proportions different? | \(p_1 = p_2\) | \(p_1 \ne p_2\) | Two-tailed, non-directional |
| Is the population proportion in group 1 greater than the population proportion in group 2? | \(p_1 = p_2\) | \(p_1 \gt p_2\) | Right-tailed, directional |
| Is the population proportion in group 1 less than the population proportion in group 2? | \(p_1 = p_2\) | \(p_1 \lt p_2\) | Left-tailed, directional |

Note: \(p_1=p_2\) is equivalent to \(p_1-p_2=0\)

Simple Linear Regression: Slope

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Is the slope in the population different from 0? | \(\beta = 0\) | \(\beta\neq 0\) | Two-tailed, non-directional |
| Is the slope in the population positive? | \(\beta = 0\) | \(\beta> 0\) | Right-tailed, directional |
| Is the slope in the population negative? | \(\beta = 0\) | \(\beta< 0\) | Left-tailed, directional |

Correlation (Pearson's r)

| Research Question | Null Hypothesis, \(H_{0}\) | Alternative Hypothesis, \(H_{a}\) | Type of Hypothesis Test |
|---|---|---|---|
| Is the correlation in the population different from 0? | \(\rho = 0\) | \(\rho \neq 0\) | Two-tailed, non-directional |
| Is the correlation in the population positive? | \(\rho = 0\) | \(\rho > 0\) | Right-tailed, directional |
| Is the correlation in the population negative? | \(\rho = 0\) | \(\rho< 0\) | Left-tailed, directional |

5.2.1 - Examples

Example: Rent

Research question: Is the average monthly rent of a one-bedroom apartment in State College, Pennsylvania less than \$900?

In this question we are comparing the mean of all State College one-bedroom apartments (i.e., \(\mu\)) to the value of \$900. This is a single sample mean test. We want to know if the population mean is less than \$900, so this is a left-tailed test. Our hypotheses are:

  • \(H_0:\mu=\$900\)
  • \(H_a: \mu < \$900\)

Example: IQ Scores

Research question: Is the average IQ score of all World Campus STAT 200 students higher than the national average of 100?

In this question we are comparing the mean of all World Campus STAT 200 students (i.e., \(\mu\)) to the given value of 100. This is a single sample mean test. We want to know if the population mean is greater than 100, so this is a right-tailed test. Our hypotheses are:

  • \(H_0:\mu = 100\)
  • \(H_a: \mu > 100\)

Example: Weight Loss

Research question: Do participants lose weight following a weight-loss intervention?

Data were collected from one group of participants before and after a weight-loss intervention. Data were paired by participant.  Assuming that \(x_1\) is an individual's weight before the intervention and \(x_2\) is their weight at the end of the study, if they lost weight then \(x_1-x_2\) would be a positive number (i.e., greater than 0). Thus, this is a right-tailed test. Because we are testing their mean difference, the parameter that we should write in our hypotheses is \(\mu_d\) where \(\mu_d\) is the mean weight change (before-after) in the population.

Our hypotheses are:

  • \(H_0: \mu_d=0\) 
  • \(H_a:\mu_d > 0 \)

Example: Gender of College of Science Students

Research question: Is the percent of students enrolled in Penn State's College of Science who identify as women different from 50%?

In this question we are comparing the proportion of all Penn State College of Science students who identify as women (i.e., \(p\)) to the given value of 0.5. This is a single sample proportion test. We want to know if the population proportion is different from 0.5, so this is a two-tailed test. Our hypotheses are:

  • \(H_0:p=0.5\)
  • \(H_a: p \ne 0.5\)

Example: Dog Ownership

Research question: Do the majority of all World Campus STAT 200 students own a dog?

If the majority of all students own a dog, then more than 50% own a dog. In this question we are comparing the population proportion for all World Campus STAT 200 students (i.e., \(p\)) to the value of 0.5. This is a single sample proportion test. We want to know if the proportion is greater than 0.5, so this is a right-tailed test. Our hypotheses are:

  • \(H_0:p=0.5\)
  • \(H_a: p > 0.5\)

Example: Weights of Boys and Girls

Research question: In preschool, are the weights of boys and girls different?

We are comparing the weights of two independent groups: boys and girls. Weight is a quantitative variable, so we are comparing means and the parameter we are testing is \(\mu_b - \mu_g\). Our research question does not hypothesize which group has the larger mean weight, so this is a two-tailed test. Our hypotheses are:

  • \(H_0: \mu_b = \mu_g\) 
  • \(H_a: \mu_b \ne \mu_g\)

Note: This is equivalent to \(H_0: \mu_b - \mu_g = 0\) and \(H_a: \mu_b - \mu_g \ne 0\). 

Example: Smoking by Gender

Research question: Is the proportion of men who smoke cigarettes different from the proportion of women who smoke cigarettes in the United States?

In this question we are comparing two independent groups: men and women. The response variable, smoking, is categorical; therefore, we are comparing proportions. Our research question does not suggest which group smokes more, so we have a two-tailed test. Our hypotheses are:

  • \(H_0: p_1=p_2\) 
  • \(H_a: p_1 \ne p_2\)

Note: This is equivalent to \(H_0: p_1 - p_2 =0\) and \(H_a: p_1 - p_2 \ne 0\)

Example: Predicting SAT-Math using IQ

Research question: Can IQ scores be used to predict SAT-Math scores in the population of all American high school seniors?

SAT-Math and IQ scores are both quantitative variables.  Our research question is about prediction, so we are going to use simple linear regression. The parameter we are testing is \(\beta\). Our research question does not state whether we expect the slope to be positive or negative, therefore this is a two-tailed test. Our hypotheses are:

  • \(H_0: \beta = 0\)
  • \(H_a: \beta \ne 0\)

Example: Relation Between Height and Weight

Research question: Is there a positive relationship between height and weight in the population of all American adults age 25 and older?

The relationship between two quantitative variables is measured using correlation (Pearson's r). The parameter we are testing is \(\rho\). A positive relationship would be indicated by a positive correlation coefficient, therefore this is a right-tailed test. Our hypotheses are:

  • \(H_0: \rho = 0\)
  • \(H_a: \rho > 0\)

5.3 - Randomization Procedures

Like bootstrapping procedures, randomization procedures use resampling techniques to construct a sampling distribution that can be used to make inferences about the population. What makes a randomization distribution different is that it is constructed given that the null hypothesis is true. The randomization distribution will be centered on the value in the null hypothesis. 

StatKey can be used to construct a randomization distribution for a single mean, single proportion, difference in means, difference in proportions, the slope of a simple linear regression model, or a correlation (Pearson's r). Minitab can conduct a randomization test for a single mean, single proportion, or difference in means.

The video below walks through an example of using StatKey to construct a randomization distribution. It also looks ahead to the next section and uses that randomization distribution to determine the p-value. 

These are the steps that we will be using to conduct hypothesis tests this semester:

  1. Determine what type of test you need to conduct and write the hypotheses.
  2. Construct a randomization distribution under the assumption that the null hypothesis is true.
  3. Use the randomization distribution to find the p-value.
  4. Decide if you should reject or fail to reject the null hypothesis.
  5. State a real-world conclusion in relation to the original research question.

Here, you learned how to complete Step 2. On the next page you will learn how to use this randomization distribution to complete Steps 3 through 5. 


5.3.1 - StatKey Randomization Methods (Optional)

The following information goes beyond what you are expected to know for this course. Here, details about all of the randomization procedure options available in StatKey are covered. In STAT 200 you will always be using the default randomization methods. The information here is optional and is meant to provide extra details to individuals who are interested in learning more, beyond what is required of most introductory statistics courses. 

Randomization Test for One Mean

In StatKey there is only one method for conducting a randomization test for one mean. The sample is shifted so that the sample mean equals the hypothesized population mean (i.e., the value in the null hypothesis). Samples of the same size as the original sample are drawn with replacement from the shifted distribution and the mean of each randomization sample is recorded on the randomization distribution dotplot.
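
The shift-and-resample idea described above can be sketched in Python. This is only an illustration of the technique, not StatKey's actual code; the function name `randomization_means`, the number of repetitions, the seed, and the rent values are all assumptions made for the example.

```python
import random
from statistics import mean


def randomization_means(sample, mu_0, reps=5000, seed=1):
    """Shift the sample so its mean equals mu_0, then resample with replacement."""
    rng = random.Random(seed)
    shift = mu_0 - mean(sample)
    shifted = [x + shift for x in sample]  # shifted sample now has mean mu_0
    n = len(sample)
    # Each randomization sample has the same size as the original sample.
    return [mean(rng.choices(shifted, k=n)) for _ in range(reps)]


# Hypothetical monthly rents (illustrative values only).
rents = [850, 920, 780, 990, 870, 810, 905, 760]
dist_one_mean = randomization_means(rents, mu_0=900)

# The randomization distribution should be centered near the null value of 900.
print(round(mean(dist_one_mean), 1))
```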


Randomization Test for One Proportion

In StatKey there is only one method for conducting a randomization test for one proportion. Samples of the same size as the original sample are drawn from a theoretical distribution with a proportion equal to the hypothesized population proportion (i.e., the value in the null hypothesis). The sample proportion in each randomization sample is recorded on the randomization distribution dotplot. 
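
A rough Python sketch of this procedure is shown below. It simulates each case as a success with probability equal to the hypothesized proportion; the function name, sample size, repetitions, and seed are assumptions chosen for illustration, and this is not StatKey's own implementation.

```python
import random


def randomization_proportions(n, p_0, reps=5000, seed=1):
    """Simulate sample proportions from a population where the proportion is p_0."""
    rng = random.Random(seed)
    dist = []
    for _ in range(reps):
        # Each of the n cases is a success with probability p_0.
        successes = sum(rng.random() < p_0 for _ in range(n))
        dist.append(successes / n)
    return dist


# Hypothetical setting: a sample of 100 cases and a null proportion of 0.5.
dist_one_prop = randomization_proportions(n=100, p_0=0.5)

# The simulated proportions should be centered near the null value of 0.5.
print(round(sum(dist_one_prop) / len(dist_one_prop), 3))
```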


Randomization Test for a Difference in Means

StatKey offers three randomization methods when comparing the means of two independent groups: reallocate groups, shift groups, and combine groups. In this course we will always be using the default method of reallocating groups. For larger sample sizes results will be relatively consistent across the three methods. In practice, the method that is most appropriate may depend on the design of the research study. For example, the reallocation method may be preferred in studies where participants were randomly assigned to different conditions. 

  • Reallocate Groups
    This is the default method in StatKey. In this course, this is always the method that will be used. Using the reallocate method, all cases in the samples are combined and then randomly assigned to the two groups with the same sample sizes as the original samples. This is done without replacement. The mean of each reallocated sample is computed, and the difference between the two reallocated samples' means is recorded on the randomization distribution dotplot.
     
  • Shift Groups
    The two groups are shifted until their observed sample means are equal. This is similar to the method used for one sample mean. After the groups are shifted, cases are randomly selected from the first group, with replacement, until a randomization sample of the same size as the first group's original sample is obtained. This procedure is followed for the second group as well. The difference between the mean of the first group and the mean of the second group is recorded on the randomization distribution plot.
     
  • Combine Groups
    All cases in the samples are combined and then randomly selected with replacement. Again, the sample sizes for each group will be equal to each group's original sample size.  The difference between this combine groups method and the default reallocate groups method is that this method resamples with replacement so an original case can appear more than once in a group, in both groups, or not at all. 
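
The default reallocate groups method can be sketched in Python as follows. This is an illustration under stated assumptions rather than StatKey's code: the function name, repetitions, seed, and the two small groups of values are hypothetical.

```python
import random
from statistics import mean


def reallocate_diff_means(group_1, group_2, reps=5000, seed=1):
    """Pool all cases, reassign them at random (without replacement), record mean differences."""
    rng = random.Random(seed)
    pooled = list(group_1) + list(group_2)
    n_1 = len(group_1)
    dist = []
    for _ in range(reps):
        rng.shuffle(pooled)  # every case is used exactly once
        # First n_1 cases form the new group 1; the rest form the new group 2.
        dist.append(mean(pooled[:n_1]) - mean(pooled[n_1:]))
    return dist


# Hypothetical hours of exercise per week for two groups (illustrative values only).
group_1 = [5, 7, 3, 9, 6, 4]
group_2 = [2, 8, 4, 5, 3]
dist_means = reallocate_diff_means(group_1, group_2)

# Under the null hypothesis the differences should be centered near 0.
print(round(mean(dist_means), 2))
```

A combine groups version would differ only in how the new groups are drawn: instead of shuffling, each new group would be sampled from the pooled cases with replacement (for example with `rng.choices`).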

Randomization Test for a Difference in Proportions

StatKey offers two randomization methods when comparing the proportions of two independent groups: reallocation and resampling. In this course we will be using the default reallocation method. 

  • Reallocation
    This procedure is the same as the reallocate groups procedure for two group means. All cases are combined and then randomly assigned to the two groups with the same sample sizes as the original samples. This is done without replacement, so the total number of successes across the two reallocated groups will always equal the total number of successes in the original samples. 
     
  • Resampling
    The two groups are combined and the overall observed proportion is computed. Samples of the same size as the original samples are drawn from a theoretical distribution with a proportion equal to the overall observed proportion. The differences between the sample proportions in the two randomization samples are recorded on the randomization distribution dotplot.  
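
Here is a minimal Python sketch of the default reallocation method for two proportions. The counts, function name, repetitions, and seed are assumptions for illustration only. Successes are coded as 1 and failures as 0, which makes it easy to see that the total number of successes never changes.

```python
import random


def reallocate_diff_props(successes_1, n_1, successes_2, n_2, reps=5000, seed=1):
    """Pool successes (1) and failures (0), reassign without replacement, record p1 - p2."""
    rng = random.Random(seed)
    total_successes = successes_1 + successes_2
    pooled = [1] * total_successes + [0] * (n_1 + n_2 - total_successes)
    dist = []
    for _ in range(reps):
        rng.shuffle(pooled)  # the pooled list always keeps the same number of 1s
        p_hat_1 = sum(pooled[:n_1]) / n_1
        p_hat_2 = sum(pooled[n_1:]) / n_2
        dist.append(p_hat_1 - p_hat_2)
    return dist


# Hypothetical counts: 12 successes out of 40 in group 1, 20 out of 50 in group 2.
dist_props = reallocate_diff_props(12, 40, 20, 50)

# Under the null hypothesis the differences should be centered near 0.
print(round(sum(dist_props) / len(dist_props), 3))
```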

Randomization Test for a Slope, Correlation

The randomization methods used for testing the slope and the correlation are the same, as both procedures involve two quantitative variables. In each case, the x and y values are separated and then randomly re-paired. The slope or correlation for those new pairs is computed and recorded on the randomization distribution plot. Like the other reallocation methods, this is done without replacement, so each case's x value and y value are each used exactly once. 
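
The re-pairing idea can be sketched in Python for the correlation case. The helper `pearson_r`, the function name `randomization_correlations`, the repetitions, the seed, and the quiz and exam values are all assumptions made for this illustration; StatKey's internal code is not shown here.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def randomization_correlations(x, y, reps=5000, seed=1):
    """Shuffle the y values to break the pairing, then record r for the new pairs."""
    rng = random.Random(seed)
    shuffled_y = list(y)
    dist = []
    for _ in range(reps):
        rng.shuffle(shuffled_y)  # each y value is used exactly once
        dist.append(pearson_r(x, shuffled_y))
    return dist


# Hypothetical quiz and exam scores (illustrative values only).
quiz = [6, 8, 7, 9, 5, 10, 4, 8]
exam = [70, 82, 75, 90, 65, 95, 60, 85]
dist_r = randomization_correlations(quiz, exam)

# Under the null hypothesis the correlations should be centered near 0.
print(round(sum(dist_r) / len(dist_r), 3))
```

The same shuffling would apply to a slope test; only the statistic computed from each set of new pairs would change.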


5.4 - p-values

We can use a randomization distribution to determine how likely a sample statistic like ours would be if the null hypothesis were true. This probability is known as the p-value. The p-value is the proportion of samples on the randomization distribution that are as extreme as or more extreme than our observed sample in the direction of the alternative hypothesis. The p-value is compared to the alpha level (typically 0.05).
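
As a sketch of that calculation, the Python function below counts the proportion of randomization statistics in the relevant tail. The function name `p_value` and the tiny list of statistics are hypothetical. For a two-tailed test this sketch doubles the smaller tail proportion, which is one common convention and is assumed here rather than taken from StatKey's documentation.

```python
def p_value(dist, observed, tail):
    """Proportion of randomization statistics at least as extreme as the observed one."""
    n = len(dist)
    right = sum(s >= observed for s in dist) / n
    left = sum(s <= observed for s in dist) / n
    if tail == "right":
        return right
    if tail == "left":
        return left
    if tail == "two":
        # Double the smaller tail, capped at 1.
        return min(1.0, 2 * min(left, right))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# A tiny made-up randomization distribution, just to show the counting.
toy_dist = [-2, -1, 0, 0, 1, 2, 3]
print(p_value(toy_dist, 2, "right"))  # 2 of the 7 values are >= 2
print(p_value(toy_dist, 2, "two"))    # double the right-tail proportion
```

A real randomization distribution would contain thousands of statistics rather than seven, but the counting works the same way.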

Making a Decision

If \(p > \alpha\) then we "fail to reject the null hypothesis" and conclude that there is not enough evidence of a difference in the population. This does not mean that the null hypothesis is true; it only means that we do not have sufficient evidence to say that it is likely false. These results are not statistically significant. 

If \(p \leq \alpha\) then we "reject the null hypothesis" and conclude that there is evidence of a difference in the population. These results are statistically significant. 
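
The decision rule can be written as a very small Python function. The name `decide` is hypothetical. In the usage lines, 0.228 is the p-value reported in the exercise example later in this lesson, and 0.0005 is a made-up value standing in for a result reported as \(p<0.001\).

```python
def decide(p, alpha=0.05):
    """Return the Step 4 decision for a p-value at significance level alpha."""
    if p <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.228))   # p > alpha
print(decide(0.0005))  # p <= alpha
```

Note that a p-value exactly equal to \(\alpha\) leads to rejecting the null hypothesis, because the rule uses \(p \leq \alpha\).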


5.5 - Randomization Test Examples in StatKey

The following pages contain examples of conducting randomization tests using StatKey.


5.5.1 - Single Proportion Example: PA Residency

This example uses data collected from World Campus STAT 200 students at the beginning of the Fall 2016 semester. You can download this Minitab file here: fall2016stdata.mpx

Research question: Are less than half of all World Campus STAT 200 students Pennsylvania residents?

This research question is asking if there is evidence that the population proportion is less than 0.50 which can be translated to the following hypotheses:

\(H_0: p=0.50\)

\(H_a: p \lt 0.50\)

Step 1: We are comparing the proportion in one group to 0.50. This is a one sample proportion test.

\(H_0: p=0.50\)

\(H_a: p \lt 0.50\)

Step 2: We used StatKey to construct a randomization distribution.

Step 3: \(p<0.001\)

Step 4: \(p \leq 0.05\), reject the null hypothesis

Step 5: There is evidence that the proportion of all World Campus STAT 200 students who are Pennsylvania residents is less than 0.50.


5.5.2 - Paired Means Example: Age

Research question: On average, are husbands older than their wives?

Step 1: The data are paired by couple. This is a paired means test.

\(H_0: \mu_d=0\)

\(H_a: \mu_d > 0\)

Step 2: We constructed a randomization distribution in StatKey using the built-in dataset.

Step 3: \(p < 0.001\)

Step 4: \(p \leq 0.05\), reject the null hypothesis

Step 5: There is evidence that in the population, on average, husbands are older than their wives. 


5.5.3 - Difference in Means Example: Exercise by Biological Sex

Do males and females differ in terms of how many hours per week they exercise? This example uses a dataset that is built into StatKey.

Step 1: Hours exercised per week is a quantitative variable and we are comparing two independent groups. We should conduct a hypothesis test for the difference in means.

\(H_0: \mu_m = \mu_f\)

\(H_a: \mu_m \ne \mu_f\)

Step 2: We constructed the randomization distribution given that there is not a difference between the means of males and females.

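Step 3: \(p = 0.114+0.114=0.228\). Because this is a two-tailed test, the proportions in both tails of the randomization distribution are added together.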

Step 4: \(p>0.05\), we should fail to reject the null hypothesis

Step 5: There is not enough evidence that the mean number of hours per week exercised by males and females is different in the population. Our results are not statistically significant. 


5.5.4 - Correlation Example: Quiz & Exam Scores

Using the sample data in:

We want to know if there is evidence of a positive relationship between quiz scores and final exam scores in the population of all World Campus STAT 200 students. If there is a positive relationship, then the population correlation would be greater than zero. This can be translated to the following hypotheses:

\(H_0: \rho = 0\)

\(H_a: \rho > 0\)

Step 1: We are examining the relationship between two quantitative variables. We should compute and test Pearson's r, which is a correlation coefficient.

\(H_0: \rho = 0\)

\(H_a: \rho > 0\)

Step 2: We constructed a randomization distribution given that the correlation in the population is 0.

Step 3: \(p<0.001\)

Step 4: \(p\leq 0.05\), we should reject the null hypothesis

Step 5: There is evidence of a positive relationship between quiz scores and final exam scores in the population of all World Campus STAT 200 students.


5.6 - Lesson 5 Summary

Objectives

Upon completion of this lesson, you should be able to:

  • Identify and write null and alternative hypotheses.
  • Describe randomization procedures.
  • Determine p-values using randomization methods in StatKey and Minitab.
  • Interpret p-values.
  • Make conclusions on the basis of a p-value.

Let's review the randomization test procedures that you learned in this lesson:

  1. Determine what type of test you need to conduct and write the hypotheses
  2. Construct a randomization distribution under the assumption that the null hypothesis is true
  3. Use the randomization distribution to find the p-value
  4. Decide if you should reject or fail to reject the null hypothesis (see below)
  5. State a real-world conclusion in relation to the original research question

If \(p>\alpha\) then we fail to reject the null hypothesis and there is not enough evidence to support the alternative hypothesis. These results are said to be not statistically significant. If \(p \le \alpha\) then we reject the null hypothesis and conclude that there is enough evidence to support the alternative hypothesis. These results are statistically significant. Unless otherwise stated, \(\alpha\) of 0.05 should be used. 

We will be using these same hypothesis testing steps in all of the remaining lessons. 

