10.2 - Steps Used in a Hypothesis Test

Regardless of the type of hypothesis being considered, the process of carrying out a significance test is the same and relies on four basic steps:

  1. Step 1: State the null and alternative hypotheses

    State the null and alternative hypotheses (see Section 10.1). At this time, also think about the possible type 1 error (rejecting a true null hypothesis) and type 2 error (failing to reject a false null hypothesis), and how serious each mistake would be in terms of the problem.

  2. Step 2: Collect and summarize the data

    Collect and summarize the data so that a test statistic can be calculated. A test statistic is a summary of the data that measures the difference between what is seen in the data and what would be expected if the null hypothesis were true. It is typically standardized so that a p-value can be obtained from a reference distribution like the normal curve.

  3. Step 3: Use the test statistic to find the p-value

    Use the test statistic to find the p-value. The p-value is the probability of getting our test statistic or any test statistic more extreme if, in fact, the null hypothesis is true.

    • For a one-sided "greater than" alternative hypothesis, the "more extreme" part of the interpretation refers to test statistic values larger than the test statistic given.
    • For a one-sided "less than" alternative hypothesis, the "more extreme" part of the interpretation refers to test statistic values smaller than the test statistic given.
    • For a two-sided "not equal to" alternative hypothesis, the "more extreme" part of the interpretation refers to test statistic values that are farther away from the null hypothesis than the test statistic given, at either the upper end or the lower end of the reference distribution (both "tails").

  4. Step 4: Interpret the p-value

    Interpret what the p-value is telling you and make a decision using the p-value. Does the null hypothesis provide a reasonable explanation of the data or not? If not, the result is statistically significant and we have evidence favoring the alternative. State a conclusion in terms of the problem.

    Common Decision Rules seen in the literature

    • If the p-value ≤ 0.05, we often see scientists declare their data to be "significant".
    • If the p-value ≤ 0.01, we often see scientists declare their data to be "highly significant".
    • If the p-value > 0.05, we often see scientists declare their data to be "not significant".
    However, such cut-offs are arbitrary and we should not view data any differently when we see a p-value of 0.049 versus when we see a p-value of 0.051. There is no magic in the 0.05 value.
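
The three meanings of "more extreme" in Step 3 can be sketched in Python. This is a minimal illustration, assuming the test statistic is a standardized score and the reference distribution is the standard normal curve. The function name `normal_p_value` and the labels "greater", "less", and "two-sided" are choices made for this sketch, not part of the lesson.

```python
import math


def normal_p_value(z, alternative):
    """P-value for a standardized test statistic z, using the standard
    normal curve as the reference distribution (an assumption of this sketch).

    alternative: "greater", "less", or "two-sided" (hypothetical labels).
    """
    if alternative == "greater":
        # Area under the normal curve above z (upper tail).
        return 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "less":
        # Area under the normal curve below z (lower tail).
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if alternative == "two-sided":
        # Area in both tails, at least as far from 0 as |z|.
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# The same test statistic gives different p-values under each alternative.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(normal_p_value(2.0, alt), 4))
```

For the same test statistic, the two-sided p-value is twice the smaller of the two one-sided p-values, because it counts both tails of the reference distribution.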

Example 10.9: Left-Handed Artists (continuation of Example 10.2)

About 10% of the human population is left-handed. A researcher at Penn State speculates that students in the College of Arts and Architecture are more likely to be left-handed than people in the general population. A random sample of 100 students in the College of Arts and Architecture is obtained and 18 of these students are found to be left-handed.

Research Question: Are artists more likely to be left-handed than people in the general population?

  1. Step 1: State Null and Alternative Hypotheses
    • Null Hypothesis: Population proportion of left-handed students in the College of Arts and Architecture = 0.10 (p = 0.10).
    • Alternative Hypothesis: Population proportion of left-handed students in the College of Arts and Architecture > 0.10 (p > 0.10).

    Now that you know the null and alternative hypotheses, did you think about what the type 1 and type 2 errors are? It is important to note that Step 1 comes before we even collect data. Identifying these errors helps to improve the design of your research study. Let's write them out:

    • Type 1 error: Claim artists are more likely to be left-handed than people in the general population when in truth they are not more likely.
    • Type 2 error: Fail to claim artists are more likely to be left-handed than people in the general population when they are in fact more likely.

    In this case, the consequences of these two errors are fairly similar (e.g., installing more or fewer left-handed desks in classrooms than are needed).

  2. Step 2: Collect and summarize the data so that a test statistic can be calculated.

    In the sample of 100 students described above, the sample proportion is 18 / 100 = 0.18. The hypothesis test will determine whether or not the null hypothesis that p = 0.10 provides a plausible explanation for the data. If not, we will see this as evidence that the proportion of left-handed Arts and Architecture students is greater than 0.10.

    If the null hypothesis is true then the standard error of the sample proportion would be \(\sqrt{\frac{0.1(1-0.1)}{100}} = 0.03\) and the sample proportion would follow the normal curve. Thus, we can use the standard score z = (0.18-0.10) / 0.03 = 2.67 as our test statistic.

  3. Step 3: Use the test statistic to find the p-value.

    Using the normal curve table for the Z-value of 2.67 we find the p-value to be about 0.004. Notice that the one-sided alternative hypothesis says to watch out for large values so we look at the percentage of the normal curve above 2.67 to get the p-value.

    [Image: normal curve with the z-score 2.67 marked]

    Interpretation of the p-value: The probability of getting our test statistic of 2.67 or any higher value, if in fact the null hypothesis is true, is about 0.004.

  4. Step 4: Make a decision using the p-value.

    Since the p-value of 0.004 is so small, the null hypothesis provides a very poor explanation of the data. We find good evidence that the population proportion of left-handed students in the College of Arts and Architecture exceeds 0.10.

    Now that we have made our decision, we are only at risk of making a type 1 error. It is not possible at this point to make a type 2 error because we rejected the null hypothesis.
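
The arithmetic in Steps 2 and 3 of Example 10.9 can be checked with a short Python sketch. It uses only the numbers given in the example (18 left-handed students out of 100, null value 0.10) and the normal curve as the reference distribution. The function name `one_proportion_z_test` is a hypothetical name chosen for this illustration.

```python
import math


def one_proportion_z_test(successes, n, p_null):
    """One-sided ("greater than") z-test for a single proportion.

    Returns the sample proportion, the standard error computed under the
    null hypothesis, the z test statistic, and the upper-tail p-value.
    """
    p_hat = successes / n
    # Standard error of the sample proportion if the null hypothesis is true.
    se_null = math.sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / se_null
    # Area under the standard normal curve above z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, se_null, z, p_value


p_hat, se_null, z, p_value = one_proportion_z_test(successes=18, n=100, p_null=0.10)
print("sample proportion:", p_hat)
print("standard error under the null:", round(se_null, 2))
print("z test statistic:", round(z, 2))
print("p-value:", round(p_value, 3))
```

The printed values should agree with the hand calculation in the example: a standard error of 0.03, a test statistic of about 2.67, and a p-value of about 0.004.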

Example 10.10: The Weight of McDonald's French Fries in Japan

After receiving complaints from McDonald's customers in Japan about the amount of french fries being served, the online news magazine "Rocket News" decided to test the actual weight of the fries served at a particular Japanese McDonald's restaurant. According to the Rocket News article, the official weight standard set by McDonald's of Japan is for a medium-sized fries to weigh 135 grams. The publication weighed the fries from ten different medium fries they purchased and found the average weight of the fries in their sample to be 130 grams with a standard deviation of 9 grams.

Research Question: Does the data suggest that the medium fries from this McDonald's in Japan are underpacked?

  1. Step 1: State Null and Alternative Hypotheses.
    • Null Hypothesis: Population mean weight of medium fries = 135 grams
    • Alternative Hypothesis: Population mean weight of medium fries < 135 grams
  2. Step 2: Collect and summarize the data so that a test statistic can be calculated.

    The sample mean weight was 130 grams. Also, the sample standard deviation was 9 grams, so the standard error of the mean is found to be \(\frac{9}{\sqrt{10}} \approx 2.85\) grams. The test statistic is the standardized value (130-135) / 2.85 = -1.76 (using the unrounded standard error).

  3. Step 3: Use the test statistic to find the p-value.

    Since the sample size is only 10, the sample standard deviation would be an unreliable estimate of the population standard deviation, so the normal curve would not be appropriate to use as the reference distribution to find the p-value. In this case, the t curve is used instead. With a sample size of 10, the t curve has 10 - 1 = 9 degrees of freedom, and the percentage of that t curve below -1.76 is about 6%.

    [Image: bell-shaped curve with the test statistic -1.76 marked]

    Interpretation of the p-value: The probability of getting our test statistic of -1.76 or any smaller value, if in fact the null hypothesis is true, is about 6%.

  4. Step 4: Make a decision using the p-value.

    Since the p-value is around 6% we are near the border of what people often use as a cutoff for declaring a significant result. Given the amount of variability from one package of fries to the next, there is a reasonable chance that we would see a sample average like this even if the restaurant met the official standard weight on average.

    It is important to remember in carrying out the mechanics of a significance test that you are only doing a probability calculation assuming the null hypothesis is true. Because the calculation is done under that assumption, it cannot say anything about the chances that the null hypothesis or the alternative hypothesis is true.
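
The calculation in Example 10.10 can be reproduced with a short Python sketch. To stay within the standard library, the area under the t curve is found here by numerical integration (Simpson's rule). That is a choice made for this illustration only; in practice a statistics library would normally supply this value. The function names `t_pdf` and `t_lower_tail` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the t curve with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_lower_tail(t, df, steps=2000):
    """Area under the t curve below t, approximated with Simpson's rule.

    Uses symmetry: the area below t equals 0.5 plus the signed integral
    of the density from 0 to t (negative when t is negative).
    steps must be even.
    """
    h = t / steps  # signed step size
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


# Numbers from the example: 10 orders, mean 130 g, SD 9 g, standard 135 g.
n, sample_mean, sample_sd, mu_null = 10, 130, 9, 135

se = sample_sd / math.sqrt(n)            # standard error of the mean
t_stat = (sample_mean - mu_null) / se    # standardized test statistic
p_value = t_lower_tail(t_stat, df=n - 1) # one-sided "less than" p-value

print("standard error:", round(se, 2))
print("test statistic:", round(t_stat, 2))
print("p-value:", round(p_value, 2))
```

The output should match the example: a standard error of about 2.85 grams, a test statistic of about -1.76, and a p-value of about 6%. As the closing paragraph notes, this 6% is calculated assuming the null hypothesis is true; it is not the probability that the null hypothesis is true.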