6 Hypothesis Testing
6.1 Part A: Hypothesis Testing for One-Sample Proportion
Overview
As mentioned before, there are two ways to make inferences about parameters: estimating the parameter, or testing a hypothesis about its value. In this lesson, we introduce the concepts of hypothesis testing and then discuss hypothesis testing for a population proportion. In the next Lesson, we will discuss inference for the population mean.
Objectives
Upon completion of this lesson, you should be able to:
- Explain the concepts of hypothesis testing.
- Set up hypotheses.
- Perform hypothesis testing for a population proportion using the p-value approach and the rejection region approach.
- Use a confidence interval to draw a conclusion about a two-sided test.
6.1.1 Introduction to Hypothesis Testing
Basic Terms
The first step in hypothesis testing is to set up two competing hypotheses. The hypotheses are the most important aspect. If the hypotheses are incorrect, your conclusion will also be incorrect.
The two hypotheses are named the null hypothesis and the alternative hypothesis.
6.1 (Null hypothesis) The null hypothesis is typically denoted as \(H_0\). The null hypothesis states the “status quo”. This hypothesis is assumed to be true until there is evidence to suggest otherwise.
6.2 (Alternative hypothesis) The alternative hypothesis is typically denoted as \(H_a\) or \(H_1\). This is the statement that one wants to conclude. It is also called the research hypothesis.
The goal of hypothesis testing is to see whether there is enough evidence against the null hypothesis, that is, enough evidence to reject it. If there is not enough evidence, then we fail to reject the null hypothesis.
Consider the following example where we set up these hypotheses.
Example 6.1
A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or innocent. Set up the null and alternative hypotheses for this example.
Answer
Putting this in a hypothesis testing framework, the hypotheses being tested are:
- The man is guilty
- The man is innocent
Let’s set up the null and alternative hypotheses. Remember that we assume the null hypothesis is true and look for evidence against it. Therefore, it makes sense in this example to assume the man is innocent and test whether there is evidence that he is guilty:
\(H_0\colon\) Mr. Orangejuice is innocent vs. \(H_a\colon\) Mr. Orangejuice is guilty
The Logic of Hypothesis Testing
We want to know the answer to a research question. We determine our null and alternative hypotheses. Now it is time to make a decision.
The decision is either going to be…
- reject the null hypothesis or…
- fail to reject the null hypothesis.
Consider the following table. The table shows the decision/conclusion of the hypothesis test and the unknown “reality”, or truth. We do not know if the null is true or if it is false. If the null is false and we reject it, then we make the correct decision. If the null hypothesis is true and we fail to reject it, then we make the correct decision.
Decision | Reality: \(H_0\) is true | Reality: \(H_0\) is false |
---|---|---|
Reject \(H_0\) (conclude \(H_a\)) |  | Correct decision |
Fail to reject \(H_0\) | Correct decision |  |
So what happens when we do not make the correct decision?
When doing hypothesis testing, two types of mistakes may be made: Type I errors and Type II errors. If we reject the null hypothesis when it is true, we make a Type I error. If the null hypothesis is false and we fail to reject it, we make a Type II error.
Decision | Reality: \(H_0\) is true | Reality: \(H_0\) is false |
---|---|---|
Reject \(H_0\) (conclude \(H_a\)) | Type I error | Correct decision |
Fail to reject \(H_0\) | Correct decision | Type II error |
Types of errors
6.3 (Type I error) When we reject the null hypothesis when the null hypothesis is true.
6.4 (Type II error) When we fail to reject the null hypothesis when the null hypothesis is false.
The “reality”, or truth, about the null hypothesis is unknown, and therefore we do not know whether we have made the correct decision or committed an error. We can, however, define the probabilities of these errors.
6.5 (\(\alpha\) (‘Alpha’)) The probability of committing a Type I error. Also known as the significance level.
6.6 (\(\beta\) (‘Beta’)) The probability of committing a Type II error.
6.7 (Power) Power is the probability that the null hypothesis is rejected given that it is false (i.e. \(1-\beta\)).
\(\alpha\) and \(\beta\) are probabilities of committing an error, so we want both to be low. However, for a fixed sample size we cannot decrease both at once: as \(\alpha\) decreases, \(\beta\) increases.
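To make the trade-off between \(\alpha\) and \(\beta\) concrete, the short Python sketch below computes \(\beta\) for three values of \(\alpha\). It is an illustration under assumptions that are not part of this lesson's examples: a right-tailed one-proportion z-test of \(H_0\colon p=0.5\) with \(n=500\), a hypothetical true proportion of 0.55, and the normal approximation to \(\hat{p}\) that is introduced later in this lesson. The function name `type2_error_right_tailed` is ours, chosen for illustration; the code uses only Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def type2_error_right_tailed(p0, p_true, n, alpha):
    """Approximate beta for a right-tailed one-proportion z-test (normal approximation).

    H0: p = p0 is rejected when p-hat >= cutoff, where the cutoff lies
    z_{1-alpha} standard errors (computed under H0) above p0.
    """
    se_null = math.sqrt(p0 * (1 - p0) / n)
    cutoff = p0 + STD_NORMAL.inv_cdf(1 - alpha) * se_null
    # If the true proportion is p_true, p-hat is approximately N(p_true, p_true(1 - p_true)/n)
    se_true = math.sqrt(p_true * (1 - p_true) / n)
    # beta = P(fail to reject H0 | p = p_true) = P(p-hat < cutoff)
    return STD_NORMAL.cdf((cutoff - p_true) / se_true)

# Hypothetical scenario: H0: p = 0.5, true p = 0.55, n = 500
for alpha in (0.10, 0.05, 0.01):
    beta = type2_error_right_tailed(p0=0.5, p_true=0.55, n=500, alpha=alpha)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

For a fixed \(n\), the printed \(\beta\) grows as \(\alpha\) shrinks. Holding \(\alpha\) fixed and increasing \(n\) lowers \(\beta\), which is the usual way to reduce both error probabilities at once.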
Example 6.2
Let’s return to our Mr. Orangejuice example. A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or not guilty. We found before that…
\(H_0\colon\) Mr. Orangejuice is innocent
\(H_a\colon\) Mr. Orangejuice is guilty
Interpret the Type I error, \(\alpha\), the Type II error, and \(\beta\).
Answer
Type I Error:
Type I error is committed if we reject \(H_0\) when it is true. In other words, when the man is innocent but found guilty.
\(\alpha\):
\(\alpha\) is the probability of a Type I error, or in other words, it is the probability that Mr. Orangejuice is innocent but found guilty.
Type II Error:
Type II error is committed if we fail to reject \(H_0\) when it is false. In other words, when the man is guilty but found not guilty.
\(\beta\):
\(\beta\) is the probability of a Type II error, or in other words, it is the probability that Mr. Orangejuice is guilty but found not guilty.
As you can see here, the Type I error (putting an innocent man in jail) is the more serious error. Ethically, it is more serious to put an innocent man in jail than to let a guilty man go free. So, to minimize the probability of a Type I error, we would choose a smaller significance level.
Try It!
An inspector has to choose between certifying a building as safe or saying that the building is not safe. There are two hypotheses:
- Building is safe
- Building is not safe
Set up the null and alternative hypotheses. Interpret Type I and Type II errors.
\(H_0\colon\) Building is not safe vs \(H_a\colon\) Building is safe
Decision | Reality: \(H_0\) is true (building is not safe) | Reality: \(H_0\) is false (building is safe) |
---|---|---|
Reject \(H_0\) (conclude \(H_a\)) | Type I error: rejecting “building is not safe” when it is not safe (an unsafe building is certified as safe) | Correct decision |
Fail to reject \(H_0\) | Correct decision | Type II error: failing to reject “building is not safe” when it is safe (a safe building is not certified) |
Power and \(\beta\) are complements of each other. Therefore, they have an inverse relationship, i.e. as one increases, the other decreases.
It makes sense to set up \(H_0\) and \(H_a\) as above, that is, to assume the building is not safe until there is evidence otherwise. If we switched them (\(H_0\colon\) building is safe vs. \(H_a\colon\) building is not safe) and then failed to reject \(H_0\), we still could not conclude that the building is safe, because we can only fail to reject \(H_0\); we never accept it.
6.1.2 Steps for Hypothesis Tests
A hypothesis, in statistics, is a statement about a population parameter, typically expressed as a specific numerical value. To test a hypothesis, we gather data and use it as evidence for or against the hypothesis.
How do we decide whether to reject the null hypothesis?
- If the sample data are consistent with the null hypothesis, then we do not reject it.
- If the sample data are inconsistent with the null hypothesis, but consistent with the alternative, then we reject the null hypothesis and conclude that the alternative hypothesis is true.
Six Steps for Hypothesis Tests
Hypothesis testing follows a fixed sequence of steps. Below, these are summarized as six steps for conducting a test of a hypothesis.
Step 1: Set up the hypotheses and check conditions.
Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as \(H_0\), which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is evidence to suggest otherwise. The second hypothesis is called the alternative, or research, hypothesis, notated as \(H_a\). The alternative hypothesis is a statement of a range of alternative values in which the parameter may fall. One must also check that any conditions (assumptions) needed to run the test have been satisfied, e.g. normality of the data, independence, and a sufficient number of success and failure outcomes.
Step 2: Decide on the significance level, \(\alpha\).
This value is used as a probability cutoff for making decisions about the null hypothesis. It is the probability of a Type I error (rejecting the null hypothesis when it is true) that we are willing to tolerate. The most common \(\alpha\) value is 0.05 (5%). Other popular choices are 0.01 (1%) and 0.1 (10%).
Step 3: Calculate the test statistic.
Gather sample data and calculate a test statistic that compares the sample statistic to the hypothesized parameter value. The test statistic is calculated under the assumption that the null hypothesis is true, and it incorporates a measure of standard error and the assumptions (conditions) related to the sampling distribution.
Step 4: Calculate the probability value (p-value), or find the rejection region.
A p-value is found by using the test statistic to calculate the probability, assuming the null hypothesis is true, of obtaining a test statistic as extreme as the observed one or more extreme. The rejection region is found by using \(\alpha\) to find a critical value; the rejection region is the set of values more extreme than the critical value. We discuss the p-value and rejection region in more detail in the next section.
Step 5: Make a decision about the null hypothesis.
In this step, we decide either to reject the null hypothesis or to fail to reject it. Notice that we never decide to accept the null hypothesis.
Step 6: State an overall conclusion.
Once we have found the p-value or rejection region and made a statistical decision about the null hypothesis (i.e. reject the null or fail to reject the null), we summarize the results as an overall conclusion for the test.
We will follow these six steps for the remainder of this Lesson. In future Lessons, the steps will be followed but may not be explained explicitly.
Step 1 is especially important to set up correctly: if your hypotheses are incorrect, your conclusion will be incorrect. In the next section, we practice Step 1 for one-sample situations.
6.1.3 Set-Up for One-Sample Hypotheses
We will continue our discussion by considering two specific hypothesis tests: a test of one proportion, and a test of one mean. We will provide the general set up of the hypothesis and the test statistics for both tests. From there, we will branch off into specific discussions on each of these tests.
In order to make a judgment about the value of a parameter, the problem can be set up as a hypothesis testing problem. We usually set the hypothesis that one wants to conclude as the alternative hypothesis, also called the research hypothesis.
Since hypothesis tests are about a parameter value, the hypotheses are written in parameter notation: \(p\) for a proportion or \(\mu\) for a mean. For a test of a proportion or a test of a mean, we choose the appropriate alternative based on our research question.
Below are the possible sets of hypotheses; we select only one of them, based on the research question. The symbols \(p_0\) and \(\mu_0\) are used in these general statements; in practice, they are replaced by the parameter value, or constant, being tested.
One Sample Proportion
Research Question | Is the population proportion different from \(p_0\)? | Is the population proportion greater than \(p_0\)? | Is the population proportion less than \(p_0\)? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(p=p_0\) | \(p= p_0\) | \(p= p_0\) |
Alternative Hypothesis, \(H_{a}\) | \(p\neq p_0\) | \(p> p_0\) | \(p< p_0\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
*\(p_{0}\) is the hypothesized population proportion
One Sample Mean
Research Question | Is the population mean different from \( \mu_{0}\)? | Is the population mean greater than \(\mu_{0}\)? | Is the population mean less than \(\mu_{0}\)? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\mu=\mu_{0}\) | \(\mu=\mu_{0}\) | \(\mu=\mu_{0}\) |
Alternative Hypothesis, \(H_{a}\) | \(\mu\neq \mu_{0}\) | \(\mu> \mu_{0}\) | \(\mu<\mu_{0}\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
*\(\mu_{0}\) is the hypothesized population mean
The most important step in hypothesis testing is choosing the correct parameter of interest and correctly setting up the alternative hypothesis.
Example 6.3 (Null and Alternative Hypotheses)
In each of the following scenarios, determine the parameter of interest and the null and alternative hypotheses.
When debating the State Appropriation for Penn State, the following question is asked: “Are the majority of students at Penn State from Pennsylvania?”
The response variable is ‘State’ and is a qualitative variable. Therefore, the parameter of interest would be \(p\) the population proportion of students from PA. The hypotheses should be in terms of \(p\). The value we are testing is the “majority” (50%) or \(p_0=0.5\). The majority also implies greater than 50%. Thus, the hypothesis set up would be a right-tailed test. \[H_0\colon p=0.5 \text{ vs. } H_a\colon p>0.5\]
A consumer test agency wants to see whether the mean lifetime of a brand of tires is less than 42,000 miles. The tire manufacturer advertises that the average lifetime is at least 42,000 miles.
The response variable here is ‘lifetime’ and is a quantitative variable. Therefore, we set up the hypotheses in terms of \(\mu\). Here the value of \(\mu_0\) is 42,000. Since the consumer test agency wants to determine whether the mean lifetime is below 42,000, we set up the hypotheses as a left-tailed test: \[H_0\colon \mu=42000 \text{ vs. } H_a\colon \mu<42000\]
The length of a certain lumber from a national home building store is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet.
The response variable is the ‘length of lumber’ and is quantitative. Therefore, we set up the hypotheses in terms of \(\mu\). Here the value of \(\mu_0\) is 8.5. With the builder wanting to check if the mean length is different from 8.5, she would set up the hypotheses as a two-tailed test: \[H_0\colon \mu=8.5 \text{ vs. } H_a\colon \mu\ne 8.5\]
A political news company believes the national approval rating for the current president has fallen below 40%.
The response variable here is ‘approval rating’ and is a qualitative variable. Therefore, we will set up the hypothesis in terms of \(p\). In this case, the \(p_0\) value is 0.4 and the hypotheses would be set up as a left-tailed test: \[H_0\colon p=0.4 \text{ vs. } H_a\colon p<0.4\]
6.1.4 Hypothesis Test for One-Sample Proportion
Overview
In this section, we will demonstrate how we use the sampling distribution of the sample proportion to perform the hypothesis test for one proportion.
Recall that if \(np\) and \(n(1-p)\) are both at least 5, then the sample proportion, \(\hat{p}\), has an approximately normal distribution with mean \(p\), standard error \(\sqrt{\frac{p(1-p)}{n}}\), and estimated standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
In hypothesis testing, we assume the null hypothesis is true. Remember, we set up the null hypothesis as \(H_0\colon p=p_0\). This is very important! This statement says that we are assuming the unknown population proportion, \(p\), is equal to the value \(p_0\).
Since we assume this is true, we can follow the same logic as above. Therefore, if \(np_0\) and \(n(1-p_0)\) are both at least 5, then the sampling distribution of the sample proportion is approximately normal with mean \(p_0\) and standard error \(\sqrt{\frac{p_0(1-p_0)}{n}}\).
We can find probabilities associated with values of \(\hat{p}\) by using:
\[z^*=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\]
Example 6.4
Referring back to a previous example, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5?
Is 0.556 (=278/500) much bigger than 0.5? What is much bigger?
Answer
This depends on the standard deviation of \(\hat{p}\) under the null hypothesis.
\[\hat{p}-p_0=0.556-0.5=0.056\]
The standard deviation of \(\hat{p}\) if the null hypothesis is true (i.e. when \(p=p_0=0.5\)) is:
\[\sqrt{\dfrac{p_0(1-p_0)}{n}}=\sqrt{\dfrac{0.5(1-0.5)}{500}}=0.0224\]
We can compare them by taking the ratio.
\[z^*=\dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}=\dfrac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}=2.504\]
Therefore, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the mean.
The \(z^*\) value we found in the above example is referred to as the test statistic.
6.8 (Test statistic) The sample statistic one uses to either reject \(H_0\) (and conclude \(H_a\)) or fail to reject \(H_0\).
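The arithmetic of Example 6.4 can be checked with a few lines of Python. This is a minimal sketch; the helper name `one_prop_z` is hypothetical, and it computes the standard error under \(H_0\) (using \(p_0\), not \(\hat{p}\)), as in the formula for \(z^*\).

```python
import math

def one_prop_z(successes, n, p0):
    """Return (p_hat, standard error under H0, z*) for a one-sample proportion z-test."""
    p_hat = successes / n
    # Standard error of p-hat assuming H0: p = p0 is true (uses p0, not p_hat)
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z_star = (p_hat - p0) / se_null
    return p_hat, se_null, z_star

# Example 6.4: 278 of 500 sampled Penn State students are from Pennsylvania
p_hat, se_null, z_star = one_prop_z(successes=278, n=500, p0=0.5)
print(f"p_hat = {p_hat:.3f}, SE = {se_null:.4f}, z* = {z_star:.3f}")
# prints: p_hat = 0.556, SE = 0.0224, z* = 2.504
```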
Making a Decision
In the previous example for Penn State students, we found that, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the hypothesized mean, \(p_0=0.5\).
Is it far enough away from 0.5 to suggest that there is evidence against the null? Is there a cutoff for the number of standard deviations that we would find acceptable?
What if instead of a cutoff, we found a probability? Recall the alternative hypothesis for this example was \(H_a\colon p>0.5\). So if we found, for example, the probability of a sample proportion being 0.556 or greater, then we get \(P(Z>2.504)=0.0061\).
This means that, if the true proportion is 0.5, the probability we would get a sample proportion of 0.556 or greater is 0.0061. Very small! But is it small enough to say there is evidence against the null?
To determine whether the probability is small or how many standard deviations are “acceptable”, we need a preset level of significance, which is the probability of a Type I error. Recall that a Type I error is the event of rejecting the null hypothesis when that null hypothesis is true. Think of finding guilty a person who is actually innocent.
When we specify our hypotheses, we should have some idea of how large a probability of Type I error we can tolerate. This probability is the significance level, denoted \(\alpha\). A conventional choice of \(\alpha\) is 0.05. Values ranging from 0.01 to 0.1 are also common, and the choice of \(\alpha\) depends on the problem at hand.
Once we have this preset level, we can determine whether or not there is significant evidence against the null. There are two methods to determine if we have enough evidence: the rejection region method and the p-value method.
Rejection Region Approach
We start the hypothesis test process by determining the null and alternative hypotheses. Then we set our significance level, \(\alpha\), which is the probability of making a Type I error. We can determine the appropriate cutoff called the critical value and find a range of values where we should reject, called the rejection region.
6.9 (Critical values) The values that separate the rejection and non-rejection regions.
6.10 (Rejection region) The set of values for the test statistic that leads to rejection of \(H_0\)
The graphs below show us how to find the critical values and the rejection regions for the three different alternative hypotheses and for a set significance level, \(\alpha\). The rejection region is based on the alternative hypothesis.
The rejection region is the region where, if our test statistic falls in it, we have enough evidence to reject the null hypothesis. For the right-tailed test, for example, the rejection region is any value greater than or equal to \(c_{1-\alpha}\), where \(c_{1-\alpha}\) is the critical value.
Left-Tailed Test
Reject \(H_0\) if the test statistic is less than or equal to the critical value (\(c_\alpha\)).
Right-Tailed Test
Reject \(H_0\) if the test statistic is greater than or equal to the critical value (\(c_{1-\alpha}\))
Two-Tailed Test
Reject \(H_0\) if the absolute value of the test statistic is greater than or equal to the absolute value of the critical value (\(c_{\alpha/2}\)).
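The three rejection rules above can be sketched in Python, using the standard normal quantile function `NormalDist().inv_cdf` from the standard library in place of a z-table. The function names and the `"left"`/`"right"`/`"two"` labels are our own, for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def critical_value(alpha, tail):
    """Critical value of the standard normal for a given alpha and tail ("left", "right", "two")."""
    if tail == "left":
        return STD_NORMAL.inv_cdf(alpha)        # c_alpha (negative)
    if tail == "right":
        return STD_NORMAL.inv_cdf(1 - alpha)    # c_{1-alpha}
    if tail == "two":
        return STD_NORMAL.inv_cdf(alpha / 2)    # c_{alpha/2} (negative); |z*| is compared with |c|
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z_star, alpha, tail):
    """True if the test statistic falls in the rejection region."""
    c = critical_value(alpha, tail)
    if tail == "left":
        return z_star <= c
    if tail == "right":
        return z_star >= c
    return abs(z_star) >= abs(c)

for tail in ("left", "right", "two"):
    print(tail, round(critical_value(0.05, tail), 3))
```

With \(\alpha=0.05\) this gives roughly \(-1.645\), \(1.645\), and \(-1.96\); for the two-tailed test, \(|z^*|\) is compared with \(|c_{\alpha/2}|\approx 1.96\).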
P-Value Approach
As with the rejection region approach, the P-value approach will need the null and alternative hypotheses, the significance level, and the test statistic. Instead of finding a region, we are going to find a probability called the p-value.
6.11 (P-value) The p-value (or probability value) is the probability that the test statistic equals the observed value or a more extreme value under the assumption that the null hypothesis is true.
The p-value is a probability statement based on the alternative hypothesis. The p-value is found differently for each of the alternative hypotheses.
- Left-tailed: If \(H_a\) is left-tailed, then the p-value is the probability the sample data produces a value equal to or less than the observed test statistic.
- Right-tailed: If \(H_a\) is right-tailed, then the p-value is the probability the sample data produces a value equal to or greater than the observed test statistic.
- Two-tailed: If \(H_a\) is two-tailed, then the p-value is two times the probability the sample data produces a value equal to or greater than the absolute value of the observed test statistic.
So for one-sample proportions, we have…
Left-Tailed
\(P(Z \le z^*)\)
Right-Tailed
\(P(Z \ge z^*)\)
Two-Tailed
\(2P(Z \ge |z^*|)\)
Once we find the p-value, we compare the p-value to our preset significance level.
- If our p-value is less than or equal to \(\alpha\), then there is enough evidence to reject the null hypothesis.
- If our p-value is greater than \(\alpha\), there is not enough evidence to reject the null hypothesis.
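The p-value formulas and the decision rule above can be sketched as follows, again using `statistics.NormalDist` in place of a z-table. The names `p_value` and `reject_h0` are hypothetical. By the symmetry of the standard normal, \(P(Z\ge z^*)=P(Z\le -z^*)\), which is how the right- and two-tailed cases are computed here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def p_value(z_star, tail):
    """p-value for a z test statistic; tail is "left", "right", or "two" (labels are ours)."""
    if tail == "left":
        return STD_NORMAL.cdf(z_star)             # P(Z <= z*)
    if tail == "right":
        return STD_NORMAL.cdf(-z_star)            # P(Z >= z*) = P(Z <= -z*)
    if tail == "two":
        return 2 * STD_NORMAL.cdf(-abs(z_star))   # 2 P(Z >= |z*|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def reject_h0(z_star, alpha, tail):
    """Decision rule: reject H0 when the p-value is less than or equal to alpha."""
    return p_value(z_star, tail) <= alpha

# Penn State example: z* = 2.504 with a right-tailed alternative
print(round(p_value(2.504, "right"), 4), reject_h0(2.504, 0.05, "right"))  # prints: 0.0061 True
```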
Important Note!
We can summarize the data by reporting the p-value and let the users decide to reject \(H_0\) or not to reject \(H_0\) for their subjectively chosen \(\alpha\) values.
More on the P-Value and Rejection Region Approach
Two Methods for Making a Statistical Decision
Of the two methods for making a statistical decision, the p-value approach is more commonly used and reported in published literature. However, understanding the rejection region approach can go a long way toward understanding the p-value method. Regardless of the method applied, the two approaches lead to exactly the same conclusion.
Comparing the Two Approaches
Both approaches will ensure the same conclusion and either one will work. However, using the p-value approach has the following advantages:
- Using the rejection region approach, you need to check the table or software for the critical value every time you use a different \(\alpha\) value.
- In addition to just using it to reject or not reject \(H_0\) by comparing the p-value to the \(\alpha\) value, the p-value also gives us some idea of the strength of the evidence against \(H_0\).
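The claim that the two approaches always agree can be checked numerically. The sketch below covers the right-tailed case only (the left- and two-tailed cases follow the same argument) and compares both decisions over a grid of test statistics and three \(\alpha\) levels; the function names are hypothetical. The agreement is not a coincidence: \(z^*\ge z_{1-\alpha}\) exactly when \(P(Z\ge z^*)\le \alpha\), because the standard normal CDF is increasing.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def decide_by_region(z_star, alpha):
    """Right-tailed rejection region approach: reject if z* >= z_{1-alpha}."""
    return z_star >= STD_NORMAL.inv_cdf(1 - alpha)

def decide_by_p_value(z_star, alpha):
    """Right-tailed p-value approach: reject if P(Z >= z*) <= alpha."""
    return STD_NORMAL.cdf(-z_star) <= alpha

# Compare the two decisions for z* from -4.00 to 4.00 in steps of 0.01
zs = [i / 100 for i in range(-400, 401)]
agree = all(
    decide_by_region(z, a) == decide_by_p_value(z, a)
    for z in zs
    for a in (0.01, 0.05, 0.10)
)
print("approaches agree on every grid point:", agree)
```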
Steps in Conducting a Hypothesis Test for \(p\)
Six Steps for One-Sample Proportion Hypothesis Test
Steps 1-3
Let’s apply the general steps for hypothesis testing to the specific case of testing a one-sample proportion.
Step 1: Set up the hypotheses and check conditions.
\[np_0\ge 5 \text{ and } n(1-p_0)\ge 5\]
One Proportion Z-test Hypotheses
Left-Tailed
\(H_0\colon p=p_0\)
\(H_a\colon p<p_0\)
Right-Tailed
\(H_0\colon p=p_0\)
\(H_a\colon p>p_0\)
Two-Tailed
\(H_0\colon p=p_0\)
\(H_a\colon p\ne p_0\)
Step 2: Decide on the level of significance \(\boldsymbol{(\alpha)}\). It is typically 5%; if \(\alpha\) is not specified, use 5%.
Step 3: Calculate the test statistic.
One Proportion Z-test: \[z^*=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\]
The first few steps (Step 1 to Step 3) are the same for the rejection region and p-value approaches. The next part discusses Steps 4 to 6 for both approaches.
Rejection Region Approach
Steps 4-6
Step 4: Find the appropriate critical values for the tests. Write down clearly the rejection region for the problem.
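Left-tailed: Reject \(H_0\) if \(z^* \le z_\alpha\)
Right-tailed: Reject \(H_0\) if \(z^* \ge z_{1-\alpha}\)
Two-tailed: Reject \(H_0\) if \(|z^*| \ge |z_{\alpha/2}|\)
For example, with \(\alpha=0.05\) the critical values are \(z_{0.05}=-1.645\) (left-tailed), \(z_{0.95}=1.645\) (right-tailed), and \(z_{0.025}=-1.96\), so \(|z_{0.025}|=1.96\) (two-tailed).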
Step 5: Make a decision about the null hypothesis.
Check to see if the value of the test statistic falls in the rejection region. If it does, then reject \(H_0\) (and conclude \(H_a\)). If it does not fall in the rejection region, do not reject \(H_0\).
Step 6: State an overall conclusion.
P-Value Approach
Steps 4-6
Step 4: Compute the appropriate p-value based on our alternative hypothesis.
Left-Tailed
\(P(Z \le z^*)\)
Right-Tailed
\(P(Z \ge z^*)\)
Two-Tailed
\(2P(Z \ge |z^*|)\)
Step 5: Make a decision about the null hypothesis.
If the p-value is less than or equal to the significance level, then reject the null hypothesis. If the p-value is greater than the significance level, fail to reject the null hypothesis.
Step 6: State an overall conclusion.
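Steps 1 through 5 for a one-proportion z-test can be collected into a single function. This is a sketch, not Minitab's or any library's implementation: the function name, its arguments, and the returned dictionary keys are our own choices, and it uses the normal approximation only. For the two-tailed case it returns the positive critical value \(|z_{\alpha/2}|\) and compares \(|z^*|\) with it.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def one_proportion_z_test(successes, n, p0, tail, alpha=0.05):
    """Sketch of Steps 1-5 of a one-proportion z-test (normal approximation).

    tail: "left" (Ha: p < p0), "right" (Ha: p > p0), or "two" (Ha: p != p0).
    """
    # Step 1: check the conditions n*p0 >= 5 and n*(1 - p0) >= 5
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("conditions for the normal approximation are not met")
    # Step 2: alpha is supplied by the caller (5% if not specified)
    # Step 3: test statistic, with the standard error computed under H0
    p_hat = successes / n
    z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Step 4: critical value (rejection region) and p-value for the chosen alternative
    if tail == "left":
        critical = STD_NORMAL.inv_cdf(alpha)               # z_alpha
        in_region = z_star <= critical
        p_val = STD_NORMAL.cdf(z_star)                     # P(Z <= z*)
    elif tail == "right":
        critical = STD_NORMAL.inv_cdf(1 - alpha)           # z_{1-alpha}
        in_region = z_star >= critical
        p_val = STD_NORMAL.cdf(-z_star)                    # P(Z >= z*)
    elif tail == "two":
        critical = STD_NORMAL.inv_cdf(1 - alpha / 2)       # |z_{alpha/2}|
        in_region = abs(z_star) >= critical
        p_val = 2 * STD_NORMAL.cdf(-abs(z_star))           # 2 P(Z >= |z*|)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # Step 5: decisions from both approaches (they should always agree)
    return {
        "p_hat": p_hat,
        "z": z_star,
        "critical": critical,
        "p_value": p_val,
        "reject_by_region": in_region,
        "reject_by_p_value": p_val <= alpha,
    }

# Step 6 (the conclusion in context) is left to the person reading the output
print(one_proportion_z_test(successes=278, n=500, p0=0.5, tail="right"))
```

With the numbers from Example 6.5 (278 of 500, \(p_0=0.5\), right-tailed), it returns \(z^*\approx 2.504\) and rejects \(H_0\) under both approaches.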
Example 6.5 (Penn State Students from Pennsylvania)
Referring back to Example 6.4, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5 at a 5% level of significance?
Conduct the test using both the rejection region and p-value approach.
Answer
Step 1: Set up the hypotheses and check conditions.
Set up the hypotheses. Since the research hypothesis is to check whether the proportion is greater than 0.5 we set it up as a one (right)-tailed test:
\[H_0\colon p=0.5 \text{ vs. } H_a\colon p>0.5\]
Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0\) is \(0.5\) and we can check that: \(np_0=500(0.5)=250 \ge 5\) and \(n(1-p_0)=500(1-0.5)=250 \ge 5\)
Step 2: Decide on the significance level, \(\alpha\).
According to the question, \(\alpha= 0.05\).
Step 3: Calculate the test statistic:
\[\begin{align} z^*&= \dfrac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}\\z^*&=2.504 \end{align}\]
Rejection Region Approach
Step 4: Find the appropriate critical value for the test using the z-table. Write down clearly the rejection region for the problem.
We can use the standard normal table to find \(z_{0.95}\), the value with 5% of the standard normal area above it. From the table, \(z_{0.95}=1.645\), so the critical value is \(1.645\). The rejection region for the right-tailed test is given by:
\[z^*\ge 1.645\]
Step 5: Make a decision about the null hypothesis.
The test statistic or the observed Z-value is \(2.504\). Since \(z^*\) falls within the rejection region, we reject \(H_0\).
Step 6: State an overall conclusion.
With a test statistic of \(2.504\) and a critical value of \(1.645\) at a 5% level of significance, we have enough statistical evidence to reject the null hypothesis. We conclude that a majority of the students are from Pennsylvania.
P-Value Approach
Step 4: Compute the appropriate p-value based on our alternative hypothesis:
\(\text{p-value}=P(Z\ge z^*)=P(Z \ge 2.504)=0.0061\)
Step 5: Make a decision about the null hypothesis.
Since \(\text{p-value} = 0.0061 \le 0.05\) (the \(\alpha\) value), we reject the null hypothesis.
Step 6: State an overall conclusion.
With a test statistic of \(2.504\) and a p-value of \(0.0061\), we reject the null hypothesis at a 5% level of significance. We conclude that a majority of the students are from Pennsylvania.
Try It!
An e-commerce research company claims that 60% or more graduate students have bought merchandise online. A consumer group is suspicious of the claim and thinks that the proportion is lower than 60%. A random sample of 80 graduate students shows that only 22 students have ever done so. Is there enough evidence to show that the true proportion is lower than 60%?
Conduct the test at a 10% Type I error rate and use the p-value and rejection region approaches.
Step 1: Set up the hypotheses and check conditions.
Set up the hypotheses. Since the research hypothesis is to check whether the proportion is less than 0.6 we set it up as a one (left)-tailed test:
\[H_0\colon p=0.6 \text{ vs. } H_a\colon p<0.6\]
Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0\) is 0.6 and we can check that: \(np_0=80(0.6)=48 \ge 5\) and \(n(1-p_0)=80(1-0.6)=32 \ge 5\)
Step 2: Decide on the significance level, \(\alpha\).
According to the question, \(\alpha= 0.1\).
Step 3: Calculate the test statistic:
\[\begin{align} z^* &=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\\&=\frac{0.275-0.6}{\sqrt{\frac{0.6(1-0.6)}{80}}}\\&=-5.93 \end{align}\]
Rejection Region Approach
Step 4: Find the appropriate critical value for the test using the z-table. Write down clearly the rejection region for the problem.
The critical value, \(z_{0.10}\), is the value of the standard normal distribution with 10% of the area below it. Using the standard normal table, this value is \(-1.28\).
Step 5: Make a decision about the null hypothesis.
The rejection region is any \(z^*\) such that \(z^*<-1.28\). Since our test statistic, -5.93, is inside the rejection region, we reject the null hypothesis.
Step 6: State an overall conclusion.
There is enough evidence in the data provided to suggest, at a 10% level of significance, that the true proportion of students who made purchases online was less than 60%.
P-Value Approach
Step 4: Compute the appropriate p-value based on our alternative hypothesis:
\(\text{p-value}=P(Z \le -5.93) \approx 1.5\times 10^{-9}\)
Step 5: Make a decision about the null hypothesis.
Since our p-value is very small and less than our significance level of 10%, we reject the null hypothesis.
Step 6: State an overall conclusion.
There is enough evidence in the data provided to suggest, at a 10% level of significance, that the true proportion of students who made purchases online was less than 60%.
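The numbers in this Try It can be reproduced in Python as a check. This minimal sketch uses only the standard library, and the variable names are ours. It recovers \(z^*\approx -5.93\), the critical value \(z_{0.10}\approx -1.28\), and a left-tailed p-value of roughly \(1.5\times 10^{-9}\).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

successes, n, p0, alpha = 22, 80, 0.6, 0.10

# Step 1: conditions n*p0 = 48 and n*(1 - p0) = 32 are both at least 5
assert n * p0 >= 5 and n * (1 - p0) >= 5
# Step 3: test statistic, standard error computed under H0
p_hat = successes / n                                   # 0.275
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# Step 4 (rejection region approach): left-tailed critical value z_alpha
critical = STD_NORMAL.inv_cdf(alpha)
# Step 4 (p-value approach): P(Z <= z*)
p_val = STD_NORMAL.cdf(z_star)

print(f"z* = {z_star:.2f}, critical value = {critical:.2f}, p-value = {p_val:.1e}")
# Step 5: both approaches lead to the same decision
print("reject H0 (region, p-value):", z_star <= critical, p_val <= alpha)
```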
6.1.5 Relating the CI to a Two-Tailed Test
The primary purpose of a confidence interval is to estimate some unknown parameter. A secondary use of confidence intervals is to support decisions in hypothesis testing, especially when the test is two-tailed. The essence of this method is to compare the hypothesized value to the confidence interval. If the hypothesized value falls within the interval, we fail to reject the null hypothesis. If the hypothesized value falls outside the interval, we reject the null hypothesis.
For the two-tailed test:
\[H_0 \colon p=p_0 \text{ vs. } H_a \colon p\ne p_0\]
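The null hypothesis will be rejected at level \(\alpha\) if and only if the value \(p_0\) does not fall within the \(100(1 - \alpha)\%\) confidence interval for \(p\). (For a proportion this correspondence is approximate, because the confidence interval uses \(\hat{p}\) in its standard error while the test statistic uses \(p_0\); the two can disagree when \(p_0\) lies near an endpoint of the interval.)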
Let’s look at an example.
Example 6.6
Consider Example 5.3. A random sample of 1500 U.S. adults is taken. They are asked whether they approve or disapprove of the current president’s performance so far (i.e. an approval rating). Of the 1500 surveyed, 660 responded with “approve”.
The 95% confidence interval found in Lesson 5 for the population proportion who approve of the president’s performance so far is (0.415, 0.465).
Suppose we want to test whether the proportion is different from 40%. In other words, we want to test the following hypotheses at a significance level of 5%.
Answer
Step 1: Set up the hypotheses and check conditions.
\[H_0\colon p=0.40 \text{ vs. } H_a \colon p\ne0.4\]
Step 2: Decide on the significance level, \(\alpha\).
Since we want to compare the test with the 95% confidence interval, we should use a significance level of 5%.
Step 3: Calculate the test statistic:
\[\begin{align} z^*&=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\\&=\frac{\frac{660}{1500}-0.4}{\sqrt{\frac{0.4(1-0.4)}{1500}}}\\&=3.162 \end{align}\]
Step 4: Compute the appropriate p-value based on our alternative hypothesis: (In this step we can use the rejection region approach or the p-value approach. We will demonstrate the p-value approach.)
For the two-sided test, the p-value is found by:
\[\begin{align} \text{p-value}&=2P(Z\ge z^*)\\&=2P(Z\ge3.162)\\&=2(0.00078)\\&=0.00156 \end{align}\]
Step 5: Make a decision about the null hypothesis.
The p-value of 0.00156 is less than our significance level of 5%. Therefore, we reject the null hypothesis.
Step 6: State an overall conclusion.
There is enough evidence in the data, at a significance level of 5%, to reject the null hypothesis and conclude that the true population proportion of people who approve of the president’s performance so far is different than 40%.
Connecting the CI with the 2-tailed test
The conclusion was to reject the null hypothesis that the true proportion is 40%. The 95% confidence interval (0.415, 0.465) does not contain 40%, which agrees with the test. Although we have not yet discussed a hypothesis test for the population mean, this idea applies generally to two-sided tests and their corresponding confidence intervals.
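The calculations in Example 6.6, including the check against the interval, can be reproduced with a short Python sketch that uses only the standard library (`statistics.NormalDist`). The function name `one_prop_two_sided` is our own. We assume the interval is the usual large-sample form \(\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}\) for 95% confidence, which reproduces (0.415, 0.465).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def one_prop_two_sided(x, n, p0, alpha=0.05):
    """Two-sided one-proportion z-test plus a (1 - alpha) large-sample CI."""
    p_hat = x / n
    # The test statistic uses the hypothesized p0 in the standard error
    z_star = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - Z.cdf(abs(z_star)))
    # The confidence interval uses p_hat in the standard error
    z_crit = Z.inv_cdf(1 - alpha / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - margin, p_hat + margin)
    return z_star, p_value, ci


z_star, p_value, ci = one_prop_two_sided(660, 1500, 0.40)
print(f"z* = {z_star:.3f}, p-value = {p_value:.5f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}); 0.40 inside? {ci[0] <= 0.40 <= ci[1]}")
```

The software p-value (about 0.0016) differs from the table-based 0.00156 only because of rounding in the normal table. Because 0.40 lies well outside the interval, the small difference in standard errors between the test and the interval does not matter here.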
It is possible to use a one-sided confidence bound to draw a conclusion about a one-sided test, but you have to be very careful about obtaining the one-sided confidence bound.
6.1.6 Minitab: One-Sample \(p\) Hypothesis Testing
Minitab: Conduct a One-Sample Proportion Z-Test
To conduct the one-sample proportion Z-test in Minitab…
- Choose Stat > Basic Statistics > 1 Proportion…
- In the drop-down box, use ‘One or more samples, each in a column’ if you have the raw data; otherwise, select ‘Summarized data’ if you only have the sample statistics.
- If using the raw data, enter the column of interest into the blank variable window below the drop-down selection. If using summarized data, enter the number of successes for Events and the sample size for Trials.
- Select the check box for ‘Perform hypothesis test’ and enter the null hypothesis value.
- Choose Options.
- Enter the confidence level associated with alpha (e.g., 95% for an alpha of 5%).
- From the drop-down list for Alternative hypothesis, select the correct alternative.
- If the conditions for a one-proportion z-test are satisfied, select ‘Normal approximation’ from the Method field.
- Choose OK and OK.
Example 6.7 (Penn State Students from Pennsylvania)
Recall Example 6.5 at the beginning of this lesson on whether the majority of Penn State students are from Pennsylvania. In that example, we took a random sample of 500 Penn State students and found that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5 at a 5% level of significance? Also, recall in that example we found by hand a test statistic of \(z^* = 2.504\) and p-value of 0.0062.
Our hypotheses were: \(H_0\colon p=0.5\) and \(H_a \colon p>0.5\)
Answer
Using Minitab…
- Choose Stat > Basic Statistics > 1 Proportion…
- Choose the summarized data option and enter 278 for Events and 500 as the Trials.
- Check the box for ‘Perform Hypothesis Test’ and enter the null value of 0.5.
- Choose Options. With our stated alpha value of 5%, we keep the default confidence level of 95%.
- Select Proportion > hypothesized proportion from the Alternative Hypothesis list. Since we verified that the conditions were satisfied, select ‘Normal Approximation’ under Method.
- Choose OK and OK.
The output is…
Sample | X | N | Sample p | 95% Lower Bound | Z-Value | P-Value |
---|---|---|---|---|---|---|
1 | 278 | 500 | 0.556000 | 0.519451 | 2.50 | 0.006 |
As the output indicates, our by-hand calculations were very accurate!
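As a cross-check, here is a minimal Python sketch (standard library only) that recomputes the Minitab row. We assume Minitab's one-sided 95% lower bound has the form \(\hat{p} - z\sqrt{\hat{p}(1-\hat{p})/n}\), where \(z\) has area 0.95 to its left. That formula reproduces 0.519451 here, but it is our assumption and is not stated in the output.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

x, n, p0 = 278, 500, 0.5
p_hat = x / n

# Right-tailed test, H_a: p > p0
z_star = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - Z.cdf(z_star)

# One-sided 95% lower confidence bound (assumed normal-approximation form)
lower = p_hat - Z.inv_cdf(0.95) * sqrt(p_hat * (1 - p_hat) / n)

print(f"Sample p = {p_hat:.6f}, 95% lower bound = {lower:.6f}")
print(f"Z = {z_star:.2f}, P = {p_value:.3f}")
```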
Minitab: Finding the Critical Value for a One-Sample Proportion Test
Although we can find values in the standard normal table, it is more accurate to find them using software. Finding values for the standard normal distribution is discussed in more detail in Lesson 3; we present it here as a review. To obtain the exact critical value for the rejection region approach, we can use a statistical package such as Minitab.
To find the critical value…
- Choose Calc > Probability Distributions > Normal distribution
- Choose the radio button for ‘Inverse Cumulative Distribution’ (this finds the z-value that produces the entered probability to the left of it).
- Choose the radio button for ‘Input constant’ and enter the alpha value (if one-sided alternative) or alpha/2 (if two-sided alternative).
- Choose OK.
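Outside Minitab, the same inverse-cumulative lookup is available in Python's standard library as `statistics.NormalDist().inv_cdf`. Like Minitab's Inverse Cumulative Distribution, it returns the value with the given area to its left. The sketch below assumes \(\alpha = 0.05\) for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

left_crit = Z.inv_cdf(alpha)          # left-tailed: reject if z* <= this value
right_crit = -Z.inv_cdf(alpha)        # right-tailed: by symmetry, reject if z* >= this value
two_crit = abs(Z.inv_cdf(alpha / 2))  # two-tailed: reject if |z*| >= this value

print(f"left: {left_crit:.4f}, right: {right_crit:.4f}, two-sided: ±{two_crit:.4f}")
```

The right-tailed and two-tailed values rely on the symmetry of the standard normal distribution, just as you would when reading Minitab's output for a right-tailed or two-sided alternative.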
6.1.7 Part A Summary
In this Lesson, we presented the logic and terminology of hypothesis testing. Then, we presented the six steps of hypothesis testing in statistics.
We developed the hypothesis test for one population proportion. The rejection region and the p-value approach were presented as ways to come to a conclusion.
Finally, we discussed how a two-sided test relates to a confidence interval.
In the next Lesson, we will present the statistical theory for a hypothesis test for a population mean from one sample.
6.2 Part B: Hypothesis Testing for One-Sample Mean
Overview
In the previous Lesson, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution of the sample mean.
Recall that under certain conditions, the sampling distribution of the sample mean, \(\bar{x}\), is approximately normal with mean, \(\mu\), standard error \(\frac{\sigma}{\sqrt{n}}\), and estimated standard error \(\frac{s}{\sqrt{n}}\).
The conditions are:
- The distribution of the population is Normal.
- The sample size is large (\(n > 30\)).
If at least one of the conditions is satisfied, then, when the population mean equals \(\mu_0\),
\[t=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\]
follows a t-distribution with \(n-1\) degrees of freedom (exactly when the population is Normal, and approximately otherwise).
We can use this information to make probability statements for \(\bar{x}\).
Let’s look at an example.
Example 6.8 (Length of Lumber)
The mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. Suppose the builder observes that the sample mean of 61 pieces of lumber is 8.3 feet, with a sample standard deviation of 1.2 feet. What will she conclude? Is 8.3 very different from 8.5?
Answer
This depends on the standard deviation of \(\bar{x}\), which we estimate with \(\frac{s}{\sqrt{n}}\). Standardizing the sample mean gives
\[\begin{align} t^*&=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align}\]
Thus, we are asking whether \(-1.3\) is very far from zero, because a value of zero corresponds to \(\bar{x}\) being exactly equal to \(\mu_0\). If it is far away, then it is unlikely that the null hypothesis is true, and we reject it. Otherwise, we cannot reject the null hypothesis.
How do we determine whether to reject the null hypothesis?
It depends on the level of significance \(\alpha\) (Step 2 of conducting a hypothesis test) and on the probability, assuming the null hypothesis is true, that the sample data would produce a result at least as extreme as the one observed. In the next section, we set up the six steps for a hypothesis test for one mean.
Objectives
Upon completion of this lesson, you should be able to:
- Perform hypothesis testing for a population mean using the p-value approach and the rejection region approach.
- Use confidence intervals to draw conclusions about two-sided tests.
6.2.1 Steps in Conducting a Hypothesis Test for \(\mu\)
Six Steps for One-Sample Mean Hypothesis Test
Steps 1-3
Let’s apply the general steps for hypothesis testing to the specific case of testing a one-sample mean.
Step 1: Set up the hypotheses and check conditions.
One Mean t-test Hypotheses
Left-Tailed
\(H_0\colon \mu=\mu_0\)
\(H_a\colon \mu<\mu_0\)
Right-Tailed
\(H_0\colon \mu=\mu_0\)
\(H_a\colon \mu>\mu_0\)
Two-Tailed
\(H_0\colon \mu=\mu_0\)
\(H_a\colon \mu\ne \mu_0\)
Conditions: The data come from an approximately normal distribution, or the sample size is greater than 30.
Step 2: Decide on the level of significance \(\boldsymbol{(\alpha)}\).
Typically, 5%. If \(\alpha\) is not specified, use 5%.
Step 3: Calculate the test statistic.
One Mean t-test: \[t^*=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\]
The first three steps (Steps 1-3) are the same for both the rejection region approach and the p-value approach. The next part discusses Steps 4-6 for each approach.
Rejection Region Approach
Steps 4-6
Step 4: Find the appropriate critical values for the tests. Write down clearly the rejection region for the problem.
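Left-Tailed
Reject \(H_0\) if \(t^* \le t_\alpha\)
Right-Tailed
Reject \(H_0\) if \(t^* \ge t_{1-\alpha}\)
Two-Tailed
Reject \(H_0\) if \(|t^*| \ge |t_{\alpha/2}|\)
Here \(t_q\) denotes the value with area \(q\) to its left under a t-distribution with \(n-1\) degrees of freedom, so \(t_\alpha\) and \(t_{\alpha/2}\) are negative. Many t-tables, including the one used in the examples below, instead label critical values by their right-tail area.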
Step 5: Make a decision about the null hypothesis.
Check to see if the value of the test statistic falls in the rejection region. If it does, then reject \(H_0\) (and conclude \(H_a\)). If it does not fall in the rejection region, do not reject \(H_0\).
Step 6: State an overall conclusion.
P-Value Approach
Steps 4-6
Step 4: Compute the appropriate p-value based on our alternative hypothesis.
- If \(H_a\) is right-tailed, then the p-value is the probability, assuming \(H_0\) is true, that the sample data produce a value equal to or greater than the observed test statistic.
- If \(H_a\) is left-tailed, then the p-value is the probability, assuming \(H_0\) is true, that the sample data produce a value equal to or less than the observed test statistic.
- If \(H_a\) is two-tailed, then the p-value is two times the probability, assuming \(H_0\) is true, that the sample data produce a value equal to or greater than the absolute value of the observed test statistic.
Left-Tailed
\(P(t \le t^*)\)
Right-Tailed
\(P(t\ge t^*)\)
Two-Tailed
\(2 \times P(t \ge |t^*|)\)
Step 5: Make a decision about the null hypothesis.
If the p-value is less than the significance level, \(\alpha\), then reject \(H_0\) (and conclude \(H_a\)). Otherwise, do not reject \(H_0\).
Step 6: State an overall conclusion.
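The three p-value formulas above can be sketched in Python. Python's standard library has no t-distribution, so this sketch uses a substitute of our own: it approximates tail areas by integrating the t density numerically with Simpson's rule. The helper names (`t_pdf`, `t_upper_tail`, `t_test_p_value`) are hypothetical. In practice you would use Minitab or an established statistics library. The usage lines apply the sketch to the lumber numbers from Example 6.8.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_test_p_value(t_star, df, alternative):
    """p-value for a 'left', 'right', or 'two' tailed alternative."""
    upper = t_upper_tail(abs(t_star), df)  # P(T >= |t*|)
    if alternative == "two":
        return 2 * upper
    if alternative == "right":  # P(T >= t*)
        return upper if t_star >= 0 else 1 - upper
    if alternative == "left":   # P(T <= t*)
        return upper if t_star <= 0 else 1 - upper
    raise ValueError("alternative must be 'left', 'right', or 'two'")


# Lumber example: x-bar = 8.3, s = 1.2, n = 61, mu0 = 8.5, two-tailed
t_star = (8.3 - 8.5) / (1.2 / math.sqrt(61))
p = t_test_p_value(t_star, df=61 - 1, alternative="two")
print(f"t* = {t_star:.2f}, p-value = {p:.3f}")
```

The resulting p-value is just under 0.2. It falls inside the 0.1 to 0.2 range that the t-table gives for this example.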
Example 6.9 (Length of Lumber)
Continuing with Example 6.8, the mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. If the builder observes that the sample mean of 61 pieces of lumber is 8.3 feet with a sample standard deviation of 1.2 feet, what will she conclude? Conduct this test at a 1% level of significance.
Conduct the test using the Rejection Region approach and the p-value approach.
Answer
Step 1: Set up the hypotheses and check conditions.
Set up the hypotheses (since the research hypothesis is to check whether the mean is different from 8.5, we set it up as a two-tailed test):
\[H_0\colon \mu=8.5 \text{ vs. } H_a\colon \mu\ne 8.5\]
Can we use the t-test? The answer is yes since the sample size of 61 is sufficiently large (greater than 30).
Step 2: Decide on the significance level, \(\alpha\).
According to the question, \(\alpha = 0.01\).
Step 3: Calculate the test statistic.
\[\begin{align} t^*&=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align}\]
Rejection Region Approach
Steps 4-6
Step 4: Find the appropriate critical values for the tests. Write down clearly the rejection region for the problem.
From the table and with degrees of freedom of 61-1=60, the critical value is \(t_{\alpha/2}=t_{0.005}=2.660\). The rejection region for the two-tailed test is given by: \[t^*\le -2.660 \text{ or } t^*\ge 2.660\]
*Recall how to use Minitab or the t-table to find t percentiles (Lesson 5.4).
Step 5: Make a decision about the null hypothesis.
The observed t-value, or test statistic, is -1.3. Since \(t^*\) does not fall within the rejection region, we fail to reject \(H_0\).
Step 6: State an overall conclusion.
With a test statistic of -1.3 and critical value of ± 2.660 at a 1% level of significance, we do not have enough statistical evidence to reject the null hypothesis. We conclude that there is not enough statistical evidence that indicates that the mean length of lumber differs from 8.5 feet.
P-Value Approach
Steps 4-6
Step 4: Compute the appropriate p-value based on our alternative hypothesis.
\[\begin{align} \text{p-value}&=2P(T>|t^*|)\\&=2P\left(T>\left|\frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\right|\right)\\&=2P\left(T>\left|\frac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\right|\right)\\&=2P(T>|-1.3|)\\&=2P(T>1.3) \end{align}\]
From the t-table, going across the row for 60 degrees of freedom, we do not find a value equal to 1.3. Without software to find a more exact probability, the best we can do with the t-table is find a range. The value 1.3 falls between 1.296 and 1.671, which correspond to right-tail probabilities of 0.10 and 0.05, respectively. Therefore, the probability to the right of 1.3 falls between 0.05 and 0.10, and the p-value falls between \(2(0.05)=0.10\) and \(2(0.10)=0.20\).
Step 5: Make a decision about the null hypothesis.
With this range of possible p-values exceeding our 1% level of significance for the test, we fail to reject the null hypothesis.
Step 6: State an overall conclusion.
With a test statistic of -1.3 and a p-value between 0.1 and 0.2, we fail to reject the null hypothesis at a 1% level of significance, since the p-value exceeds our significance level. We conclude that there is not enough statistical evidence that indicates that the mean length of lumber differs from 8.5 feet.
Try It!
Emergency Room Wait Time
The administrator at your local hospital states that on weekends the average wait time for emergency room visits is 10 minutes. Based on discussions you have had with friends who have complained about how long they waited to be seen in the ER over a weekend, you dispute the administrator’s claim. You decide to test your hypothesis. Over the course of a few weekends, you record the wait time for 40 randomly selected patients. The average wait time for these 40 patients is 11 minutes with a standard deviation of 3 minutes.
Do you have enough evidence to support your hypothesis that the average ER wait time exceeds 10 minutes? You opt to conduct the test at a 5% level of significance.
Step 1: Set up the hypotheses and check conditions.
At this point, we want to check whether we can apply the central limit theorem. The sample size is greater than 30, so we should be okay.
This is a right-tailed test.
\[H_0\colon \mu=10 \text{ vs. } H_a\colon \mu>10\]
Step 2: Decide on the significance level, \(\alpha\).
The problem states that \(\alpha=0.05\).
Step 3: Calculate the test statistic.
\[\begin{align} t^*&=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\\&=\dfrac{11-10}{\frac{3}{\sqrt{40}}}\\&=2.11 \end{align}\]
Rejection Region Approach
Steps 4-6
Step 4: Find the appropriate critical values for the tests. Write down clearly the rejection region for the problem.
The degrees of freedom for this test are \(n-1=40-1=39\). The alternative is right-tailed. Therefore, we want to find the value, \(t_{0.05}\), such that \(P(T\ge t_{0.05})=0.05\).
The table in the text lists 35 and 40 degrees of freedom, but not 39. We use the row for 35 degrees of freedom, which is the conservative choice because fewer degrees of freedom give a larger critical value. With \(\alpha=0.05\), we see a value of 1.69. The critical value is 1.69 and the rejection region is any \(t^*\) such that \(t^*\ge 1.69\).
Step 5: Make a decision about the null hypothesis.
Our test statistic, 2.11, is greater than our critical value of 1.69 and therefore is in the rejection region. We would reject the null hypothesis.
Step 6: State an overall conclusion.
There is enough evidence, at a significance level of 5%, to reject the null hypothesis and conclude that the mean waiting time is greater than 10 minutes.
P-Value Approach
Steps 4-6
Step 4: Compute the appropriate p-value based on our alternative hypothesis.
Again using the table with 35 degrees of freedom, our test statistic of 2.11 falls between 2.030 and 2.438, which correspond to right-tail probabilities of 0.025 and 0.01. Therefore, the p-value is between 0.01 and 0.025.
Step 5: Make a decision about the null hypothesis.
Since our p-value is between 0.01 and 0.025, we know it is less than our significance level, 5%. Therefore, we reject the null hypothesis.
Step 6: State an overall conclusion.
There is enough evidence, at a significance level of 5%, to reject the null hypothesis and conclude that the mean waiting time is greater than 10 minutes.
6.2.2 Minitab: One-Sample Mean Hypothesis Test
Note that these steps are very similar to those for the one-mean confidence interval. The differences occur in steps 4 through 8.
To conduct the one sample mean t-test in Minitab…
- Choose Stat > Basic Stat > 1 Sample t…
- In the drop-down box use ‘One or more samples, each in a column’ if you have the raw data, otherwise select ‘Summarized data’ if you only have the sample statistics.
- If using the raw data, enter the column of interest into the blank variable window below the drop down selection. If using summarized data, enter the sample size, sample mean, and sample standard deviation in their respective fields.
- Choose the check box for Perform hypothesis test and enter the null hypothesis value.
- Choose Options.
- Enter the confidence level associated with alpha (e.g., 95% for an alpha of 5%).
- From the drop-down list for Alternative hypothesis, select the correct alternative.
- Choose OK and OK.
Example 6.10
Recall our emergency room wait time example, where an administrator at your local hospital states that on weekends the average wait time for emergency room visits is 10 minutes. From our random sample of 40 patients, the average wait time was 11 minutes with a standard deviation of 3 minutes. We conducted the test at a 5% level of significance and wanted to demonstrate that the average time exceeded 10 minutes. Also, recall that in that example we found by hand a test statistic of \(t^* = 2.11\) and a p-value between 0.01 and 0.025.
Our hypotheses were: \(H_0 \colon \mu=10\) and \(H_a\colon \mu>10\)
Conduct the same test using Minitab.
Answer
Using Minitab…
- Select Stat > Basic Stat > 1 Sample t…
- Choose the summarized data option and enter 40 for Sample size, 11 for the Sample mean, and 3 for the Standard deviation.
- Check the box for Perform Hypothesis Test and enter the null value of 10.
- Select Options.
- With our stated alpha value of 5%, we keep the default confidence level of 95%.
- Select ‘Mean > hypothesized mean’ from the Alternative Hypothesis list.
- Select OK and OK again.
The output is:
Descriptive Statistics
N | Mean | StDev | SE Mean | 95% Lower Bound for μ |
---|---|---|---|---|
40 | 11.000 | 3.000 | 0.474 | 10.201 |
\(\mu\): population mean of Sample
Test
Alternative hypothesis \(H_1\colon \mu > 10\)
T-Value | P-Value |
---|---|
2.11 | 0.021 |
Again, as the output indicates, our hand calculations were quite good. Notice that Minitab provides a more exact p-value of 0.021, which agrees with our results since it falls within our calculated range of 0.01 to 0.025.
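To see where each number in the Minitab output comes from, the following Python sketch recomputes it from the summary statistics. Two assumptions here are ours, not Minitab's documentation. First, the 95% lower bound is taken to be \(\bar{x} - c\,\frac{s}{\sqrt{n}}\), where \(c\) has right-tail area 0.05 under a t-distribution with 39 degrees of freedom; this reproduces 10.201. Second, because Python's standard library has no t-distribution, t areas are approximated with Simpson's rule. The helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_upper_point(q, df):
    """Value c > 0 with P(T >= c) = q (0 < q < 0.5), by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > q:
            lo = mid  # tail area still too large: the point lies further right
        else:
            hi = mid
    return (lo + hi) / 2


n, xbar, s, mu0 = 40, 11.0, 3.0, 10.0
se_mean = s / math.sqrt(n)
t_star = (xbar - mu0) / se_mean
p_value = t_upper_tail(t_star, n - 1)  # right-tailed test, H_a: mu > 10
lower_bound = xbar - t_upper_point(0.05, n - 1) * se_mean

print(f"SE Mean = {se_mean:.3f}, 95% Lower Bound = {lower_bound:.3f}")
print(f"T-Value = {t_star:.2f}, P-Value = {p_value:.3f}")
```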
Minitab: Finding Exact Critical Value for a One-Sample Mean t-Test
Since the t-table is not as detailed as the z-table, we can only estimate the critical value when the degrees of freedom are not listed in the table. To obtain the exact critical value for the rejection region approach, we can use a statistical package such as Minitab.
Minitab commands to obtain critical value:
- Select Calc > Probability Distributions
- Choose Inverse Cumulative Distribution Function
- Enter ‘A single value’ for the Form of input:
- For the Value: enter your alpha value (if one-sided alternative) or alpha/2 (if two-sided alternative).
- For Distribution: choose t
- Enter the correct Degrees of freedom:
- Choose ‘Display a table of inverse cumulative probabilities’ for Output
- Choose OK
Example 6.11 (Example 6.10 Cont’d…)
Find the exact critical value for our emergency room example. Recall that, working by hand, we had to use the row with 35 degrees of freedom instead of the correct 39. Using that row, our critical value for an alpha of 5% was 1.69.
Answer
- Select Calc > Probability Distributions
- Choose Inverse Cumulative Distribution Function
- Enter ‘A single value’ for the Form of input:
- Set the Value: to 0.05
- For Distribution: choose t
- Set the Degrees of freedom: to 39
- Choose ‘Display a table of inverse cumulative probabilities’ for Output
- Choose OK
The output is as follows:
Student’s t distribution with 39 DF
P( X ≤ x ) | x |
---|---|
0.05 | -1.68488 |
This is where you need to be a little careful. Remember that our alternative was “greater than”, or a right-tailed test. The output is the critical value for a left-tailed test. However, since the t-distribution is symmetrical, the area to the left of -1.68488 is the same as the area to the right of 1.68488. Therefore, the critical value for our test with 39 degrees of freedom is 1.68488, which doesn’t differ much from the 1.69 we estimated using 35 degrees of freedom. This is why the t-table does not list every degree of freedom beyond 30: there is little difference between the values when the degrees of freedom increase by only one.
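The Minitab lookup above can also be sketched in Python. Like Minitab's inverse cumulative distribution function, the hypothetical helper `t_inv_cdf` below returns the value with the requested area to its left, so small probabilities give negative values. It uses a numerical approach of our own choosing rather than any particular library: Simpson's rule computes the cumulative probability, and bisection inverts it.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the signed area from 0 to x (Simpson's rule)."""
    h = x / steps  # negative when x < 0, which makes the area negative
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_inv_cdf(p, df, lo=-50.0, hi=50.0):
    """Value x with P(T <= x) = p, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{t_inv_cdf(0.05, 39):.5f}")   # emergency room example, df = 39
print(f"{t_inv_cdf(0.05, 35):.5f}")   # table row used by hand, df = 35
print(f"{t_inv_cdf(0.005, 60):.4f}")  # lumber example, alpha/2 with df = 60
```

By the symmetry argument above, the right-tailed critical value for the emergency room example is the absolute value of the first result. The last line recovers, up to sign, the 2.660 used in the lumber example.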
6.2.3 Further Considerations for Hypothesis Testing
In this section, we discuss some further issues with hypothesis tests and points to be conscious of.
Committing an Error
Every time we make a decision and come to a conclusion, we must keep in mind that our decision is based on probability. Therefore, it is possible that we made a mistake.
Consider the example of the previous Lesson on whether the majority of Penn State students are from Pennsylvania. In that example, we took a random sample of 500 Penn State students and found that 278 are from Pennsylvania. We rejected the null hypothesis, at a significance level of 5% with a p-value of 0.006.
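The significance level of 5% means that, if the null hypothesis is true, the test has a 5% chance of rejecting it. In other words, we have a 5% chance of committing a Type I error (rejecting a true null hypothesis). It does not mean that there is a 5% chance that this particular rejection is wrong.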
If we failed to reject a null hypothesis, then we could have committed a Type II error. This means that we could have failed to reject a false null hypothesis.
How Important are the Conditions of a Test?
In our six steps of hypothesis testing, one step is to verify the conditions. If the conditions are not satisfied, we can still calculate the test statistic and find the rejection region (or p-value). We cannot, however, make a decision or state a conclusion, because the rejection region and p-value come from probability theory that assumes the conditions hold.
If the conditions are not satisfied, there are other methods to help us make a conclusion. The conclusion, however, may be based on other parameters, such as the median. There are other tests (some are discussed in later lessons) that can be used.
Statistical and Practical Significances
Our decision in the emergency room waiting times example was to reject the null hypothesis and conclude that the average wait time exceeds 10 minutes. However, our sample mean of 11 minutes wasn’t too far off from 10. So what do you think of our conclusion? Yes, statistically there was a difference at the 5% level of significance, but are we “impressed” with the results? That is, do you think 11 minutes is really that much different from 10 minutes?
Since we are sampling data, we have to expect some error in our results: even if the true mean wait time were 10 minutes, it would be extremely unlikely for our sample mean to be exactly 10 minutes. This is the difference between statistical significance and practical significance. The former is about the result produced by the sample data, while the latter is about the practical application of those results.
Statistical significance is concerned with whether an observed effect is due to chance and practical significance means that the observed effect is large enough to be useful in the real world.
Critics of hypothesis-testing procedures have observed that a population mean is rarely exactly equal to the value in the null hypothesis and hence, by obtaining a large enough sample, virtually any null hypothesis can be rejected. Thus, it is important to distinguish between statistical significance and practical significance.
The Relationship Between Power, \(\beta\), and \(\alpha\)
Recall that \(\alpha\) is the probability of committing a Type I error. It is the value that is preset by the researcher. Therefore, the researcher has control over the probability of this type of error. But what about \(\beta\), the probability of a Type II error? How much control do we have over the probability of committing this error? Similarly, we want power, the probability we correctly reject a false null hypothesis, to be high (close to 1). Is there anything we can do to have a high power?
The relationship between power and \(\beta\) is an inverse relationship, namely…
\[\text{Power} =1-\beta\]
If we increase power, then we decrease \(\beta\). But how do we increase power? One way is to increase the sample size.
Relationship between \(\alpha\) and \(\beta\):
If the sample size is fixed, then decreasing \(\alpha\) will increase \(\beta\), and therefore decrease power. If one wants both \(\alpha\) and \(\beta\) to decrease, then one has to increase the sample size.
It is possible, using software, to find the sample size required for set values of \(\alpha\) and power. Also using software, it is possible to determine the value of power. We do not go into details on how to do this but you are welcome to explore on your own.
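To make these relationships concrete, here is a rough Python sketch of power for a right-tailed test of one mean. It rests on assumptions that are ours, not the lesson's. We borrow the emergency room setting (\(\mu_0 = 10\)) and treat \(\sigma = 3\) as known, so that a normal (z) approximation applies instead of the t-distribution. We also suppose, hypothetically, that the true mean is 11 minutes. The function name `power_right_tailed` is hypothetical as well.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_right_tailed(mu0, mu_true, sigma, n, alpha):
    """Power of a right-tailed one-mean z-test when the true mean is mu_true."""
    z_crit = Z.inv_cdf(1 - alpha)                # reject H0 if z* >= z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # where z* is centered when mu = mu_true
    return 1 - Z.cdf(z_crit - shift)             # P(reject H0 | mu = mu_true)


for n, alpha in [(40, 0.05), (40, 0.01), (100, 0.05), (100, 0.01)]:
    power = power_right_tailed(10, 11, 3, n, alpha)
    print(f"n={n:3d} alpha={alpha:.2f} power={power:.3f} beta={1 - power:.3f}")
```

Reading across the output: for a fixed sample size, lowering \(\alpha\) from 0.05 to 0.01 lowers power (raises \(\beta\)). Increasing \(n\) from 40 to 100 raises power at either \(\alpha\).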
Gathering data is like tasting fine wine—you need the right amount. With wine, too small a sip keeps you from accurately assessing a subtle bouquet, but too large a sip overwhelms the palate.
We can’t tell you how big a sip to take at a wine-tasting event, but when it comes to collecting data, software tools can tell you how much data you need to be sure about your results.
6.2.4 More Examples
As previously mentioned, setting up the hypotheses is the most important step. In this section, we provide some additional practice with examples where we do not indicate explicitly if it is a test for a mean or a proportion.
Try It! Checkout Time
Fresh N Friendly food store advertises that its checkout waiting time is four minutes or less. An angry customer wants to dispute this claim. He takes a random sample of shoppers at the peak time and records their checkout times. Can he dispute the store’s claim at a 10% significance level?
Checkout times:
3.8, 5.3, 3.5, 4.5, 7.2, 5.1
Step 1: Set up the hypotheses and check conditions.
The response variable is waiting time and is quantitative. Therefore, the hypotheses should be in terms of the population mean.
\[H_0\colon \mu=4 \text{ vs. } H_a\colon \mu>4\]
The sample size is small, \(n=6\). There is also no indication in the problem that the waiting times follow a normal distribution. We can use the Normal Probability Plot to examine the data.
The data seem consistent with the Normal distribution and therefore it seems reasonable that the data come from a Normal distribution. We should use caution here, however. If the data do not come from a Normal distribution, the conclusion is not valid.
Step 2: Decide on the significance level, \(\alpha\).
The problem suggests that we use \(\alpha=0.10\).
Step 3: Calculate the test statistic:
In order to do so, we must first calculate the sample mean and sample standard deviation. The sample mean is \(\bar{x}=4.9\) and \(s=1.3282\). The test statistic is: \[\begin{align} \text{t}^*&=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\\&=\dfrac{4.9-4}{\frac{1.3282}{\sqrt{6}}}\\&=1.6598 \end{align}\]
Step 4: Compute the appropriate p-value based on our alternative hypothesis (p-value approach).
For this example, we will find the p-value. For extra practice, find the rejection region on your own. \[\begin{align} \text{p-value}&=P(T>t^*)\\&=P(T>1.6598)\\&=1-0.9211\\&=0.0789 \end{align}\] Here \(T\) follows a t-distribution with \(n-1=6-1=5\) degrees of freedom.
Step 5: Make a decision about the null hypothesis.
Since our p-value of 0.0789 is less than our significance level of 10%, we reject the null hypothesis.
Step 6: State an overall conclusion.
At a 10% significance level, we have enough evidence in the data to dispute the store’s claim that the mean waiting time is four minutes or less.
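The arithmetic for this Try It! can be checked with the Python sketch below, which starts from the raw checkout times. Because Python's standard library has no t-distribution, the tail area is approximated with Simpson's rule; that approach and the helper names are our assumptions. The code does not check normality, so the caution in Step 1 still applies.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


times = [3.8, 5.3, 3.5, 4.5, 7.2, 5.1]  # checkout times in minutes
mu0, alpha = 4, 0.10

n = len(times)
xbar = statistics.mean(times)
s = statistics.stdev(times)            # sample standard deviation (divides by n - 1)
t_star = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_star, n - 1)  # right-tailed: H_a: mu > 4

print(f"x-bar = {xbar:.1f}, s = {s:.4f}, t* = {t_star:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```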
Try It! Satisfaction Surveys
The CEO of a large computer company claims that 80 percent of his customers are “very satisfied” with the customer service they receive. To test this claim, a researcher surveyed 100 customers, and 75 of them stated they were “very satisfied.” Based on these findings, can we reject the CEO’s hypothesis that 80% of the customers are very satisfied?
Step 1: Set up the hypotheses and check conditions.
The response is categorical so the hypotheses will be based on the population proportion. The claim, or the null, will be that the proportion is 0.8 and the alternative is that it is different than 0.8. In symbols, we have:
\[H_0\colon p=0.8 \text{ vs. } H_a\colon p \ne 0.8\] The conditions \(np_0=100(0.8)=80\) and \(n(1-p_0)=100(1-0.8)=20\) are both greater than five. Therefore, we can continue with the one-proportion Z-test.
Step 2: Decide on the significance level, \(\alpha\).
The level of significance is not stated in the problem. If it is not stated, we typically assume it to be 5%.
Step 3: Calculate the test statistic:
The test statistic is: \[\begin{align} \text{z}^*&=\dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\\&=\dfrac{\frac{75}{100}-0.8}{\sqrt{\frac{0.8(1-0.8)}{100}}}\\&=-1.25 \end{align}\]
Step 4: Compute the appropriate p-value based on our alternative hypothesis (p-value approach).
For this example, we will use the p-value approach. You may want to find the critical value and rejection region for extra practice. \[\begin{align} \text{p-value}&=2P(Z>|z^*|)\\&=2P(Z>1.25)\\&=2(0.1056)\\&=0.2112 \end{align}\]
Step 5: Make a decision about the null hypothesis.
Since our p-value is greater than our significance level, we fail to reject the null hypothesis.
Step 6: State an overall conclusion.
At a significance level of 5%, there is not enough evidence in the data to suggest the population proportion of customers who are “very satisfied” is not equal to 80%.
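A short Python sketch of this test uses `statistics.NormalDist` from the standard library. It also computes the two-sided critical value, so you can check the rejection region approach suggested for extra practice. The variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution
x, n, p0, alpha = 75, 100, 0.8, 0.05

p_hat = x / n
z_star = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - Z.cdf(abs(z_star)))  # two-sided p-value
z_crit = Z.inv_cdf(1 - alpha / 2)       # rejection region: |z*| >= z_crit

print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}, reject if |z*| >= {z_crit:.2f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```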
Try It! Hotel Survey
There is a claim that about 10% of all men traveling on business bring a friend or a spouse. According to a survey by Rest Easy Hotel, 5% of the 40 men who are traveling for business purposes brought a friend or spouse. Can Rest Easy Hotel dispute this claim and conclude that it is not 10%?
Step 1: Set up the hypotheses and check conditions.
The response variable is categorical (bring spouse or not). Our hypotheses will be based on the population proportion.
\[H_0\colon p=0.10 \text{ vs. } H_a\colon p \ne 0.10\]
Before we proceed, we need to check our conditions. We need \(np_0>5\) and \(n(1-p_0)>5\) and in this case we have \(40(0.10)=4\) and \(40(0.9)=36\).
Since our conditions are not satisfied (\(np_0=4\) is not greater than 5), we should not proceed with the Z-test for one proportion. It would be better to use an exact method based on the binomial distribution; we do not cover exact methods here.
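The large-sample condition checked in Step 1 of the last two Try It! problems can be written as a small helper. The name `large_sample_ok` is hypothetical. The threshold of 5 follows this lesson; some textbooks use 10 instead.

```python
def large_sample_ok(n, p0, threshold=5):
    """Check the normal-approximation conditions n*p0 > 5 and n*(1 - p0) > 5."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes > threshold and expected_failures > threshold


print(large_sample_ok(100, 0.8))  # satisfaction survey: 80 and 20
print(large_sample_ok(40, 0.10))  # hotel survey: 4 and 36
```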
6.2.5 Part B Summary
The concepts, logic, and terminology of hypothesis testing can take some time to master. It is worth it! Hypothesis testing is a very powerful statistical tool.
In this lesson, we covered how to set up the null and alternative hypotheses and how we can conclude whether to reject the null hypothesis or fail to reject the null. We also discussed the types of errors we can make and their respective probabilities.
We discussed how to apply our knowledge of sampling distributions to develop a test for a population parameter. We showed how to complete the six steps of hypothesis testing for the population mean and the population proportion.
Next, we will move on to situations where we compare more than one population parameter.