6a.4  Hypothesis Test for One-Sample Proportion
Overview
In this section, we will demonstrate how we use the sampling distribution of the sample proportion to perform the hypothesis test for one proportion.
Recall that if \(np \) and \(n(1-p) \) are both greater than five, then the sample proportion, \(\hat{p} \), will have an approximate normal distribution with mean \(p \), standard error \(\sqrt{\frac{p(1-p)}{n}} \), and estimated standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \).
In hypothesis testing, we assume the null hypothesis is true. Remember, we set up the null hypothesis as \(H_0\colon p=p_0 \). This is very important! This statement says that we are assuming the unknown population proportion, \(p \), is equal to the value \(p_0 \).
Since we assume this is true, we can follow the same logic as above. Therefore, if \(np_0 \) and \(n(1-p_0) \) are both greater than five, then the sampling distribution of the sample proportion will be approximately normal with mean \(p_0 \) and standard error \(\sqrt{\frac{p_0(1-p_0)}{n}} \).
We can find probabilities associated with values of \(\hat{p} \) by using:
\( z^*=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \)
Example 6-4
Referring back to a previous example, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5?
Is 0.556(=278/500) much bigger than 0.5? What is much bigger?
This depends on the standard deviation of \(\hat{p} \) under the null hypothesis.
\( \hat{p}-p_0=0.556-0.5=0.056 \)
The standard deviation of \(\hat{p} \), if the null hypothesis is true (i.e., when \(p_0=0.5\)), is:
\( \sqrt{\dfrac{p_0(1-p_0)}{n}}=\sqrt{\dfrac{0.5(1-0.5)}{500}}=0.0224 \)
We can compare them by taking the ratio.
\( z^*=\dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}=\dfrac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}=2.504 \)
Therefore, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the mean.
The \(z^*\) value we found in the above example is referred to as the test statistic.
 Test statistic
 The sample statistic one uses to either reject \(H_0 \) (and conclude \(H_a \) ) or fail to reject \(H_0 \).
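As a quick check of the arithmetic above, here is a minimal Python sketch of the test statistic. The function name `one_proportion_z` is our own illustrative choice, not part of the course material; it simply evaluates the \(z^*\) formula using the standard error under the null hypothesis.

```python
from math import sqrt

def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """Return z* = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n                      # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    return (p_hat - p0) / standard_error

# Penn State example: 278 of 500 students, hypothesized proportion 0.5
z_star = one_proportion_z(278, 500, 0.5)
print(round(z_star, 3))  # 2.504
```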
6a.4.1  Making a Decision
In the previous example for Penn State students, we found that, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the mean, \(p_0 \).
Is it far enough away from the 0.5 to suggest that there is evidence against the null? Is there a cutoff for the number of standard deviations that we would find acceptable?
What if, instead of a cutoff, we found a probability? Recall the alternative hypothesis for this example was \(H_a\colon p>0.5 \). If we compute the probability of a sample proportion being 0.556 or greater, we get \( P(Z>2.504)=0.0061 \).
This means that, if the true proportion is 0.5, the probability we would get a sample proportion of 0.556 or greater is 0.0061. Very small! But is it small enough to say there is evidence against the null?
To determine whether the probability is small or how many standard deviations are “acceptable”, we need a preset level of significance, which is the probability of a Type I error. Recall that a Type I error is the event of rejecting the null hypothesis when that null hypothesis is true. Think of finding guilty a person who is actually innocent.
When we specify our hypotheses, we should have some idea of what size of a Type I error we can tolerate. It is denoted as \(\alpha \). A conventional choice of \(\alpha \) is 0.05. Values ranging from 0.01 to 0.1 are also common and the choice of \(\alpha \) depends on the problem one is working on.
Once we have this preset level, we can determine whether or not there is significant evidence against the null. There are two methods to determine if we have enough evidence: the rejection region method and the p-value method.
Rejection Region Approach
We start the hypothesis test process by determining the null and alternative hypotheses. Then we set our significance level, \(\alpha \), which is the probability of making a Type I error. We can determine the appropriate cutoff called the critical value and find a range of values where we should reject, called the rejection region.
 Critical values
 The values that separate the rejection and nonrejection regions.
 Rejection region
 The set of values for the test statistic that leads to rejection of \(H_0 \)
The graphs below show us how to find the critical values and the rejection regions for the three different alternative hypotheses and for a set significance level, \(\alpha \). The rejection region is based on the alternative hypothesis.
Left-Tailed Test
Reject \(H_0\) if the test statistic is less than or equal to the critical value (\(c_\alpha\)).
Right-Tailed Test
Reject \(H_0\) if the test statistic is greater than or equal to the critical value (\(c_{1-\alpha}\)).
Two-Tailed Test
Reject \(H_0\) if the absolute value of the test statistic is greater than or equal to the absolute value of the critical value (\(c_{\alpha/2}\)).
The rejection region is the region where, if our test statistic falls in it, we have enough evidence to reject the null hypothesis. If we consider the right-tailed test, for example, the rejection region is any value greater than \(c_{1-\alpha} \), where \(c_{1-\alpha}\) is the critical value.
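The rejection rules for the three alternatives can be sketched in Python as follows. This is an illustration under our own naming (`critical_value`, `in_rejection_region`, and the `tail` labels are hypothetical), using the standard library's `statistics.NormalDist` in place of a z-table. For the two-tailed case it returns the positive cutoff, which equals the absolute value of \(c_{\alpha/2}\).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_value(alpha: float, tail: str) -> float:
    """Critical value of the standard normal for the given alternative."""
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)          # c_alpha
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)      # c_{1-alpha}
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # |c_{alpha/2}|
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z_star: float, alpha: float, tail: str) -> bool:
    """True if the test statistic falls in the rejection region."""
    c = critical_value(alpha, tail)
    if tail == "left":
        return z_star <= c
    if tail == "right":
        return z_star >= c
    return abs(z_star) >= c

for tail in ("left", "right", "two"):
    print(tail, round(critical_value(0.05, tail), 3))
```

At \(\alpha=0.05\) the cutoffs are approximately \(-1.645\), \(1.645\), and \(1.96\), matching the usual table values.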
P-Value Approach
As with the rejection region approach, the p-value approach will need the null and alternative hypotheses, the significance level, and the test statistic. Instead of finding a region, we are going to find a probability called the p-value.
 P-value
 The p-value (or probability value) is the probability that the test statistic equals the observed value or a more extreme value under the assumption that the null hypothesis is true.
The p-value is a probability statement based on the alternative hypothesis. The p-value is found differently for each of the alternative hypotheses.
 Left-tailed: If \(H_a \) is left-tailed, then the p-value is the probability the sample data produces a value equal to or less than the observed test statistic.
 Right-tailed: If \(H_a \) is right-tailed, then the p-value is the probability the sample data produces a value equal to or greater than the observed test statistic.
 Two-tailed: If \(H_a \) is two-tailed, then the p-value is two times the probability the sample data produces a value equal to or greater than the absolute value of the observed test statistic.
So for one-sample proportions we have...
Left-tailed: \(P(Z \le z^*)\)
Right-tailed: \(P(Z \ge z^*)\)
Two-tailed: \(2 \times P(Z \ge |z^*|)\)
Once we find the p-value, we compare the p-value to our preset significance level.
 If our p-value is less than or equal to \(\alpha \), then there is enough evidence to reject the null hypothesis.
 If our p-value is greater than \(\alpha \), there is not enough evidence to reject the null hypothesis.
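The three p-value formulas and the comparison with \(\alpha\) can be sketched in Python as below. The function `p_value` and its `tail` labels are our own illustrative names, and `statistics.NormalDist` from the standard library stands in for the z-table.

```python
from statistics import NormalDist

def p_value(z_star: float, tail: str) -> float:
    """p-value of a z test statistic for a left-, right-, or two-tailed H_a."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_star)                 # P(Z <= z*)
    if tail == "right":
        return 1 - std_normal.cdf(z_star)             # P(Z >= z*)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_star)))  # 2 * P(Z >= |z*|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

alpha = 0.05
p = p_value(2.504, "right")   # Penn State example, H_a: p > 0.5
print(round(p, 4))            # 0.0061
print("reject H0" if p <= alpha else "fail to reject H0")
```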
Caution! One should be aware that \(\alpha \) is also called the level of significance, which can cause confusion in terminology. \(\alpha \) is the preset level of significance, whereas the p-value is the observed level of significance. The p-value, in fact, is a summary statistic that translates the observed test statistic's value into a probability that is easy to interpret.
Important note: We can summarize the data by reporting the p-value and let users decide whether to reject \(H_0 \) for their own chosen \(\alpha\) values.
This video will further explain the meaning of the p-value.
Video: Understanding the P-Value
6a.4.2  More on the P-Value and Rejection Region Approach
Two Methods for Making a Statistical Decision
Of the two methods for making a statistical decision, the p-value approach is more commonly used and provided in published literature. However, understanding the rejection region approach can go a long way in one's understanding of the p-value method. In the video, we show how the two methods are related. Regardless of the method applied, the conclusions from the two approaches are exactly the same.
Video: The Rejection Region vs the P-Value Approach
Comparing the Two Approaches
Both approaches lead to the same conclusion, and either one will work. However, the p-value approach has the following advantages:
 With the rejection region approach, you must look up a new critical value in the table or software every time you use a different \(\alpha \) value; the p-value is computed once and can be compared to any \(\alpha \).
 Beyond its use in deciding whether to reject \(H_0 \) by comparing it to \(\alpha \), the p-value also gives us some idea of the strength of the evidence against \(H_0 \).
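To illustrate that the two approaches agree, the sketch below reuses the right-tailed Penn State test statistic and compares both decisions across several significance levels. The particular \(\alpha\) values and the `decisions` list are our own choices for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_star = 2.504                         # right-tailed test statistic from the example

p_value = 1 - std_normal.cdf(z_star)   # computed once, reused for every alpha

decisions = []
for alpha in (0.10, 0.05, 0.01, 0.001):
    critical = std_normal.inv_cdf(1 - alpha)   # must be looked up again for each alpha
    reject_by_region = z_star >= critical
    reject_by_p_value = p_value <= alpha
    decisions.append((alpha, reject_by_region, reject_by_p_value))
    print(alpha, round(critical, 3), reject_by_region, reject_by_p_value)
```

In this run the two columns of decisions match for every \(\alpha\): the null hypothesis is rejected at 0.10, 0.05, and 0.01 but not at 0.001.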
6a.4.3  Steps in Conducting a Hypothesis Test for \(p\)
Six Steps for One-Sample Proportion Hypothesis Test
Steps 1-3
Let's apply the general steps for hypothesis testing to the specific case of testing a one-sample proportion.
 Step 1: Set up the hypotheses and check conditions.

\( np_0\ge 5 \) and \( n(1-p_0)\ge 5 \)
One Proportion Z-test Hypotheses
Left-Tailed: \( H_0\colon p=p_0 \) vs \( H_a\colon p<p_0 \)
Right-Tailed: \( H_0\colon p=p_0 \) vs \( H_a\colon p>p_0 \)
Two-Tailed: \( H_0\colon p=p_0 \) vs \( H_a\colon p\ne p_0 \)
 Step 2: Decide on the level of significance \(\boldsymbol{(\alpha)}\).
 Step 3: Calculate the test statistic.

One Proportion Z-test: \(z^*=\dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)
Rejection Region Approach
Steps 4-6
 Step 4: Find the appropriate critical values for the tests. Write down clearly the rejection region for the problem.

[Figures: critical values and rejection regions at \(\alpha=0.05\) for the Left-Tailed Test, Right-Tailed Test, and Two-Tailed Test.]
 Step 5: Make a decision about the null hypothesis.
 Check to see if the value of the test statistic falls in the rejection region. If it does, then reject \(H_0 \) (and conclude \(H_a \)). If it does not fall in the rejection region, do not reject \(H_0 \).
 Step 6: State an overall conclusion.
P-Value Approach
Steps 4-6
 Step 4: Compute the appropriate p-value based on our alternative hypothesis.

Left-Tailed: \(P(Z \le z^*)\)
Right-Tailed: \(P(Z\ge z^*)\)
Two-Tailed: \(2 \times P(Z \ge |z^*|)\)
 Step 5: Make a decision about the null hypothesis.
 If the p-value is less than or equal to the significance level, then reject the null hypothesis. If the p-value is greater than the significance level, fail to reject the null hypothesis.
 Step 6: State an overall conclusion.
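The steps above can be collected into one function. The following is a minimal sketch under our own assumptions: the name `one_proportion_z_test`, its parameters, and the returned dictionary keys are hypothetical, and `statistics.NormalDist` replaces the z-table. Step 6, the written conclusion, is left to the analyst.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha, tail):
    """Run both decision approaches; tail is 'left', 'right', or 'two'."""
    # Step 1: check the conditions for the normal approximation under H0.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("need n*p0 >= 5 and n*(1 - p0) >= 5")
    std_normal = NormalDist()

    # Step 2 is the caller's choice of alpha.
    # Step 3: test statistic, using the standard error under H0.
    p_hat = successes / n
    z_star = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    # Step 4: critical value (rejection region) and p-value.
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)
        in_region = z_star <= critical
        p_value = std_normal.cdf(z_star)
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        in_region = z_star >= critical
        p_value = 1 - std_normal.cdf(z_star)
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        in_region = abs(z_star) >= critical
        p_value = 2 * (1 - std_normal.cdf(abs(z_star)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")

    # Step 5: the decision under each approach.
    return {
        "z_star": z_star,
        "critical_value": critical,
        "p_value": p_value,
        "reject_by_region": in_region,
        "reject_by_p_value": p_value <= alpha,
    }

# Usage with the Penn State numbers (278 of 500, p0 = 0.5) at alpha = 0.05
result = one_proportion_z_test(278, 500, 0.5, alpha=0.05, tail="right")
print(result)
```

Returning both decisions side by side makes it easy to confirm that the rejection region and p-value approaches agree for a given data set.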
Example 6-5: Penn State Students from Pennsylvania
Referring back to Example 6-4, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5 at a 5% level of significance?
Conduct the test using both the rejection region and p-value approaches.
 Step 1: Set up the hypotheses and check conditions.

Set up the hypotheses. Since the research hypothesis is to check whether the proportion is greater than 0.5, we set it up as a one-tailed (right-tailed) test:
\( H_0\colon p=0.5 \) vs \(H_a\colon p>0.5 \)
Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0 \) is \(0.5\) and we can check that: \(np_0=500(0.5)=250 \ge 5 \) and \(n(1-p_0)=500(1-0.5)=250 \ge 5 \)
 Step 2: Decide on the significance level, \(\alpha \).

According to the question, \(\alpha= 0.05 \).
 Step 3: Calculate the test statistic:

\begin{align} z^*&= \dfrac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}\\z^*&=2.504 \end{align}
Rejection Region Approach
 Step 4: Find the appropriate critical values for the test using the z-table. Write down clearly the rejection region for the problem.

We can use the standard normal table to find the value of \(Z_{0.05} \). From the table, \(Z_{0.05} \) is found to be \(1.645\) and thus the critical value is \(1.645\). The rejection region for the right-tailed test is given by:
\( z^*>1.645 \)
 Step 5: Make a decision about the null hypothesis.

The test statistic or the observed Z-value is \(2.504\). Since \(z^*\) falls within the rejection region, we reject \(H_0 \).
 Step 6: State an overall conclusion.

With a test statistic of \(2.504\) and critical value of \(1.645\) at a 5% level of significance, we have enough statistical evidence to reject the null hypothesis. We conclude that a majority of the students are from Pennsylvania.
P-Value Approach
 Step 4: Compute the appropriate p-value based on our alternative hypothesis:
 \(\text{p-value}=P(Z\ge z^*)=P(Z \ge 2.504)=0.0061\)
 Step 5: Make a decision about the null hypothesis.

Since \(\text{p-value} = 0.0061 \le 0.05\) (the \(\alpha \) value), we reject the null hypothesis.
 Step 6: State an overall conclusion.

With a test statistic of \(2.504\) and p-value of \(0.0061\), we reject the null hypothesis at a 5% level of significance. We conclude that a majority of the students are from Pennsylvania.
Try it!
Online Purchases
An e-commerce research company claims that 60% or more of graduate students have bought merchandise online. A consumer group is suspicious of the claim and thinks that the proportion is lower than 60%. A random sample of 80 graduate students shows that only 22 students have ever done so. Is there enough evidence to show that the true proportion is lower than 60%?
Conduct the test at a 10% Type I error rate and use the p-value and rejection region approaches.
 Step 1: Set up the hypotheses and check conditions.

Set up the hypotheses. Since the research hypothesis is to check whether the proportion is less than 0.6, we set it up as a one-tailed (left-tailed) test:
\( H_0\colon p=0.6 \) vs \(H_a\colon p<0.6 \)
Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0 \) is 0.6 and we can check that: \(np_0=80(0.6)=48 \ge 5 \) and \(n(1-p_0)=80(1-0.6)=32 \ge 5 \)
 Step 2: Decide on the significance level, \(\alpha \).

According to the question, \(\alpha= 0.1 \).
 Step 3: Calculate the test statistic:

\begin{align} z^* &=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\\&=\frac{0.275-0.6}{\sqrt{\frac{0.6(1-0.6)}{80}}}\\&=-5.93 \end{align}
Rejection Region Approach
 Step 4: Find the appropriate critical values for the test using the ztable. Write down clearly the rejection region for the problem.

The critical value is the value of the standard normal where 10% fall below it. Using the standard normal table, we can see that the value is \(-1.28\).
 Step 5: Make a decision about the null hypothesis.

The rejection region is any \(z^* \) such that \(z^*<-1.28 \). Since our test statistic, \(-5.93\), is inside the rejection region, we reject the null hypothesis.
 Step 6: State an overall conclusion.

There is enough evidence in the data provided to suggest, at 10% level of significance, that the true proportion of students who made purchases online was less than 60%.
P-Value Approach
 Step 4: Compute the appropriate p-value based on our alternative hypothesis:
 \( \text{p-value}=P(Z \le -5.93) \approx 1.5\times 10^{-9} \)
 Step 5: Make a decision about the null hypothesis.

Since our p-value is very small and less than our significance level of 10%, we reject the null hypothesis.
 Step 6: State an overall conclusion.

There is enough evidence in the data provided to suggest, at 10% level of significance, that the true proportion of students who made purchases online was less than 60%.
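The Online Purchases calculation can be double-checked with a few lines of Python. This is only a sketch using the standard library's `statistics.NormalDist` in place of the z-table; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0, alpha = 80, 22, 0.6, 0.10

p_hat = successes / n                             # 22/80 = 0.275
z_star = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # left-tailed test statistic

std_normal = NormalDist()
critical = std_normal.inv_cdf(alpha)              # 10% of the area lies below this value
p_value = std_normal.cdf(z_star)                  # P(Z <= z*)

print(f"z* = {z_star:.2f}, critical value = {critical:.2f}, p-value = {p_value:.1e}")
print("reject H0" if z_star <= critical and p_value <= alpha else "fail to reject H0")
```

Both checks point the same way: the test statistic lies far below the critical value and the p-value is far below 0.10, so the null hypothesis is rejected.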