6a.4.1 - Making a Decision

In the previous example with Penn State students, we found that, assuming the true population proportion is \(p = 0.5\), a sample proportion of 0.556 is 2.504 standard deviations above the mean of its sampling distribution.

Is that far enough from 0.5 to suggest that there is evidence against the null hypothesis? Is there a cutoff for the number of standard deviations that we would find acceptable?

What if, instead of a cutoff, we found a probability? Recall that the alternative hypothesis for this example was \(H_a\colon p>0.5 \). If we find the probability of a sample proportion being 0.556 or greater, we get \( P(Z>2.504)=0.0061 \).

This means that, if the true proportion is 0.5, the probability we would get a sample proportion of 0.556 or greater is 0.0061. Very small! But is it small enough to say there is evidence against the null?
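The tail probability above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original example.

```python
from statistics import NormalDist

z_observed = 2.504  # test statistic from the previous example
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# P(Z > z) is the area in the right tail, i.e. 1 - CDF(z).
right_tail_prob = 1.0 - standard_normal.cdf(z_observed)

print(f"P(Z > {z_observed}) = {right_tail_prob:.4f}")  # prints 0.0061
```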

To determine whether the probability is small or how many standard deviations are “acceptable”, we need a preset level of significance, which is the probability of a Type I error. Recall that a Type I error is the event of rejecting the null hypothesis when that null hypothesis is true. Think of convicting a person who is actually innocent.

When we specify our hypotheses, we should have some idea of how large a probability of a Type I error we can tolerate. This probability is denoted \(\alpha \). A conventional choice of \(\alpha \) is 0.05. Values ranging from 0.01 to 0.1 are also common, and the choice of \(\alpha \) depends on the problem one is working on.

Once we have this preset level, we can determine whether or not there is significant evidence against the null. There are two methods to determine if we have enough evidence: the rejection region method and the p-value method.

Rejection Region Approach

We start the hypothesis test process by determining the null and alternative hypotheses. Then we set our significance level, \(\alpha \), which is the probability of making a Type I error. From \(\alpha \), we determine the appropriate cutoff, called the critical value, and the range of test statistic values for which we should reject \(H_0 \), called the rejection region.

Critical values: The values that separate the rejection and non-rejection regions.

Rejection region: The set of values for the test statistic that leads to rejection of \(H_0 \).

The graphs below show us how to find the critical values and the rejection regions for the three different alternative hypotheses and for a set significance level, \(\alpha \). The rejection region is based on the alternative hypothesis.

If our test statistic falls in the rejection region, then we have enough evidence to reject the null hypothesis. If we consider the right-tailed test, for example, the rejection region is any value greater than or equal to \(c_{1-\alpha} \), where \(c_{1-\alpha}\) is the critical value.

Left-Tailed Test

Reject \(H_0\) if the test statistic is less than or equal to the critical value (\(c_\alpha\)).

Right-Tailed Test

Reject \(H_0\) if the test statistic is greater than or equal to the critical value (\(c_{1-\alpha}\)).

Two-Tailed Test

Reject \(H_0\) if the absolute value of the test statistic is greater than or equal to the absolute value of the critical value (\(c_{\alpha/2}\)).
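The three rejection rules can be sketched in code for a test statistic that follows the standard normal distribution, as in the one-sample proportion example. This is only an illustration: the function name `in_rejection_region` and the tail labels `"left"`, `"right"`, and `"two"` are hypothetical, and the critical values are taken as standard normal quantiles.

```python
from statistics import NormalDist

def in_rejection_region(z_stat, alpha, tail):
    """Return True if z_stat falls in the rejection region.

    tail is one of "left", "right", or "two" (hypothetical labels).
    Critical values are quantiles of the standard normal distribution.
    """
    std_normal = NormalDist()
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)        # c_alpha
        return z_stat <= critical
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)    # c_{1-alpha}
        return z_stat >= critical
    if tail == "two":
        critical = std_normal.inv_cdf(alpha / 2)    # c_{alpha/2}
        return abs(z_stat) >= abs(critical)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Penn State example: right-tailed test with z = 2.504 at alpha = 0.05.
print(in_rejection_region(2.504, alpha=0.05, tail="right"))  # True
```

With \(\alpha = 0.05\), the right-tailed critical value is about 1.645, so a test statistic of 2.504 falls in the rejection region.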

P-Value Approach

As with the rejection region approach, the p-value approach requires the null and alternative hypotheses, the significance level, and the test statistic. Instead of finding a region, we find a probability called the p-value.

P-value: The p-value (or probability value) is the probability that the test statistic equals the observed value or a more extreme value under the assumption that the null hypothesis is true.

The p-value is a probability statement based on the alternative hypothesis. The p-value is found differently for each of the alternative hypotheses.

Left-tailed: If \(H_a \) is left-tailed, then the p-value is the probability the sample data produces a value equal to or less than the observed test statistic.
Right-tailed: If \(H_a \) is right-tailed, then the p-value is the probability the sample data produces a value equal to or greater than the observed test statistic.
Two-tailed: If \(H_a \) is two-tailed, then the p-value is two times the probability the sample data produces a value equal to or greater than the absolute value of the observed test statistic.

So for one-sample proportions we have...

Left-Tailed

\(P(Z \le z^*)\)

Right-Tailed

\(P(Z \ge z^*)\)

Two-Tailed

\(2 \times P(Z \ge |z^*|)\)
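As a rough illustration, the three p-value formulas can be written as one small function. The name `z_test_p_value` and the tail labels are hypothetical; the sketch assumes the test statistic \(z^*\) follows the standard normal distribution under the null hypothesis.

```python
from statistics import NormalDist

def z_test_p_value(z_stat, tail):
    """p-value for an observed z statistic under a standard normal null.

    tail is one of "left", "right", or "two" (hypothetical labels).
    """
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_stat)                     # P(Z <= z*)
    if tail == "right":
        return 1 - std_normal.cdf(z_stat)                 # P(Z >= z*)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_stat)))      # 2 x P(Z >= |z*|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Penn State example: H_a is right-tailed and z* = 2.504.
print(round(z_test_p_value(2.504, "right"), 4))  # 0.0061
```

Because the normal distribution is continuous, \(P(Z \ge z^*)\) and \(P(Z > z^*)\) are equal, so this agrees with the probability found earlier.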

Once we find the p-value, we compare the p-value to our preset significance level.

If our p-value is less than or equal to \(\alpha \), then there is enough evidence to reject the null hypothesis.
If our p-value is greater than \(\alpha \), there is not enough evidence to reject the null hypothesis.
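The decision rule is a single comparison, sketched below with the p-value of 0.0061 from the Penn State example. The function name `decide` and the list of \(\alpha\) values are assumptions chosen only for illustration; in practice \(\alpha\) is fixed before looking at the data.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value decision rule for a preset significance level."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.0061  # right-tailed p-value from the Penn State example

# Illustrative significance levels only; 0.05 is the conventional choice.
for alpha in (0.10, 0.05, 0.01, 0.005):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

For this p-value, the null hypothesis is rejected at \(\alpha\) = 0.10, 0.05, and 0.01, but not at the stricter illustrative level of 0.005.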

Caution! \(\alpha \) is also called the level of significance, which can cause confusion in terminology: \(\alpha \) is the preset level of significance, whereas the p-value is the observed level of significance. The p-value is, in effect, a summary statistic that translates the observed test statistic's value into a probability, which is easy to interpret.

Important note: We can summarize the data by reporting the p-value and letting readers decide whether or not to reject \(H_0 \) for their own subjectively chosen \(\alpha\) values.
