6b.3 - Further Considerations for Hypothesis Testing

In this section, we discuss a few more issues with hypothesis tests and some points to be conscious of.

Committing an Error

Every time we make a decision and come to a conclusion, we must keep in mind that our decision is based on probability. Therefore, it is possible that we made a mistake.

Consider the example of the previous Lesson on whether the majority of Penn State students are from Pennsylvania. In that example, we took a random sample of 500 Penn State students and found that 278 are from Pennsylvania. We rejected the null hypothesis, at a significance level of 5% with a p-value of 0.006.
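As a rough check on the numbers above, the p-value can be reproduced with a one-sample z-test for a proportion. The sketch below assumes an upper-tailed alternative (more than half of the students are from Pennsylvania) and that the normal approximation applies; the function name `one_prop_z_test` is our own label, not something defined in the lesson.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Upper-tailed one-sample z-test for a proportion (H0: p = p0, Ha: p > p0)."""
    p_hat = successes / n                  # sample proportion
    se = sqrt(p0 * (1 - p0) / n)           # standard error under the null
    z = (p_hat - p0) / se                  # test statistic
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value

# 278 of 500 sampled students are from Pennsylvania; null value is 0.5.
z, p_value = one_prop_z_test(278, 500, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value comes out at about 0.006, below the 5% significance level, which matches the decision to reject the null hypothesis.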

The significance level of 5% means that, if the null hypothesis is true, we have a 5% chance of committing a Type I error. That is, there is a 5% chance of rejecting a true null hypothesis.

If we failed to reject a null hypothesis, then we could have committed a Type II error. This means that we could have failed to reject a false null hypothesis.
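One way to see what the Type I error rate means is to simulate many samples from a population where the null hypothesis really is true and count how often the test rejects it. The sketch below reuses the setting of the Penn State example (null proportion 0.5, samples of 500) with an upper-tailed z-test; the number of repetitions, the seed, and the function name `simulate_type_i_rate` are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_type_i_rate(p0=0.5, n=500, alpha=0.05, reps=2000, seed=1):
    """Estimate how often an upper-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value for the rejection region
    se = sqrt(p0 * (1 - p0) / n)               # standard error under the null
    rejections = 0
    for _ in range(reps):
        # Draw a sample of size n from a population where p really equals p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se
        if z > z_crit:                         # rejecting a true null: Type I error
            rejections += 1
    return rejections / reps

rate = simulate_type_i_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land near the chosen significance level of 0.05, though not exactly on it, because the simulation is random and the count of successes is discrete.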

How Important are the Conditions of a Test?

One of the six steps in hypothesis testing is to verify the conditions. If the conditions are not satisfied, we can still calculate the test statistic and find the rejection region (or p-value). We cannot, however, make a decision or state a conclusion, because the conclusion rests on probability theory that holds only when the conditions are met.

If the conditions are not satisfied, there are other methods to help us make a conclusion. The conclusion, however, may be based on other parameters, such as the median. There are other tests (some are discussed in later lessons) that can be used.

Statistical and Practical Significances

Our decision in the emergency room waiting times example was to reject the null hypothesis and conclude that the average wait time exceeds 10 minutes. However, our sample mean of 11 minutes wasn't too far off from 10. So what do you think of our conclusion? Yes, statistically there was a difference at the 5% level of significance, but are we "impressed" with the results? That is, do you think 11 minutes is really that much different from 10 minutes?

Since we are sampling data, we have to expect some error in our results. Therefore, even if the true wait time were 10 minutes, it would be extremely unlikely for our sample data to have a mean of exactly 10 minutes. This is the difference between statistical significance and practical significance. The former is the result produced by the sample data, while the latter is the practical application of those results.

Statistical significance is concerned with whether an observed effect could be due to chance, whereas practical significance means that the observed effect is large enough to be useful in the real world.

Critics of hypothesis-testing procedures have observed that a population mean is rarely exactly equal to the value in the null hypothesis and hence, by obtaining a large enough sample, virtually any null hypothesis can be rejected. Thus, it is important to distinguish between statistical significance and practical significance.
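The critics' point can be illustrated with a small calculation. The sketch below keeps a tiny observed difference fixed and only changes the sample size, using an upper-tailed z-test for a mean. The null value of 10 minutes comes from the waiting-time example; the sample mean of 10.1 minutes and the standard deviation of 2 minutes are hypothetical values chosen for illustration, not data from the lesson.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 10.0          # null value from the waiting-time example
SAMPLE_MEAN = 10.1  # hypothetical sample mean, only 0.1 minutes above the null
SIGMA = 2.0         # hypothetical population standard deviation

def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """P-value of an upper-tailed z-test for a mean with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

sample_sizes = [100, 1_000, 10_000]
p_values = [upper_tail_p_value(SAMPLE_MEAN, MU0, SIGMA, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>6}: p-value = {p:.4g}")
```

The same 0.1-minute difference is not significant at the 5% level for the smaller samples but becomes highly significant for the largest one, even though its practical importance has not changed.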

The Relationship Between Power, \(\beta\), and \(\alpha\)

Recall that \(\alpha \) is the probability of committing a Type I error. It is the value that is preset by the researcher. Therefore, the researcher has control over the probability of this type of error. But what about \(\beta \), the probability of a Type II error? How much control do we have over the probability of committing this error? Similarly, we want power, the probability we correctly reject a false null hypothesis, to be high (close to 1). Is there anything we can do to have a high power?

The relationship between power and \(\beta \) is an inverse relationship, namely...

Power \( =1-\beta \)

If we increase power, then we decrease \(\beta \). But how do we increase power? One way to increase the power is to increase the sample size.
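The effect of sample size on power can be sketched for an upper-tailed z-test of a mean with known standard deviation. The null value of 10 minutes is taken from the waiting-time example; the assumed true mean of 11 minutes, the standard deviation of 3 minutes, and the function name `power_upper_tail` are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean is mu1 (> mu0)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)     # rejection cutoff on the z scale
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # true mean's distance from mu0 in standard errors
    return 1 - NormalDist().cdf(z_crit - shift)  # probability of landing in the rejection region

sample_sizes = [10, 25, 50, 100]
powers = [power_upper_tail(mu0=10, mu1=11, sigma=3, n=n) for n in sample_sizes]

for n, pw in zip(sample_sizes, powers):
    print(f"n = {n:>3}: power = {pw:.3f}, beta = {1 - pw:.3f}")
```

Under these assumptions, power rises steadily as the sample size grows, and \(\beta \) falls by the same amount.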

Relationship between \(\alpha\) and \(\beta\):

If the sample size is fixed, then decreasing \(\alpha \) will increase \(\beta \), and therefore decrease power. If one wants both \(\alpha \) and \(\beta \) to decrease, then one has to increase the sample size.
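To make this trade-off concrete, the sketch below holds the sample size fixed and computes \(\beta \) for several values of \(\alpha \), again for an upper-tailed z-test of a mean. The null value of 10 is from the waiting-time example; the true mean of 11, the standard deviation of 3, and the sample size of 25 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

MU0, MU1 = 10.0, 11.0   # null value and a hypothetical true mean
SIGMA, N = 3.0, 25      # hypothetical standard deviation and fixed sample size

def beta_upper_tail(alpha):
    """Type II error probability of an upper-tailed z-test at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = (MU1 - MU0) / (SIGMA / sqrt(N))
    # beta is the chance the test statistic falls below the cutoff when the true mean is MU1.
    return NormalDist().cdf(z_crit - shift)

alphas = [0.10, 0.05, 0.01]
betas = [beta_upper_tail(a) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}: beta = {b:.3f}, power = {1 - b:.3f}")
```

With the sample size held fixed, each decrease in \(\alpha \) pushes the rejection cutoff further out, so \(\beta \) grows and power shrinks.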

It is possible, using software, to find the sample size required for set values of \(\alpha \) and power. Also using software, it is possible to determine the value of power. We do not go into details on how to do this but you are welcome to explore on your own.
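For readers who do want to explore, here is a minimal sketch of one such sample size calculation. It assumes an upper-tailed z-test for a mean with known standard deviation, for which the required sample size is \(n = \left( (z_{\alpha} + z_{\beta})\sigma / (\mu_1 - \mu_0) \right)^2\) rounded up. The target power of 0.80, the true mean of 11, and the standard deviation of 3 are hypothetical inputs; statistical packages handle many more test types than this.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least the target power for an upper-tailed z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff for the significance level
    z_beta = NormalDist().inv_cdf(power)       # quantile matching the target power
    return ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)

def achieved_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power actually achieved at sample size n, used to check the answer."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z_alpha - shift)

n_needed = required_sample_size(mu0=10, mu1=11, sigma=3)
print(f"Required sample size: {n_needed}")
print(f"Power at that size: {achieved_power(10, 11, 3, n_needed):.3f}")
```

The second function is only a sanity check: the power at the returned sample size should meet the target, while one observation fewer should fall just short.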

Gathering data is like tasting fine wine—you need the right amount. With wine, too small a sip keeps you from accurately assessing a subtle bouquet, but too large a sip overwhelms the palate.

We can’t tell you how big a sip to take at a wine-tasting event, but when it comes to collecting data, software tools can tell you how much data you need to be sure about your results.
