5.1 - Hypothesis Testing Overview

Jin asked Carlos if he had taken statistics. Carlos said he had, but it was a long time ago and he did not remember much of it. Jin told Carlos that understanding hypothesis testing would help him understand what the judge just said. In most research, a researcher has a “research hypothesis”, that is, what the researcher THINKS is going to occur because of some kind of intervention or treatment. In the courtroom, the prosecutor is the researcher, thinking the person on trial is guilty. This would be the research hypothesis: guilty. However, as most of us know, the U.S. legal system operates on the principle that a person is innocent until PROVEN guilty. In other words, we have to presume innocence until there is enough evidence to change our minds and conclude that the person on trial is not innocent. In hypothesis testing, we refer to the presumption of innocence as the NULL HYPOTHESIS. So while the prosecutor has a research hypothesis, it must be shown that the presumption of innocence can be rejected.

Like the judge in the TV show, if we have enough evidence to conclude that the null is not true, we can reject the null. Jin explained that if the judge had had enough evidence to conclude the person on trial was not innocent, she would have done so. Instead, the judge specifically stated that she did not have enough evidence to reject innocence (the null hypothesis).

When the judge acquits a defendant, as on the TV show, this does not mean that the judge accepts the defendant’s claim of innocence. It only means that innocence is plausible because guilt has not been established beyond a reasonable doubt.

On the other hand, if the judge returns a guilty verdict, she has concluded that innocence (the null) is not plausible given the evidence presented. She therefore rejects the null hypothesis of innocence and concludes the alternative hypothesis: guilty.

Let’s take a closer look at how this works.

Making a Decision

Taking a sample of 500 Penn State students, we asked them if they like cold weather and observed a sample proportion of 0.556. Since these students go to school in Pennsylvania, it might generally be thought that the true proportion of students who like cold weather is 0.5. In other words, the NULL hypothesis is that the true population proportion is equal to 0.5.

In order to “test” what is generally thought about these students (that half of them like cold weather), we have to ask how the data from our sample relate to the hypothesized null value. In other words, is our observed sample proportion far enough away from 0.5 to suggest that there is evidence against the null? Translating this to statistical terms, we can think about the “how far” question in terms of standard deviations. How many standard deviations apart would we consider to be “meaningfully different”?

What if, instead of a cutoff in standard deviations, we found a probability? With a null hypothesis that the proportion is equal to 0.5, the alternative hypothesis is that the proportion is not equal to 0.5. To test this, we convert the distance between the observed value and the null value into a standardized statistic. We have already worked with standardized scores when working with z-scores, and we have also learned about the empirical rule. Combining these two concepts, we can begin to make decisions about how far apart the observed value and the null value need to be to be “meaningfully different”.

To do this we calculate a Z statistic, which is a standardized score of the difference.

z* Test Statistic for a Single Proportion

\(z^{*}=\dfrac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}\left(1-p_{0}\right)}{n}}}\)

where \(\hat{p}\) is the observed sample proportion, \(p_{0}\) is the hypothesized null value, and \(n\) is the sample size.

We can look at the results of calculating a z test (which we will do using software). A large test statistic (in absolute value) indicates a large difference between the observed value and the null value, contributing to greater evidence of a significant difference and thus casting doubt on the claim that the true population proportion is the null value.
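For the cold-weather example, this z* statistic can be computed directly from the formula above. The short Python sketch below is only an illustration of that arithmetic; the variable names are ours, not from any particular software package.

```python
import math

p_hat = 0.556   # observed sample proportion
p_0 = 0.5       # hypothesized null value
n = 500         # sample size

# standard error of the sample proportion under the null hypothesis
se = math.sqrt(p_0 * (1 - p_0) / n)

# z* test statistic: standardized distance between p_hat and p_0
z_star = (p_hat - p_0) / se
print(round(z_star, 2))  # about 2.5 standard errors above the null value
```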

Accompanying the magnitude of the test statistic, our software also yields a “probability”. Returning to the empirical rule, we know the percentiles under a standard normal curve. We can apply these to determine the probability (an area under that curve) of getting an observed score at least this extreme IF the null hypothesis is indeed true (that is, if the null value is the mean of the distribution). In this class, we will not be calculating these by hand, but we do need to understand what the “p-values” in the output mean. In our example, after calculating a z statistic, we determine that if the true proportion is 0.5, the probability we would get a sample proportion as extreme as 0.556 is 0.0061. This is a very small probability as measured against the standard defining “small” as a probability less than 0.05. In this case, we would reject the null hypothesis as a probable value for the population based on the evidence from our sample.
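To see where a value like 0.0061 comes from, the tail area can be read off the standard normal curve. The sketch below, assuming the scipy library is available, continues the example; the first number is the upper-tail probability quoted above, and the second shows how software would often report both tails for the two-sided alternative.

```python
import math
from scipy.stats import norm

p_hat, p_0, n = 0.556, 0.5, 500
z_star = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# probability of a sample proportion at least this far above 0.5,
# assuming the null hypothesis (true proportion = 0.5) is correct
upper_tail = norm.sf(z_star)              # survival function: 1 - CDF
print(round(upper_tail, 4))               # roughly 0.0061

# a two-sided alternative (proportion not equal to 0.5) counts both tails
print(round(2 * norm.sf(abs(z_star)), 4))  # roughly 0.012
```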

While p-values are a standard in most statistics courses and textbooks, there have been recent conversations about the use of p-values.

 American Statistical Association Releases Statement on Statistical Significance and P-Values

The use of p-values is a common practice in statistical inference, but it is not without controversy. In March of 2016, the American Statistical Association released a statement regarding p-values and their use in statistical studies and decision making.

You can review the full article: ASA Statement on p-Values: Context, Process and Purpose

P-Values

Before we proceed any further, we need to step away from the jargon and understand exactly what the heck a p-value is. Simply put, a p-value is the probability of getting a sample statistic at least as extreme as the one we observed, given that the null hypothesis is true. In our example, IF the true proportion of Penn State students who like the cold IS really 0.5 (as we state in the null hypothesis), what is the probability that we would get an observed sample statistic as extreme as 0.556?
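In symbols, and using the upper-tail direction that matches the 0.0061 quoted above, this question can be written as

\(P\left(\hat{p} \geq 0.556 \mid p=0.5\right) \approx 0.0061\)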

When the probability is small, we have one of two options. We can either conclude that there is something wrong with our sample (although, if we followed good sampling techniques as discussed earlier in the notes, this is not likely), OR we can conclude that the null value is probably not the true population value.

To summarize the application of the p-value (a small sketch of this decision rule follows the list):

  • If our p-value is less than or equal to \(\alpha \), then there is enough evidence to reject the null hypothesis (in most cases the alpha is going to be 0.05).
  • If our p-value is greater than \(\alpha \), there is not enough evidence to reject the null hypothesis.
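As a minimal sketch, the decision rule above is just a single comparison; the function name decide and the example p-values below are ours, chosen for illustration.

```python
# Compare a p-value to a preset level of significance (alpha).
def decide(p_value, alpha=0.05):
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.0061))  # cold-weather example: reject the null
print(decide(0.2500))  # a larger p-value: fail to reject
```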
Caution!

One should be aware that \(\alpha \) is also called the level of significance, which can make for some confusion in terminology. \(\alpha \) is the preset level of significance, whereas the p-value is the observed level of significance. The p-value, in fact, is a summary statistic that translates the observed test statistic's value into a probability that is easy to interpret.

Important note:

We can summarize the data by reporting the p-value and let users decide whether or not to reject \(H_0 \) for their subjectively chosen \(\alpha\) values.