5: Hypothesis Testing

Overview

Case Study: Guilty

Jin and Carlos were watching a recent T.V. show, and the episode concluded with a courtroom scene. In the scene, the judge handed down her ruling. She said, “I do not have enough evidence to conclude you are not innocent, therefore I can NOT conclude you are guilty.” Carlos turned to Jin and said, “Isn’t that the same thing as being innocent?” Let’s see if we can use the logic of hypothesis testing to help Jin respond to Carlos.

In Lesson 4, we learned about constructing confidence intervals to estimate actual population values. In this unit, we extend this idea of estimation to testing specific (hypothesized) values. Much of our day-to-day lives involves hypothesizing about actual values. Does the average number of hours worked a week really equal 40? Is the average fish caught on a weekend fishing trip really 4 feet long (OK, that is an attempt at a fish story)? These are examples of hypothesis testing. In a courtroom in the U.S., the hypothesis test is about being innocent (remember, innocent until proven guilty).

It is important to understand that confidence intervals and hypothesis tests are similar but used for different purposes. Confidence intervals give us an estimate of a population value when we typically do not know what that value is. Hypothesis tests, on the other hand, require us to start with a guess (hypothesis) about what we think the value is. For example, we may think the difference between groups is zero, and we want to see if our evidence supports this guess. In practice, many people put more emphasis on confidence intervals than on hypothesis tests, because confidence intervals estimate specific values, but hypothesis testing is still an integral part of what we do in research.

So what is hypothesis testing and how can it help Jin explain the judge’s ruling?

Objectives

Upon completion of this lesson, you should be able to:

  • Compute a hypothesis test for two groups
  • Correctly identify the elements of z- and t-tests
  • Apply the appropriate test for quantitative versus categorical data
  • Correctly interpret the results of a hypothesis test (including p-values)
  • Identify the similar elements between confidence intervals and hypothesis tests

5.1 - Hypothesis Testing Overview

Jin asked Carlos if he had taken statistics; Carlos said he had, but it was a long time ago and he did not remember much of it. Jin told Carlos that understanding hypothesis testing would help him understand what the judge had just said. In most research, a researcher has a “research hypothesis”, that is, what the researcher THINKS is going to occur because of some kind of intervention or treatment. In the courtroom, the prosecutor is the researcher, thinking the person on trial is guilty. This would be the research hypothesis: guilty. However, as most of us know, the U.S. legal system operates on the principle that a person is innocent until PROVEN guilty. In other words, we have to presume innocence until there is enough evidence to change our mind and conclude that the person on trial is actually not innocent. In hypothesis testing, we refer to the presumption of innocence as the NULL HYPOTHESIS. So while the prosecutor has a research hypothesis, it must be shown that the presumption of innocence can be rejected.

Like the judge in the TV show, if we have enough evidence to conclude that the null is not true, we can reject the null. Jin explained that if the judge had had enough evidence to conclude the person on trial was not innocent, she would have rejected innocence. The judge specifically stated that she did not have enough evidence to reject innocence (the null hypothesis).

When the judge acquits a defendant, as on the T.V. show, this does not mean that the judge accepts the defendant’s claim of innocence. It only says that innocence is plausible because guilt has not been established beyond a reasonable doubt.

On the other hand, if the judge returns a guilty verdict, she has concluded that innocence (the null) is not plausible given the evidence presented. She therefore rejects the null hypothesis of innocence and concludes the alternative hypothesis: guilty.

Let’s take a closer look at how this works.

Making a Decision

Suppose we take a sample of 500 Penn State students and ask them if they like cold weather, and we observe a sample proportion of 0.556. Since these students go to school in Pennsylvania, it might generally be thought that the true proportion of students who like cold weather is 0.5. In other words, the NULL hypothesis is that the true population proportion is equal to 0.5.

In order to “test” what is generally thought about these students (that half of them like cold weather), we have to ask about the relationship of the data we have (from our sample) relative to the hypothesized null value. In other words, is our observed sample proportion far enough away from 0.5 to suggest that there is evidence against the null? Translating this into statistical terms, we can think about the “how far” question in terms of standard deviations. How many standard deviations apart would we consider to be “meaningfully different”?

What if, instead of a cutoff in standard deviations, we found a probability? With a null hypothesis that the proportion is equal to 0.5, the alternative hypothesis is that it is not equal to 0.5. To test this, we convert the distance between the observed value and the null value into a standardized statistic. We have worked with standardized scores before, when working with z-scores, and we also learned about the empirical rule. Combining these two concepts, we can begin to make decisions about how far apart the observed value and the null hypothesis need to be to be “meaningfully different”.

To do this we calculate a Z statistic, which is a standardized score of the difference.

z* Test Statistics for a Single Proportion

\(z^{*}=\dfrac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}\left(1-p_{0}\right)}{n}}}\)

We can look at the results of calculating a z test (which we will do using software). Large test statistics indicate a large difference between the observed value and the null, contributing to greater evidence of a significant difference, thus casting doubt that the true population proportion is the null value.

Accompanying the magnitude of the test statistic, our software also yields a “probability”. Returning to the values of the empirical rule, we know the percentiles under a standard normal curve. We can apply these to determine the probability of getting an observed score IF the null hypothesis is indeed true (i.e., if the null value is the mean of the distribution). In this class, we will not be calculating these by hand, but we do need to understand what the “p-values” in the output mean. In our example, after calculating a z statistic, we determine that if the true proportion is 0.5, the probability of observing a sample proportion of 0.556 or larger is 0.0061. This is a very small probability, as measured against the usual standard that defines “small” as a probability less than 0.05. In this case, we would reject the null hypothesis as a probable value for the population, based on the evidence from our sample.
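As a concrete sketch of this calculation (not required for the course, which relies on software output), the z statistic for the cold-weather example can be reproduced in a few lines of Python with scipy. Note that the upper-tail probability matches the 0.0061 quoted above, while a two-sided test against the “not equal to 0.5” alternative would roughly double it:

```python
from math import sqrt

from scipy.stats import norm

# Cold-weather example: n = 500 students, observed p-hat = 0.556,
# null hypothesis value p0 = 0.5
n, p_hat, p0 = 500, 0.556, 0.5

# Standard error under the null: sqrt(p0 * (1 - p0) / n)
se = sqrt(p0 * (1 - p0) / n)

# Standardized test statistic
z = (p_hat - p0) / se          # about 2.50

# Upper-tail probability: P(sample proportion >= 0.556 if p0 is true)
p_upper = norm.sf(z)           # about 0.0061

# Two-sided p-value for the "not equal to 0.5" alternative
p_two_sided = 2 * norm.sf(abs(z))

print(z, p_upper, p_two_sided)
```

Either way, the probability is below the conventional 0.05 cutoff, so the null value of 0.5 would be rejected.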

While p-values are a standard part of most statistics courses and textbooks, there have been recent conversations about their use.

 American Statistical Association Releases Statement on Statistical Significance and P-Values

The use of p-values is a common practice in statistical inference but also not without its controversy. In March of 2016, the American Statistical Association released a statement regarding p-values and their use in statistical studies and decision making.

You can review the full article: ASA Statement on p-Values: Context, Process and Purpose

P-Values

Before we proceed any further, we need to step away from the jargon and understand exactly what a p-value is. Simply put, a p-value is the probability of getting the observed sample statistic (or one more extreme), given that the null hypothesis is true. In our example, IF the true proportion of Penn State students who like the cold IS really 0.5 (as we state in the null hypothesis), what is the probability that we would get an observed sample statistic of 0.556?

When the probability is small, we have one of two options. We can either conclude there is something wrong with our sample (however, if we followed good sampling techniques, as discussed earlier in the notes, this is not likely), OR we can conclude that the null is probably not the true population value.

To summarize the application of the p-value:

  • If our p-value is less than or equal to \(\alpha \), then there is enough evidence to reject the null hypothesis (in most cases the alpha is going to be 0.05).
  • If our p-value is greater than \(\alpha \), there is not enough evidence to reject the null hypothesis.
Caution!

One should be aware that \(\alpha \) is also called the level of significance. This can cause confusion in terminology: \(\alpha \) is the preset level of significance, whereas the p-value is the observed level of significance. The p-value is, in fact, a summary statistic that translates the observed test statistic's value into a probability that is easy to interpret.

Important note:

We can summarize the data by reporting the p-value and let the users decide to reject \(H_0 \) or not to reject \(H_0 \) for their subjectively chosen \(\alpha\) values.


5.2 - Hypothesis Testing for One Sample Proportion

Recall our “test” about whether Penn State students like cold weather. We have to ask about the relationship of the data we have (from our sample) relative to the hypothesized null value. In other words, is our observed sample proportion far enough away from 0.5 to suggest that there is evidence against the null?

We can use what we know about the sampling distribution of sample proportions to help find our evidence!

Hypothesis Testing for One Sample Proportion

Recall that under certain conditions, the sampling distribution of the sample proportion, \(\hat{p} \), is approximately normal with mean, \(p \), standard error \(\sqrt{\dfrac{p(1-p)}{n}}\), and estimated standard error \(\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\).

Null:

\(H_0\colon p=p_0\)

Conditions:

  • \(np_0 \ge 5\) and \(n(1-p_0)\ge5\)

Test Statistic:

z* Test Statistics for a Single Proportion

\(z^{*}=\dfrac{\hat{p}-p_{0}}{\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}}\)

 


5.3 - Hypothesis Testing for One-Sample Mean

In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution of the sample mean.

Hypothesis Test for One-Sample Mean

Recall that under certain conditions, the sampling distribution of the sample mean, \(\bar{x} \), is approximately normal with mean, \(\mu \), standard error \(\dfrac{\sigma}{\sqrt{n}} \), and estimated standard error \(\dfrac{s}{\sqrt{n}} \).

Null:

\(H_0\colon \mu=\mu_0\)

Conditions:

  • The distribution of the population is Normal
  • The sample size is large \( n>30 \).

Test Statistic:

If at least one of the conditions is satisfied, then...

\( t=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \)

will follow a t-distribution with \(n-1 \) degrees of freedom.

Notice that when working with continuous data we use a t statistic instead of the z statistic. This is because the sample size impacts the sampling distribution and needs to be taken into account; we do this through the “degrees of freedom”. We will not go into much detail about degrees of freedom in this course.

Let’s look at an example.

Example 5-1

The mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. The builder observes that the sample mean of 61 pieces of lumber is 8.3 feet, with a sample standard deviation of 1.2 feet. What will she conclude? Is 8.3 very different from 8.5?

This depends on the standard deviation of \(\bar{x} \) . 

\begin{align} t^*&=\dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align} 

Thus, we are asking if \(-1.3\) is very far away from zero, since that corresponds to the case when \(\bar{x}\) is equal to \(\mu_0 \). If it is far away, then it is unlikely that the null hypothesis is true and one rejects it. Otherwise, one cannot reject the null hypothesis. 
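As a sketch of how "far away from zero" is judged in practice (the course uses Minitab, but scipy gives the same numbers), the lumber example works out to a p-value well above 0.05, so the builder cannot reject a mean of 8.5 feet:

```python
from math import sqrt

from scipy import stats

# Lumber example: n = 61 boards, x-bar = 8.3 ft, s = 1.2 ft, null mu0 = 8.5 ft
n, x_bar, s, mu0 = 61, 8.3, 1.2, 8.5

# Test statistic, as in the derivation above
t_star = (x_bar - mu0) / (s / sqrt(n))        # about -1.30

# Two-sided p-value from a t-distribution with n - 1 = 60 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_star), df=n - 1)

print(t_star, p_value)  # -1.3 is not far from zero; p-value exceeds 0.05
```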


5.3.1 - Steps in Conducting a Hypothesis Test for \(\mu\)

  1. Step 1: Set up the hypotheses and check conditions
    One Mean t-test Hypotheses

    \( H_0\colon \mu=\mu_0 \)

    \( H_a\colon \mu\ne \mu_0 \)

    Conditions: The data comes from an approximately normal distribution or the sample size is at least 30

  2. Step 2: Decide on the significance level, \(\alpha \)

    Typically, 5%. If \(\alpha\) is not specified, use 5%

  3. Step 3: Calculate the test statistic

    One Mean t-test: \( t^*=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \)

  4. Step 4: Compute the appropriate p-value based on our alternative hypothesis

    Typically we will let Minitab handle this for us. But if you are really interested, you can look up p-values in the probability tables found in the appendix of your textbook!

  5. Step 5: Make a decision about the null hypothesis

    If the p-value is less than the significance level, \(\alpha\), then reject \(H_0\) (and conclude \(H_a \)). If it is greater than the significance level, then do not reject \(H_0 \).

  6. Step 6

    State an overall conclusion.
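The six steps above can be sketched end to end in Python (the course itself uses Minitab). The summary statistics here, a hypothetical sample of 36 workers averaging 38.5 hours with a standard deviation of 5, are invented to echo the 40-hour question from the overview:

```python
from math import sqrt

from scipy import stats

# Step 1: H0: mu = 40 vs Ha: mu != 40; n >= 30, so the conditions are met
mu0 = 40
n, x_bar, s = 36, 38.5, 5.0   # hypothetical summary statistics

# Step 2: significance level (5% unless otherwise specified)
alpha = 0.05

# Step 3: test statistic
t_star = (x_bar - mu0) / (s / sqrt(n))   # -1.8

# Step 4: two-sided p-value with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_star), df=n - 1)

# Steps 5 and 6: decision and overall conclusion
if p_value <= alpha:
    conclusion = "Reject H0: evidence the mean differs from 40 hours."
else:
    conclusion = "Fail to reject H0: no evidence the mean differs from 40 hours."

print(t_star, p_value, conclusion)
```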

Minitab®

Conduct a One-Sample Mean t-Test

Note that these steps are very similar to those for the one-mean confidence interval. The differences occur in steps 4 through 8.

To conduct the one sample mean t-test in Minitab...

  1. Choose Stat > Basic Stat > 1 Sample t.
  2. In the drop-down box use "One or more samples, each in a column" if you have the raw data, otherwise select "Summarized data" if you only have the sample statistics.
  3. If using the raw data, enter the column of interest into the blank variable window below the drop down selection. If using summarized data, enter the sample size, sample mean, and sample standard deviation in their respective fields.
  4. Choose the check box for "Perform hypothesis test" and enter the null hypothesis value.
  5. Choose Options.
  6. Enter the confidence level associated with alpha (e.g. 95% for alpha of 5%).
  7. From the drop down list for "Alternative hypothesis" select the correct alternative.
  8. Click OK and OK.

5.4 - Further Considerations for Hypothesis Testing

In this section, we include a little more discussion about some of the issues with hypothesis tests and items to be conscious about. 


5.4.1 - Errors

Committing an Error

Every time we make a decision and come to a conclusion, we must keep in mind that our decision is based on probability. Therefore, it is possible that we made a mistake.

Consider the earlier example of whether the majority of Penn State students like the cold. In that example, we took a random sample of 500 Penn State students and found that 278 like the cold. We rejected the null hypothesis at a significance level of 5%, with a p-value of 0.006.

Type I Error

Rejecting \(H_0\) when \(H_0\) is really true, denoted by \(\alpha\) ("alpha") and commonly set at .05

\(\alpha=P(Type\;I\;error)\)

The significance level of 5% means that we have a 5% chance of committing a Type I error. That is, there is a 5% chance that we would reject a true null hypothesis.

Type II Error

Failing to reject \(H_0\) when \(H_0\) is really false, denoted by \(\beta\) ("beta")

\(\beta=P(Type\;II\;error)\)

If we failed to reject a null hypothesis, then we could have committed a Type II error. This means that we could have failed to reject a false null hypothesis.

Decision                               Reality: \(H_0\) is true     Reality: \(H_0\) is false
Reject \(H_0\) (conclude \(H_a\))      Type I error                 Correct decision
Fail to reject \(H_0\)                 Correct decision             Type II error

Small p-values (less than \(\alpha\), typically .05) lead to rejecting the null; large p-values (greater than \(\alpha\)) lead to not rejecting the null.

 

How Important are the Conditions of a Test?

In our six steps of hypothesis testing, one of them is to verify the conditions. If the conditions are not satisfied, we cannot make a decision or state a conclusion, because the conclusion rests on probability theory.

If the conditions are not satisfied, there are other methods to help us reach a conclusion. The conclusion, however, may be based on other parameters, such as the median; tests that do not rely on these conditions (called nonparametric tests) can be used instead.


5.4.2 - Statistical and Practical Significance

Our decision in the Penn State example was to reject the null hypothesis and conclude that the proportion of Penn State students who like the cold was not 0.5. However, our sample proportion of 0.556 wasn't too far off from 0.5. What do you think of our conclusion? Yes, statistically there was a difference at the 5% level of significance, but are we "impressed" with the results? That is, do you think 0.556 is really that much different from 0.5?

Here we distinguish between statistical significance and practical significance. Statistical significance is concerned with whether an observed effect is too large to be attributed to chance, while practical significance means that the observed effect is large enough to be useful in the real world.


5.4.3 - The Relationship Between Power, \(\beta\), and \(\alpha\)

Recall that \(\alpha \) is the probability of committing a Type I error. It is the value that is preset by the researcher. Therefore, the researcher has control over the probability of this type of error. But what about \(\beta \), the probability of a Type II error? How much control do we have over the probability of committing this error? Similarly, we want power, the probability we correctly reject a false null hypothesis, to be high (close to 1). Is there anything we can do to have a high power?

The relationship between power and \(\beta \) is an inverse relationship, namely...

Power
\(Power = 1-\beta\)
\(\beta\) = probability of committing a Type II Error.

If we increase power, then we decrease \(\beta \). But how do we increase power? One way is to increase the sample size. Sample size calculations are included in your textbook but not covered in this course. Remember, it is possible to answer the question “how many ___ do I have to study?” by learning about sample size estimates.
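To see why a larger sample increases power, one can simulate it. The sketch below, with an invented effect size and invented sample sizes, repeatedly draws samples from a population whose true mean differs from the null and counts how often the one-sample t-test rejects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

def simulated_power(n, true_mean=0.5, mu0=0.0, alpha=0.05, reps=2000):
    """Estimate power: the fraction of simulated samples of size n
    (drawn from a normal population with the given true mean) for which
    the one-sample t-test rejects H0: mu = mu0 at level alpha."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(loc=true_mean, scale=1.0, size=n)
        _, p = stats.ttest_1samp(sample, popmean=mu0)
        if p <= alpha:
            rejections += 1
    return rejections / reps

power_small = simulated_power(n=10)
power_large = simulated_power(n=50)
print(power_small, power_large)  # power grows with the sample size
```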

The concepts, logic, and terminology of hypothesis testing can take some time to master. It is worth it! Hypothesis testing is a very powerful statistical tool.

Next, we will move onto situations where we compare more than one population parameter.


5.5 - Hypothesis Testing for Two-Sample Proportions

We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as for one group.

These notes will go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing for two groups. If this starts to get a little confusing, just skim it for a general understanding! Remember, we can rely on the software to do the calculations for us, but it is good to have a basic understanding of the logic.

We will use the sampling distribution of \(\hat{p}_1-\hat{p}_2\) as we did for the confidence interval.

For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be:

\(H_0\colon p_1-p_2=0\)

Another way to look at it is \(H_0\colon p_1=p_2\). This is worth stopping to think about. Remember, in hypothesis testing, we assume the null hypothesis is true. In this case, it means that \(p_1\) and \(p_2\) are equal. Under this assumption, then \(\hat{p}_1\) and \(\hat{p}_2\) are both estimating the same proportion. Think of this proportion as \(p^*\).

Therefore, the sampling distribution of both proportions, \(\hat{p}_1\) and \(\hat{p}_2\), will, under certain conditions, be approximately normal centered around \(p^*\), with standard error \(\sqrt{\dfrac{p^*(1-p^*)}{n_i}}\), for \(i=1, 2\).

We take this into account by finding an estimate for this \(p^*\) using the two-sample proportions. We can calculate an estimate of \(p^*\) using the following formula:

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\)

This value is the total number in the desired categories \((x_1+x_2)\) from both samples over the total number of sampling units in the combined sample \((n_1+n_2)\).

Putting everything together, if we assume \(p_1=p_2\), then the sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be approximately normal with mean 0 and standard error of \(\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\), under certain conditions.

Therefore,

\(z^*=\dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...will follow a standard normal distribution.

Finally, we can develop our hypothesis test for \(p_1-p_2\).

Hypothesis Testing for Two-Sample Proportions

Null:

\(H_0\colon p_1-p_2=0\)

Conditions:

\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than five

Test Statistic:

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...where \(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\).

The critical values, p-values, and decisions will all follow the same steps as those from a hypothesis test for a one-sample proportion.
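Putting the pooled-proportion formula to work, here is a Python sketch with invented counts (52 successes out of 100 in group 1 and 40 out of 100 in group 2):

```python
from math import sqrt

from scipy.stats import norm

# Hypothetical data: successes and sample sizes for the two groups
x1, n1 = 52, 100
x2, n2 = 40, 100

p_hat1, p_hat2 = x1 / n1, x2 / n2

# Pooled estimate p* under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)

# Standard error using the pooled estimate
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_star = (p_hat1 - p_hat2 - 0) / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z_star))

print(z_star, p_value)  # here the p-value is above 0.05: fail to reject
```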


5.6 - Comparing Two Population Means

Working with two group means requires a few extra steps. We need to know the relationship between the two groups. Are the groups independent or dependent? Another important step is to understand the variability of the two groups (our old friend variability comes into play again!). If the groups have unequal variance we may have to account for that in our hypothesis test. 

Let's take a look at independence and dependence first. 

Independent and Dependent Samples

Independent Sample
The samples from two populations are independent if the samples selected from one of the populations have no relationship with the samples selected from the other population.
Dependent Sample
The samples are dependent (also called paired data) if each measurement in one sample is matched or paired with a particular measurement in the other sample. Another way to consider this is how many measurements are taken off of each subject. If only one measurement, then independent; if two measurements, then paired. Exceptions are in familial situations such as in a study of spouses or twins. In such cases, the data is almost always treated as paired data.

These notes will first work through independent groups and then proceed to dependent groups.


5.6.1 - Inference for Independent Means

As with comparing two population proportions, when we compare two population means from independent populations, the interest is in the difference between the two means. In other words, if \(\mu_1\) is the population mean from population 1 and \(\mu_2\) is the population mean from population 2, then the difference is \(\mu_1-\mu_2\). If \(\mu_1-\mu_2=0\) then there is no difference between the two population parameters.

If each population is normal, then the sampling distribution of \(\bar{x}_i\) is normal with mean \(\mu_i\), standard error \(\dfrac{\sigma_i}{\sqrt{n_i}}\), and the estimated standard error \(\dfrac{s_i}{\sqrt{n_i}}\), for \(i=1, 2\).

Using the Central Limit Theorem, if the population is not normal, then with a large sample, the sampling distribution is approximately normal.

The theorem presented in this Lesson says that if either of the above are true, then \(\bar{x}_1-\bar{x}_2\) is approximately normal with mean \(\mu_1-\mu_2\), and standard error \(\sqrt{\dfrac{\sigma^2_1}{n_1}+\dfrac{\sigma^2_2}{n_2}}\).

That all sounds great; however, in most cases \(\sigma_1\) and \(\sigma_2\) are unknown and have to be estimated. It seems natural to estimate \(\sigma_1\) by \(s_1\) and \(\sigma_2\) by \(s_2\). When the sample sizes are small, these estimates may not be very accurate, and one may get a better estimate of the common standard deviation by pooling the data from both populations, provided the standard deviations for the two populations are not that different. If the standard deviations are different, then we want to include that difference in our test.

Given this, there are two options for estimating the variances for the independent samples:

  • Using pooled variances
  • Using unpooled (or unequal) variances

When should we use which? First, the nice thing is that many software packages calculate the variances "behind the curtain" and will show you the most appropriate output. However, if you are NOT sure, you can always use the unpooled method. The consequence of using the unpooled method when the variances are actually equal is a slightly more conservative test, making it marginally more difficult to reject the null. The consequence of using pooled variances when the variances are actually unequal, however, is an incorrect model.


5.6.1.1 - Pooled Variances

Hypothesis Tests for \(\mu_1− \mu_2\): The Pooled t-test

Now let's consider the hypothesis test for the mean differences with pooled variances.

Null:

\(H_0\colon\mu_1-\mu_2=0\)

Conditions:

The assumptions/conditions are:

  • The populations are independent
  • The population variances are equal
  • Each population is either normal or the sample size is large

Test Statistic:

The test statistic is...

\(t^*=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\)

And \(t^*\) follows a t-distribution with degrees of freedom equal to \(df=n_1+n_2-2\).

The p-value, critical value, and conclusion are found similar to what we have done before.
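The pooled statistic can be computed by hand and checked against scipy's `ttest_ind` with `equal_var=True`; the data values below are invented for illustration:

```python
from math import sqrt

import numpy as np
from scipy import stats

# Hypothetical samples from two independent groups
group1 = np.array([8.1, 8.4, 8.3, 8.6, 8.2, 8.5])
group2 = np.array([8.0, 7.9, 8.2, 8.1, 7.8, 8.3])

n1, n2 = len(group1), len(group2)
s1, s2 = group1.std(ddof=1), group2.std(ddof=1)

# Pooled standard deviation: the two sample variances combined,
# weighted by their degrees of freedom
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Test statistic from the formula above
t_manual = (group1.mean() - group2.mean() - 0) / (s_p * sqrt(1 / n1 + 1 / n2))

# scipy's pooled-variance test should agree
t_scipy, p_value = stats.ttest_ind(group1, group2, equal_var=True)

print(t_manual, t_scipy, p_value)
```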


5.6.1.2 - Unpooled Variances

When the assumption of equal variances is not valid, we need to use separate, or unpooled, variances. The mathematics and theory are complicated for this case and we intentionally leave out the details.

Hypothesis Tests for \(\mu_1− \mu_2\): The Non-Pooled t-test

Null:

\(H_0\colon\mu_1-\mu_2=0\)

Conditions:

We still have the following assumptions:

  • The populations are independent
  • Each population is either normal or the sample size is large

Test Statistic:

If the assumptions are satisfied, then

\(t^*=\dfrac{\bar{x}_1-\bar{x_2}-0}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\)

will have a t-distribution with degrees of freedom

\(df=\dfrac{(n_1-1)(n_2-1)}{(n_2-1)C^2+(1-C)^2(n_1-1)}\)

where \(C=\dfrac{\frac{s^2_1}{n_1}}{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\).

Note! This calculation for the exact degrees of freedom is cumbersome and is typically done by software. An alternate, conservative option to using the exact degrees of freedom calculation can be made by choosing the smaller of \(n_1-1\) and \(n_2-1\).
\((1-\alpha)100\%\) Confidence Interval for \(\mu_1-\mu_2\) for Unpooled Variances
\(\bar{x}_1-\bar{x}_2\pm t_{\alpha/2} \sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\)

Where \(t_{\alpha/2}\) comes from the t-distribution using the degrees of freedom above.
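The exact degrees of freedom formula above can be sketched in Python; the summary statistics here are invented for illustration:

```python
# Hypothetical summary statistics (sample size, sample standard deviation)
# for two independent samples with visibly different spreads
n1, s1 = 15, 2.0
n2, s2 = 20, 4.0

# C as defined above: the share of the total estimated variance
# of (x-bar_1 - x-bar_2) contributed by sample 1
c = (s1**2 / n1) / (s1**2 / n1 + s2**2 / n2)

# Exact degrees of freedom from the formula above
df_exact = ((n1 - 1) * (n2 - 1)) / ((n2 - 1) * c**2 + (1 - c) ** 2 * (n1 - 1))

# Conservative alternative: the smaller of n1 - 1 and n2 - 1
df_conservative = min(n1 - 1, n2 - 1)

print(df_exact, df_conservative)
```

Note that the exact value lands between the conservative choice and the pooled \(n_1+n_2-2\), which is why the conservative shortcut is safe.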

Minitab®

Unpooled t-test

To perform a separate variance 2-sample t-procedure, use the same commands as for the pooled procedure EXCEPT do NOT check the box for 'Use Equal Variances.'

  1. Choose Stat > Basic Statistics > 2-sample t
  2. Select the Options box and enter the desired 'Confidence level,' 'Null hypothesis value' (again for our class this will be 0), and select the correct 'Alternative hypothesis' from the drop-down menu.
  3. Choose OK.

For some examples, one can use both the pooled t-procedure and the separate variances (non-pooled) t-procedure and obtain results that are close to each other. However, when the sample standard deviations are very different from each other, and the sample sizes are different, the separate variances 2-sample t-procedure is more reliable.


5.6.2 - Inference for Paired Means

When we developed the inference for the independent samples, we depended on the statistical theory to help us. The theory, however, required the samples to be independent. What can we do when the two samples are not independent, i.e., the data is paired?

Consider an example where we are interested in a person’s weight before implementing a diet plan and after. Since the interest focuses on the difference, it makes sense to “condense” these two measurements into one and consider the difference between them. That is, instead of considering the two measures separately, we take the before-diet weight and subtract the after-diet weight. The difference makes sense, too: it is the weight lost on the diet.

When we take the two measurements to make one measurement (i.e., the difference), we are now back to the one sample case! Now we can apply all we learned for the one sample mean to the difference (Cool!)

Hypothesis Test for the Difference of Paired Means, \(μ_d\)

In this section, we will develop the hypothesis test for the mean difference for paired samples. As we learned in the previous section, if we consider the difference rather than the two samples, then we are back in the one-sample mean scenario.

The possible null and alternative hypotheses are:

Null:
\(H_0\colon \mu_d=0\)

Conditions:

We still need to check the conditions, and at least one of the following needs to be satisfied:

  • The differences of the paired measurements follow a normal distribution
  • The sample size is large, \(n>30\).

Test Statistic:

If at least one condition is satisfied, then...

\(t^*=\dfrac{\bar{d}-0}{\frac{s_d}{\sqrt{n}}}\)

will follow a t-distribution with \(n-1\) degrees of freedom.

The same process for the hypothesis test for one mean can be applied. The test for the mean difference may be referred to as the paired t-test or the test for paired means.
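Because the paired test is just a one-sample test on the differences, a paired t-test and a one-sample t-test on before − after give identical results. The weights below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after diet weights (pounds) for 8 people
before = np.array([185, 192, 206, 177, 225, 168, 256, 239])
after = np.array([180, 190, 199, 176, 214, 165, 241, 229])

# Paired t-test directly on the two columns
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences, with mu0 = 0
diffs = before - after
t_one, p_one = stats.ttest_1samp(diffs, popmean=0)

print(t_paired, p_paired)  # identical to (t_one, p_one)
```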

Minitab®

Paired t-Test

You can use a paired t-test in Minitab to perform the test. Alternatively, you can perform a 1-sample t-test on the differences (difference = before weight − after weight).

  • Choose Stat > Basic Statistics > Paired t
  • Click Options to specify the confidence level for the interval and the alternative hypothesis you want to test. The default null hypothesis is 0.

Diet Plan

The Minitab output for paired T for before-after diet plan is as follows: 


95% lower bound for mean difference: 0.0505 

T-Test of mean difference = 0 (vs > 0): T-Value = 4.86 P-Value = 0.000 

Using the p-value to draw a conclusion about our example:

p-value = \(0.000 < 0.05\)

Reject \(H_0\) and conclude that the mean before-diet weight is greater than the mean after-diet weight.


5.7 - Summary

Now that we have a good understanding, let’s turn back to the example of the judge’s decision. To review: the null hypothesis is that the person is innocent; the alternative hypothesis is that the person is guilty. The judge does not have enough evidence; in our statistical terms, the observations from the court proceedings are not far enough “away” from innocence to reject the presumption of innocence. Therefore she cannot reject the null hypothesis of innocence, hence her verdict. Justice served?

