9.1 - One-way ANOVA Test

From regression, we learned that we can determine the impact a predictor variable has on a response. We draw these conclusions based on a hypothesis test for the slope. A slope that is significantly different from zero indicates a significant relationship between the predictor and the response. However, in Moriah's example, it is hard to imagine a "slope" for a categorical predictor. As a reminder, categorical data have no numerical value, so a "one-unit change" in the predictor is meaningless.

We need an alternative way of testing the relationship between a categorical predictor and a continuous response. This brings us to the Analysis of Variance (ANOVA) test.

Now, let's pause for a moment and return to a brief statement from the regression notes. It was previously mentioned that regression and ANOVA are actually both linear models. The traditional difference between them is the type of predictor variable: in this unit the predictor is categorical, whereas in the last unit it was continuous. However, with our software, we can run a "regression" and tell Minitab that the predictor is categorical, and the analysis works just as well. We can also run an ANOVA and tell Minitab that a predictor is a "covariate," which is the same as telling Minitab it is a continuous variable. We point this out because, as you learn more about ANOVA, you will begin to see the similarities between the two techniques, and for good reason: they are both linear models.
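To see the "both are linear models" point concretely, here is a minimal sketch in Python with NumPy (used here only as a stand-in for Minitab; the scores and group labels A, B, C are hypothetical, not Moriah's data). It fits a least-squares regression in which the categorical predictor is coded as 0/1 indicator columns, then computes the one-way ANOVA F directly from the group means. The two F statistics agree.

```python
import numpy as np

# Hypothetical test scores for three groups (illustrative only, not Moriah's data).
groups = {
    "A": [72.0, 75.0, 78.0, 71.0, 74.0],
    "B": [68.0, 70.0, 65.0, 69.0, 73.0],
    "C": [80.0, 77.0, 83.0, 79.0, 81.0],
}
labels = list(groups)
y = np.concatenate([groups[lab] for lab in labels])
group_of = np.concatenate([[lab] * len(groups[lab]) for lab in labels])

# Regression view: intercept plus one 0/1 indicator column per non-reference group
# (group A is the reference level).
X = np.column_stack(
    [np.ones(len(y))] + [(group_of == lab).astype(float) for lab in labels[1:]]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
n, p = X.shape
ss_model = np.sum((fitted - y.mean()) ** 2)
ss_error = np.sum((y - fitted) ** 2)
f_regression = (ss_model / (p - 1)) / (ss_error / (n - p))

# ANOVA view: between-group and within-group sums of squares from the group means.
k = len(labels)
grand_mean = y.mean()
ss_between = sum(len(groups[lab]) * (np.mean(groups[lab]) - grand_mean) ** 2 for lab in labels)
ss_within = sum(np.sum((np.array(groups[lab]) - np.mean(groups[lab])) ** 2) for lab in labels)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Intercept = mean of group A; the other coefficients = (group mean) - (mean of A).
print("coefficients:", np.round(beta, 4))
print(f"F from regression: {f_regression:.4f}")  # 21.6667
print(f"F from ANOVA:      {f_anova:.4f}")       # 21.6667
```

With indicator coding, the regression's fitted values are exactly the group means, which is why the regression and ANOVA sums of squares, and therefore the F statistics, coincide.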

So, let's start with our null and alternative hypotheses for the ANOVA!

Hypotheses

The null

Recall that for a test for two independent means, the null hypothesis was \(\mu_1=\mu_2\). In one-way ANOVA, we want to compare \(t\) population means, where \(t>2\). Therefore, the null hypothesis for analysis of variance for \(t\) population means is:

\(H_0\colon \mu_1=\mu_2=\cdots=\mu_t\)

In Moriah’s data, the null is that the mean test scores of all three groups of students are equal.

The alternative

The alternative, however, cannot be set up the same way as in the two-sample case. If we wanted to see whether two population means differ, the alternative would be \(\mu_1\ne\mu_2\). With more than two groups, the research question is “Are some of the means different?” If we set up the alternative as \(\mu_1\ne\mu_2\ne\dots\ne\mu_t\), then we would be testing whether ALL the means differ from one another. This is not what we want. We need to be careful about how we set up the alternative. The mathematical version of the alternative is...

\(H_a\colon \mu_i\ne\mu_j\text{ for some }i \text{ and }j \text{ where }i\ne j\)

This means that at least one pair of means is not equal. The more common presentation of the alternative is:

\(H_a\colon \text{ at least one mean is different}\) or \(H_a\colon \text{ not all the means are equal}\)
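The difference between "not all the means are equal" and "all the means are different" can be checked mechanically. The sketch below is a small Python illustration; the function names and the population means passed to them are hypothetical (in practice the population means are unknown and we only see samples). It evaluates \(H_0\), the correct \(H_a\), and the tempting but wrong "all different" alternative for three sets of means.

```python
from itertools import combinations

def h0_holds(means):
    """H0: mu_1 = mu_2 = ... = mu_t (every population mean is equal)."""
    return all(m == means[0] for m in means)

def ha_holds(means):
    """Ha: mu_i != mu_j for SOME pair i != j (not all means are equal)."""
    return any(a != b for a, b in combinations(means, 2))

def all_different(means):
    """The wrong alternative: EVERY pair of means differs."""
    return all(a != b for a, b in combinations(means, 2))

for means in [(70, 70, 70), (70, 70, 75), (70, 72, 75)]:
    print(means, "H0:", h0_holds(means), "Ha:", ha_holds(means),
          "all different:", all_different(means))
# (70, 70, 70) H0: True Ha: False all different: False
# (70, 70, 75) H0: False Ha: True all different: False
# (70, 72, 75) H0: False Ha: True all different: True
```

The middle case is the important one: two means are equal and one differs, so the null is false and \(H_a\) is true, yet "all means are different" is false. Note that `ha_holds` is always the logical negation of `h0_holds`, which is exactly what an alternative hypothesis should be.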

Moriah’s study asks whether at least one of the food insecurity groups has a mean test score that is different from the others.

Test Statistic

Recall that when we compare the means of two populations for independent samples, we use a 2-sample t-test with pooled variance when the population variances can be assumed equal.
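One way to see that one-way ANOVA extends the pooled two-sample t-test: when there are only two groups, the ANOVA F statistic equals the square of the pooled t statistic. The Python sketch below checks this numerically on two hypothetical samples; the helper names `pooled_t` and `one_way_f` are illustrative, not from any particular package.

```python
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    n, k = len(pooled), len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores for two independent groups.
a = [72.0, 75.0, 78.0, 71.0, 74.0]
b = [68.0, 70.0, 65.0, 69.0, 73.0]
t = pooled_t(a, b)
print(f"t^2 = {t ** 2:.4f}, F = {one_way_f(a, b):.4f}")  # t^2 = 7.8125, F = 7.8125
```

Because the two statistics carry the same information in the two-group case, the pooled t-test and a two-group ANOVA give the same p-value; ANOVA simply continues to work when \(t>2\).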

Test Statistic for One-Way ANOVA

For more than two populations, the test statistic, \(F\), is the ratio of the between-group sample variance to the within-group sample variance. That is,

\(F=\dfrac{\text{between group variance}}{\text{within group variance}}\)
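As a preview of the ANOVA table calculations, here is a minimal Python sketch of the ratio for \(t=3\) groups with hypothetical data. It assumes the usual mean-square forms: the between-group variance is the sum of squared deviations of group means from the grand mean (weighted by group size) divided by \(t-1\), and the within-group variance is the pooled sum of squared deviations from each group's own mean divided by \(n-t\).

```python
from statistics import mean

# Hypothetical scores for t = 3 groups (illustrative only).
groups = [
    [72.0, 75.0, 78.0, 71.0, 74.0],
    [68.0, 70.0, 65.0, 69.0, 73.0],
    [80.0, 77.0, 83.0, 79.0, 81.0],
]
t = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total number of observations
grand_mean = mean(v for g in groups for v in g)

# Between-group variance: how far the group means sit from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (t - 1)

# Within-group variance: spread of observations around their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
ms_within = ss_within / (n - t)

F = ms_between / ms_within
print(f"between = {ms_between:.4f}, within = {ms_within:.4f}, F = {F:.4f}")
# between = 151.6667, within = 7.0000, F = 21.6667
```

Here the group means (74, 69, 80) are far apart relative to the scatter inside each group, so the between-group variance dwarfs the within-group variance and \(F\) is well above 1.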

Under the null hypothesis (and with certain assumptions), both quantities estimate the variance of the random error, and thus the ratio should be close to 1. If the ratio is much larger than 1, then we have evidence against the null, and hence, we would reject the null hypothesis.
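The claim that \(F\) hovers near 1 when the null is true, and grows when it is false, can be checked by simulation. The sketch below assumes normally distributed data with a common standard deviation in every group; the sample sizes, the standard deviation of 5, and the population means (all 70 under the null; 70, 70, 78 otherwise) are hypothetical choices made only for illustration.

```python
import numpy as np

def f_statistic(samples):
    """One-way ANOVA F for a 2-D array: rows are groups, columns are observations (equal n)."""
    k, n = samples.shape
    group_means = samples.mean(axis=1)
    grand_mean = samples.mean()
    ms_between = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((samples - group_means[:, None]) ** 2) / (k * (n - 1))
    return ms_between / ms_within

rng = np.random.default_rng(seed=1)
k, n, sigma, reps = 3, 10, 5.0, 2000

# H0 true: all three population means equal 70.
f_null = [f_statistic(rng.normal(70.0, sigma, size=(k, n))) for _ in range(reps)]

# H0 false: population means 70, 70, 78 (one group shifted).
shifted_means = np.array([70.0, 70.0, 78.0])[:, None]
f_alt = [f_statistic(rng.normal(shifted_means, sigma, size=(k, n))) for _ in range(reps)]

print(f"average F when H0 is true:  {np.mean(f_null):.2f}")
print(f"average F when H0 is false: {np.mean(f_alt):.2f}")
```

Under the null, the simulated F values average close to 1 (slightly above it, since the expected value of an F distribution with 27 denominator degrees of freedom is \(27/25\)). When one mean is shifted, the between-group variance inflates and the average F is many times larger, which is why large values of \(F\) count as evidence against \(H_0\).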

In the next section, we present the assumptions for this test. In the section after that, we show how to find the between-group variance, the within-group variance, and the F-statistic in the ANOVA table.