9: ANOVA (General Linear Models Part II)

Overview

Case Study: Test scores and socioeconomic status

Moriah is a community and economic development major. She is interested in studying the impact of hunger (food insecurity) on students’ test scores in economically depressed urban areas. Moriah has access to some local schools in such an area and will collaborate with teachers to assess how often students go hungry. To make the study easier for the teachers, she comes up with a 3-point rating scale: low food insecurity, medium food insecurity, and high food insecurity. Now she wants to see whether the level of food insecurity has any relationship to test scores. Can you help Moriah get started on her study?

As is good practice, the first thing Moriah needs to do is correctly identify her data. Moriah’s test scores are measured as a quantitative variable, so we know she has a quantitative response variable. Unlike the regression in Unit 8, where the predictor variable was continuous, Moriah’s predictor variable is categorical: low, medium, and high food insecurity. How do we tell if the level of insecurity makes a difference in the test scores?

First, let’s look at the descriptive statistics for Moriah’s data. We can see that the mean test scores for the three groups range from about 70 for the high food insecurity group to almost 90 for the low food insecurity group. It looks like there might be a trend, but Moriah still needs to test whether these differences among the groups are significant.

Descriptive Statistics: Test Scores by Food Insecurity Level
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
High Food Insecurity 30 0 70.181 0.168 0.923 67.782 69.692 70.162 70.848 72.704
Medium Food Insecurity 30 0 84.823 0.161 0.882 82.799 84.206 84.978 85.508 86.536
Low Food Insecurity 30 0 89.713 0.182 0.995 87.609 89.032 89.856 90.531 91.667
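Summaries like these can be reproduced in most software. As a minimal sketch (using small hypothetical score lists, not Moriah's actual 30-per-group data), Python's standard `statistics` module computes each group's mean and standard deviation:

```python
import statistics

# Hypothetical mini-samples for illustration only -- the full study
# has n = 30 per group; these are NOT Moriah's actual data.
scores = {
    "High Food Insecurity":   [70.2, 69.5, 71.0, 70.8, 69.9],
    "Medium Food Insecurity": [84.8, 85.1, 84.2, 85.5, 84.6],
    "Low Food Insecurity":    [89.7, 90.1, 89.0, 90.5, 89.9],
}

for group, xs in scores.items():
    print(f"{group}: n={len(xs)}, mean={statistics.mean(xs):.2f}, "
          f"sd={statistics.stdev(xs):.2f}")
```

Even in this toy version, the group means line up the same way as in Moriah's table: low food insecurity scores highest, high food insecurity lowest.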

Objectives

Upon completion of this lesson, you should be able to:

  • Identify the components used in calculating an F test
  • Interpret an F test, including the null and alternative hypotheses
  • Interpret a Post-hoc analysis
  • Differentiate between regression and ANOVA
  • State the assumptions for ANOVA

9.1 - One-way ANOVA Test

From regression, we learned that we can determine the impact a predictor variable has on a response. We draw these conclusions based on a hypothesis test for the slope. A significant slope, which is a non-zero slope, indicates a significant relationship between the predictor and response. However, in Moriah's example, it is hard to imagine a "slope" for a categorical predictor. As a reminder, categorical data has no numerical value so indicating a "one unit change" is meaningless. 

We need an alternative way of testing the relationship of a categorical predictor on a continuous response. This brings us to the Analysis of Variance (ANOVA) test. 

Now, let's pause a moment and return to a small point from the regression notes. It was previously mentioned that regression and ANOVA are actually both linear models. The traditional difference is the type of predictor variable: in this unit the predictor is categorical, while in the last unit it was continuous. However, with our software we can run a "regression" and tell Minitab that the variable is categorical, and everything is fine. We can also run an ANOVA and tell Minitab that the predictor is a "covariate," which is the same as telling Minitab we have a continuous variable. We point this out because, as you learn more about ANOVA, you will begin to see the similarities between the two techniques, and for good reason: they are both linear models.

So, let's start with our null and alternative hypotheses for the ANOVA!

Hypotheses

The null

Recall that for a test for two independent means, the null hypothesis was \(\mu_1=\mu_2\). In one-way ANOVA, we want to compare \(t\) population means, where \(t>2\). Therefore, the null hypothesis for analysis of variance for \(t\) population means is:

\(H_0\colon \mu_1=\mu_2=\dots=\mu_t\)

In Moriah’s data, the null is that there is no difference among the mean test scores among all three groups of students.

The alternative

The alternative, however, cannot be set up the same way as in the two-sample case. If we wanted to see if two population means are different, the alternative would be \(\mu_1\ne\mu_2\). With more than two groups, the research question is “Are some of the means different?” If we set up the alternative as \(\mu_1\ne\mu_2\ne\dots\ne\mu_t\), we would have a test of whether ALL the means are different, which is not what we want. We need to be careful about how we set up the alternative. The mathematical version of the alternative is...

\(H_a\colon \mu_i\ne\mu_j\text{ for some }i \text{ and }j \text{ where }i\ne j\)

This means that at least one of the pairs is not equal. The more common presentation of the alternative is:

\(H_a\colon \text{ at least one mean is different}\) or \(H_a\colon \text{ not all the means are equal}\)

Moriah’s study is asking if at least one of the levels of food insecurity groups has a mean that is different from the others.

Test Statistic

Recall that when we compare the means of two populations for independent samples, we use a 2-sample t-test with pooled variance when the population variances can be assumed equal.

Test Statistic for One-Way ANOVA

For more than two populations, the test statistic, \(F\), is the ratio of between group sample variance and the within-group-sample variance. That is,

\(F=\dfrac{\text{between group variance}}{\text{within group variance}}\)

Under the null hypothesis (and with certain assumptions), both quantities estimate the variance of the random error, and thus the ratio should be close to 1. If the ratio is large, then we have evidence against the null, and hence, we would reject the null hypothesis.

In the next section, we present the assumptions for this test. In the following section, we present how to find the between group variance, the within group variance, and the F-statistic in the ANOVA table.
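One way to build intuition for "the ratio should be close to 1 under the null" is a quick simulation. This sketch (hypothetical data, Python standard library only) repeatedly draws three groups from the same normal population, so the null hypothesis is true by construction, and averages the resulting F statistics:

```python
import random
import statistics

random.seed(1)

def f_stat(groups):
    """Between-group variance divided by within-group variance."""
    all_obs = [y for g in groups for y in g]
    grand = statistics.mean(all_obs)
    t, n = len(groups), len(all_obs)
    sst = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    return (sst / (t - 1)) / (sse / (n - t))

# Draw 3 groups of 30 from the SAME population (so the null is true)
# and record the F statistic each time.
fs = [f_stat([[random.gauss(75, 5) for _ in range(30)] for _ in range(3)])
      for _ in range(1000)]
print(f"average F under the null: {statistics.mean(fs):.2f}")  # close to 1
```

When the groups truly share one mean, the between-group and within-group estimates agree, so F hovers near 1; a large observed F is therefore evidence against the null.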


9.2 - Assumptions for One-Way ANOVA Test

If you recall, there were four assumptions for regression (LINE). In ANOVA there are three primary assumptions (note that the missing assumption is linearity, which does not make much sense when working with categorical predictors!):

  1. The responses for each factor level have a normal population distribution.
  2. These distributions have the same variance.
  3. The data are independent.
Note! Violations of the first two assumptions that are not extreme can be considered not serious. The sampling distribution of the test statistic is fairly robust, especially as sample size increases, and more so when the sample sizes for all factor levels are equal. If you conduct an ANOVA test, you should try to keep the sample sizes equal across factor levels.

With Moriah’s data, we can examine the residual plots to determine if these assumptions are met.

As was the case with regression, normality is established by seeing a bell curve in the histogram. Equal variance is reflected in the “Versus Fits” plot, with the spread of the blue dots about the same across all three levels of the fitted values. Finally, independence is determined due to the nature of the study not being constructed of dependent sampling units.
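Alongside the residual plots, a common rule of thumb (not a formal test) for the equal-variance assumption is that the largest group standard deviation should be no more than about twice the smallest. A minimal sketch with hypothetical data:

```python
import statistics

# Rough equal-variance check on hypothetical group data (not Moriah's
# actual scores): compare the largest and smallest group SDs.
groups = {
    "high":   [70, 71, 69, 70, 72],
    "medium": [85, 84, 86, 85, 83],
    "low":    [90, 89, 91, 90, 88],
}
sds = {name: statistics.stdev(g) for name, g in groups.items()}
ratio = max(sds.values()) / min(sds.values())
# A ratio near 1 is consistent with equal variances; a rule of thumb
# flags concern when the ratio exceeds about 2.
print(f"SD ratio (max/min) = {ratio:.2f}")
```

This is only a screening device; the "Versus Fits" residual plot described above remains the primary visual check.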


9.3 - The ANOVA Table

In this section, we present the Analysis of Variance table. Recall that we want to examine the between group variation and the within group variation by using an F test:

\(F=\dfrac{\text{between group variance}}{\text{within group variance}}\)

However, to understand what an F test is doing we need to understand what we mean by "between group variance" and "within group variance". 

These terms should remind you of the early course material on variability. We introduced variability in terms of each observation's deviation from the mean. The same idea applies in ANOVA. Between group variability is the deviation of each GROUP MEAN from the overall mean. Within group variability is each observation's deviation from its own group mean. As with calculating variance and standard deviation, we work with these deviations as squared terms.

In ANOVA the between and within group variability is presented as "sums of squares" in the ANOVA table: 

Sum of Squares for Treatment or the Between Group Sum of Squares
\(\text{SST}=\sum_{i=1}^t n_i(\bar{y}_{i.}-\bar{y}_{..})^2\)
Sum of Squares for Error or the Within Group Sum of Squares
\(\text{SSE}=\sum_{i, j} (y_{ij}-\bar{y}_{i.})^2\)
Total Sum of Squares
\(\text{TSS}=\sum_{i,j} (y_{ij}-\bar{y}_{..})^2\)

It can be derived that \(\text{TSS } = \text{ SST } + \text{ SSE}\).
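The three sums of squares, and the identity \(\text{TSS}=\text{SST}+\text{SSE}\), can be checked numerically. This sketch uses a small hypothetical dataset (t = 3 groups of 4 observations, not Moriah's data):

```python
import statistics

# Hypothetical mini-example: t = 3 groups with n_i = 4 observations each
groups = [
    [70, 71, 69, 70],   # high food insecurity
    [85, 84, 86, 85],   # medium food insecurity
    [90, 89, 91, 90],   # low food insecurity
]

all_obs = [y for g in groups for y in g]
grand_mean = statistics.mean(all_obs)
group_means = [statistics.mean(g) for g in groups]

# SST: each group mean's squared deviation from the grand mean, weighted by n_i
sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# SSE: each observation's squared deviation from its own group mean
sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
# TSS: each observation's squared deviation from the grand mean
tss = sum((y - grand_mean) ** 2 for y in all_obs)

print(f"SST={sst:.2f}, SSE={sse:.2f}, TSS={tss:.2f}")
assert abs(tss - (sst + sse)) < 1e-6  # TSS = SST + SSE
```

Notice that in this toy example almost all of the total variation is between the groups, the same pattern Moriah sees in her real data.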

We can set up the ANOVA table to help us find the F-statistic.

The ANOVA Table
Source DF  Adj SS Adj MS F-Value P-Value
Treatment \(t-1\) \(\text{SST}\) \(\text{MST}=\dfrac{\text{SST}}{t-1}\) \(\dfrac{\text{MST}}{\text{MSE}}\)  
Error \(n_T-t\) \(\text{SSE}\) \(\text{MSE}=\dfrac{\text{SSE}}{n_T-t}\)    
Total \(n_T-1\) \(\text{TSS}\)      
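The table's formulas combine mechanically once the sums of squares are known. A minimal sketch, with hypothetical sums of squares for a design of t = 3 groups and 12 total observations:

```python
# Hypothetical design: t = 3 groups, n_T = 12 total observations.
# The sums of squares below are illustrative values, not Moriah's.
t, n_T = 3, 12
sst, sse = 866.67, 6.00

df_treatment = t - 1       # degrees of freedom for treatment
df_error = n_T - t         # degrees of freedom for error
mst = sst / df_treatment   # mean square for treatment
mse = sse / df_error       # mean square for error
f_stat = mst / mse         # the F-statistic: MST / MSE
print(f"MST={mst:.2f}, MSE={mse:.2f}, F={f_stat:.2f}")
```

The p-value would then come from comparing this F-statistic to an F-distribution with (t − 1, n_T − t) degrees of freedom, which Minitab does for us.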

 Moriah’s data yields the following ANOVA table.

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Factor 2 6197.85 3098.93 3549.44 0.000
Error 87 75.96 0.87    
Total 89 6273.81      

Note that for Moriah’s data the F is very large. This large F is the ratio of the Adj MS for the “factor” (the food insecurity, or BETWEEN GROUP, variance) to the Adj MS for “error” (the WITHIN GROUP variance), or 3098.93/0.87 (the table’s F-value of 3549.44 uses the unrounded mean squares).

The p-value is found using the F-statistic and the F-distribution. We will not ask you to find the p-value for this test; you will only need to know how to interpret it. If the p-value is less than our predetermined significance level, we reject the null hypothesis that all the means are equal. In Moriah’s case, the p-value is less than our significance level of .05, so Moriah can reject the null and conclude that at least one of the means is different.


9.4 - Multiple Comparisons

Since Moriah rejected the null hypothesis, she concludes that not all the means are equal: that is, at least one mean is different from the others. The ANOVA test itself provides statistical evidence of a difference, but not of which mean or means differ.

So Moriah wants to follow up to determine which levels of hunger differ in test scores. To complete this analysis we use a method called multiple comparisons.

Multiple comparison methods analyze all possible pairs of means. For example, with three levels of food insecurity (low, medium, and high), the methods would make the three possible pairwise comparisons:

  • Low Food Insecurity to Medium Food Insecurity
  • Low Food Insecurity to High Food Insecurity
  • Medium Food Insecurity to High Food Insecurity

These are essentially tests of two means similar to what we learned previously in our lesson for comparing two means. However, the methods here use an adjustment to account for the number of comparisons taking place. Minitab provides three adjustment choices. We will use the Tukey adjustment which is an adjustment on the t-multiplier based on the number of comparisons.

Note! We don’t go into the theory behind the Tukey method. Just note that we only use a multiple comparison technique in ANOVA when we have a significant result.
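The pairwise structure itself is easy to see in code. This sketch (hypothetical data) simply enumerates the three comparisons and their mean differences; a full Tukey HSD would additionally compare each difference against a critical value from the studentized range distribution, which Minitab handles for us:

```python
import statistics
from itertools import combinations

# Hypothetical group scores, NOT Moriah's actual data.
groups = {
    "Low":    [90, 89, 91, 90],
    "Medium": [85, 84, 86, 85],
    "High":   [70, 71, 69, 70],
}
means = {name: statistics.mean(g) for name, g in groups.items()}

# Enumerate every pairwise comparison of the group means.
for a, b in combinations(groups, 2):
    print(f"{a} vs {b}: difference in means = {means[a] - means[b]:.2f}")
```

With t groups there are t(t − 1)/2 such pairs, which is why the adjustment for the number of comparisons matters as t grows.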

Minitab®

Using Minitab to Perform One-Way ANOVA

If the data entered in Minitab are in different columns, then in Minitab we use:

  1. Stat > ANOVA > One-Way
  2. Select the format structure of the data in the worksheet.
    • If the responses are in one column and the factors are in their own column, then select the drop down of 'Response data are in one column for all factor levels.'
    • If the responses are in their own column for each factor level, then select 'Response data are in a separate column for each factor level.'
  3. Next, in case we have a significant ANOVA result, and we want to conduct a multiple comparison analysis, we preemptively click 'Comparisons', the box for Tukey, and verify that the boxes for 'Interval plot for differences of means' and 'Grouping Information' are also checked.
  4. Click OK and OK again.

So now, we can help Moriah answer her question about test scores and food insecurity.


9.5 - ANOVA and Regression

These models can get a lot more complicated, but in the end they all reduce to a linear model, just as a regression does. The first thing to notice is that the assumptions for regression and ANOVA are very similar; other than linearity, they are exactly the same. The fact that linearity is not included in the assumptions for ANOVA makes sense if we recall that the regression example used a quantitative predictor variable, while Moriah’s example uses a categorical one. Recalling how we make a scatterplot, it is very reasonable not to be able (or need) to check linearity as an assumption for a relationship between a categorical predictor variable and a quantitative response variable.

The more meaningful parallel is in the mechanics of the two techniques. If we created dummy (indicator) variables for the food insecurity levels in Moriah’s study, we would have the model:

Test score = Low Food Insecurity + Medium Food Insecurity + High Food Insecurity

This gives each level of food insecurity its own coefficient, which is that group’s mean. Our test is simply one of testing whether these coefficients differ. Voilà, a regression! If this is too confusing, it is okay. Just remember that the labels “regression” and “ANOVA” are really for convenience’s sake. Before powerful computing, a linear model with a quantitative predictor was a regression and a linear model with a categorical predictor was an ANOVA. Now we have so many kinds of models, with so many processing options, that the distinction is fairly trivial!
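The equivalence can be seen directly in a small sketch (hypothetical data): with a dummy variable for each group, the least-squares fit assigns every observation its own group mean, which is exactly the fit one-way ANOVA uses.

```python
import statistics

# Hypothetical data for illustration, not Moriah's actual scores.
groups = {
    "High":   [70, 71, 69, 70],
    "Medium": [85, 84, 86, 85],
    "Low":    [90, 89, 91, 90],
}

# Dummy-regression coefficients: one per group, equal to that group's mean.
coefs = {name: statistics.mean(ys) for name, ys in groups.items()}

# The residual sum of squares under this fit is the ANOVA's
# within-group SSE -- same model, two names.
sse = sum((y - coefs[name]) ** 2 for name, ys in groups.items() for y in ys)
print(coefs, sse)
```

So "running a regression with dummies" and "running a one-way ANOVA" produce the same fitted values and the same error sum of squares.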


9.6 - Lesson Summary

By recognizing the categorical predictor variable, Moriah can apply an ANOVA model to her work and identify that there are significant differences in test scores among the food insecurity levels. She can also follow up on this significant finding to determine exactly where the differences are. While the nuances between regression and ANOVA may be perplexing, understanding that both are linear models, applying to both Bob's continuous and Moriah's categorical predictors, is a great piece of statistical knowledge!

