We could take a top-down approach by first presenting the theory of analysis of variance and then following it up with an example. We're not going to do it that way though. We're going to take a bottom-up approach, in which we first develop the idea behind the analysis of variance on this page, and then present the results on the next page. Only after we've completed those two steps will we take a step back and look at the theory behind analysis of variance. That said, let's start with our first example of the lesson.
Example 13-1
A researcher for an automobile safety institute was interested in determining whether or not the distance that it takes to stop a car going 60 miles per hour depends on the brand of the tire. The researcher measured the stopping distance (in feet) of ten randomly selected cars for each of five different brands. So that he and his assistants would remain blinded, the researcher arbitrarily labeled the brands of the tires as Brand1, Brand2, Brand3, Brand4, and Brand5. Here are the data resulting from his experiment:
Brand1 | Brand2 | Brand3 | Brand4 | Brand5 |
---|---|---|---|---|
194 | 189 | 185 | 183 | 195 |
184 | 204 | 183 | 193 | 197 |
189 | 190 | 186 | 184 | 194 |
189 | 190 | 183 | 186 | 202 |
188 | 189 | 179 | 194 | 200 |
186 | 207 | 191 | 199 | 211 |
195 | 203 | 188 | 196 | 203 |
186 | 193 | 196 | 188 | 206 |
183 | 181 | 189 | 193 | 202 |
188 | 206 | 194 | 196 | 195 |
Do the data provide enough evidence to conclude that at least one of the brands is different from the others with respect to stopping distance?
Answer
The first thing we might want to do is to create some sort of summary plot of the data. Here is a box plot of the data:
Hmmm. It appears that the box plots for Brand1 and Brand5 have very little, if any, overlap at all. The same can be said for Brand3 and Brand5. Here are some summary statistics of the data:
Brand | N | MEAN | SD |
---|---|---|---|
1 | 10 | 188.20 | 3.88 |
2 | 10 | 195.20 | 9.02 |
3 | 10 | 187.40 | 5.27 |
4 | 10 | 191.20 | 5.55 |
5 | 10 | 200.50 | 5.44 |
It appears that the sample means differ quite a bit. For example, the average stopping distance of Brand3 is 187.4 feet (with a standard deviation of 5.27 feet), while the average stopping distance of Brand5 is 200.5 feet (with a standard deviation of 5.44 feet). A difference of 13 feet could mean the difference between getting into an accident or not. But, of course, we can't draw conclusions about the performance of the brands based on one sample. After all, a different random sample of cars could yield different results. Instead, we need to use the sample means to try to draw conclusions about the population means.
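The summary statistics above can be recomputed directly from the data table. The following is a minimal sketch using only Python's standard library; the names `stopping` and `summarize` are ours, introduced for illustration, and the SD is assumed to be the usual sample standard deviation (divisor \(n-1\)).

```python
from statistics import mean, stdev

# Stopping distances (feet), copied column by column from the data table.
stopping = {
    "Brand1": [194, 184, 189, 189, 188, 186, 195, 186, 183, 188],
    "Brand2": [189, 204, 190, 190, 189, 207, 203, 193, 181, 206],
    "Brand3": [185, 183, 186, 183, 179, 191, 188, 196, 189, 194],
    "Brand4": [183, 193, 184, 186, 194, 199, 196, 188, 193, 196],
    "Brand5": [195, 197, 194, 202, 200, 211, 203, 206, 202, 195],
}

def summarize(groups):
    """Return {name: (n, sample mean, sample standard deviation)}."""
    return {name: (len(x), mean(x), stdev(x)) for name, x in groups.items()}

summary = summarize(stopping)
for name, (n, m, s) in summary.items():
    print(f"{name}  N={n}  MEAN={m:.2f}  SD={s:.2f}")
```

Up to rounding, the printed values should agree with the summary table.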
More specifically, the researcher needs to test the null hypothesis that the group population means are all the same against the alternative that at least one group population mean differs from the others. That is, the researcher needs to test this null hypothesis:
\(H_0 \colon \mu_1=\mu_2=\mu_3=\mu_4=\mu_5\)
against this alternative hypothesis:
\(H_A \colon \) at least one of the \(\mu_i\) differs from the others
In this lesson, we are going to learn how to use a method called analysis of variance to answer the researcher's question. Jumping right to the punch line, with no development or theoretical justification whatsoever, we'll use an analysis of variance table, such as this one:
Analysis of Variance for comparing all 5 brands
Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Brand | 4 | 1174.8 | 293.7 | 7.95 | 0.000 |
Error | 45 | 1661.7 | 36.9 | | |
Total | 49 | 2836.5 | | | |
to draw conclusions about the equality of two or more population means. And, as we always do when performing hypothesis tests, we'll compare the P-value to \(\alpha\), the probability of committing a Type I error that we are willing to tolerate. In this case, the researcher's P-value is very small (0.000, to three decimal places), so he should reject his null hypothesis. That is, there is sufficient evidence, even at the 0.01 level, to conclude that the mean stopping distance for at least one brand of tire differs from the mean stopping distances of the others.
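For readers who want to see where the numbers in the table come from, here is a minimal sketch that rebuilds the DF, SS, MS, and F columns from the raw stopping distances. It assumes the standard one-way formulas (group means compared to the grand mean for the "Brand" row, observations compared to their own group mean for the "Error" row), which are developed later in the lesson. The function name `one_way_anova` and the dictionary keys are ours.

```python
from statistics import mean

# Stopping distances (feet) from Example 13-1, one list per brand.
brands = [
    [194, 184, 189, 189, 188, 186, 195, 186, 183, 188],  # Brand1
    [189, 204, 190, 190, 189, 207, 203, 193, 181, 206],  # Brand2
    [185, 183, 186, 183, 179, 191, 188, 196, 189, 194],  # Brand3
    [183, 193, 184, 186, 194, 199, 196, 188, 193, 196],  # Brand4
    [195, 197, 194, 202, 200, 211, 203, 206, 202, 195],  # Brand5
]

def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table (without the P-value)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group variation: each group mean versus the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group (error) variation: each observation versus its group mean.
    ss_error = sum((x - m) ** 2
                   for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_error = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "df_between": df_between, "df_error": df_error,
        "ss_between": ss_between, "ss_error": ss_error,
        "ss_total": ss_between + ss_error,
        "ms_between": ms_between, "ms_error": ms_error,
        "f": ms_between / ms_error,
    }

table = one_way_anova(brands)
print(f"Brand  DF={table['df_between']}  SS={table['ss_between']:.1f}  "
      f"MS={table['ms_between']:.1f}  F={table['f']:.2f}")
print(f"Error  DF={table['df_error']}  SS={table['ss_error']:.1f}  "
      f"MS={table['ms_error']:.1f}")
print(f"Total  DF={table['df_between'] + table['df_error']}  "
      f"SS={table['ss_total']:.1f}")
```

The sketch leaves out the P-value to stay free of external dependencies. It is the upper-tail area of an F distribution with 4 and 45 degrees of freedom, which a statistics library can supply (for example, `scipy.stats.f.sf(F, 4, 45)` in SciPy).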
So far, we have seen a typical null and alternative hypothesis in the analysis of variance framework, as well as an analysis of variance table. Let's take a look at another example to continue developing the basic idea behind the analysis of variance method.
Example 13-2
Suppose an education researcher is interested in determining whether a learning method affects students' exam scores. Specifically, suppose she considers these three methods:
- standard
- osmosis
- shock therapy
Suppose she convinces 15 students to take part in her study, and she randomly assigns 5 students to each method. Then, after waiting eight weeks, she tests the students to get exam scores.
What would the researcher's data have to look like to be able to conclude that at least one of the methods yields different exam scores than the others?
Answer
Suppose a dot plot of the researcher's data looked like this:
What would we want to conclude? Well, there's a lot of separation in the data between the three methods. In this case, there is little variation in the data within each method, but a lot of variation in the data across the three methods. For these data, we would probably be willing to conclude that there is a difference between the three methods.
Now, suppose instead that a dot plot of the researcher's data looked like this:
What would we want to conclude? Well, there's less separation in the data between the three methods. In this case, there is a lot of variation in the data within each method, and still some variation in the data across the three methods, but not as much as in the previous dot plot. For these data, it is not as obvious that we can conclude that there is a difference between the three methods.
Let's consider one more possible dot plot:
What would we want to conclude here? Well, there's even less separation in the data between the three methods. In this case, there is a great deal of variation in the data within each method, and not much variation at all in the data across the three methods. For these data, we would probably want to conclude that there is no difference between the three methods.
If you go back and look at the three possible data sets, you'll see that we drew our conclusions by comparing the variation in the data within a method to the variation in the data across methods. Let's try to formalize that idea a bit more by revisiting the two most extreme examples. First, the example in which we concluded that the methods differ:
Let's quantify (or are we still just qualifying?) the amount of variation within a method by comparing the five data points within a method to the method's mean, represented in the plot by a color-coded triangle. And, let's quantify (or qualify?) the amount of variation across the methods by comparing the method means, again represented in the plot by color-coded triangles, to the overall grand mean, that is, the average of all fifteen data points (ignoring the method). In this case, the variation between the group means and the grand mean is larger than the variation within the groups.
Now, let's revisit the example in which we wanted to conclude that there was no difference in the three methods:
In this case, the variation between the group means and the grand mean is smaller than the variation within the groups.
Hmmm... these two examples suggest that our method should compare the variation between the groups to the variation within the groups. That's just what an analysis of variance does!
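To make the comparison concrete, here is a small sketch that measures both kinds of variation with sums of squared deviations, which is one common choice and is assumed here for illustration. The exam scores are made up for this sketch; they are not the researcher's data behind the dot plots. The names `variation`, `well_separated`, and `overlapping` are ours.

```python
from statistics import mean

def variation(groups):
    """Return (between, within) sums of squared deviations.

    between: group means compared to the grand mean, weighted by group size
    within:  observations compared to their own group mean
    """
    samples = list(groups.values())
    grand_mean = mean([x for g in samples for x in g])
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    within = sum(sum((x - mean(g)) ** 2 for x in g) for g in samples)
    return between, within

# HYPOTHETICAL scores: little spread within a method, lots across methods.
well_separated = {
    "standard":      [70, 72, 71, 69, 73],
    "osmosis":       [50, 52, 51, 49, 53],
    "shock therapy": [90, 92, 91, 89, 93],
}

# HYPOTHETICAL scores: lots of spread within a method, similar method means.
overlapping = {
    "standard":      [60, 85, 72, 50, 78],
    "osmosis":       [55, 80, 68, 47, 75],
    "shock therapy": [62, 88, 70, 52, 83],
}

for label, data in [("well separated", well_separated),
                    ("overlapping", overlapping)]:
    between, within = variation(data)
    print(f"{label}: between = {between:.1f}, within = {within:.1f}")
```

In the first made-up data set the between-group variation dwarfs the within-group variation, and in the second the opposite holds, mirroring the two extreme dot plots.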
Let's see what conclusion we draw from an analysis of variance of these data. Here's the analysis of variance table for the first study, in which we wanted to conclude that there was a difference in the three methods:
Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Factor | 2 | 2510.5 | 1255.3 | 93.44 | 0.000 |
Error | 12 | 161.2 | 13.4 | | |
Total | 14 | 2671.7 | | | |
In this case, the P-value is small (0.000, to three decimal places). We can reject the null hypothesis of equal means at the 0.05 level. That is, there is sufficient evidence at the 0.05 level to conclude that the mean exam scores of the three learning methods are not all equal.
Here's the analysis of variance table for the third study, in which we wanted to conclude that there was no difference in the three methods:
Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Factor | 2 | 80.1 | 40.1 | 0.46 | 0.643 |
Error | 12 | 1050.8 | 87.6 | | |
Total | 14 | 1130.9 | | | |
In this case, the P-value, 0.643, is large. We fail to reject the null hypothesis of equal means at the 0.05 level. That is, there is insufficient evidence at the 0.05 level to conclude that the mean exam scores of the three learning methods differ.
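The F column in both tables can be checked from the SS and DF columns alone. The sketch below takes as given two relationships that are developed later in the lesson: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The function name `f_ratio` is ours.

```python
def f_ratio(ss_factor, df_factor, ss_error, df_error):
    """Return (MS for the factor, MS for error, F) from SS and DF entries."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error

# SS and DF entries copied from the two ANOVA tables above.
first = f_ratio(2510.5, 2, 161.2, 12)   # study with well-separated methods
third = f_ratio(80.1, 2, 1050.8, 12)    # study with heavily overlapping methods

print(f"first study: F = {first[2]:.2f}")
print(f"third study: F = {third[2]:.2f}")
```

A large F goes with the small P-value in the first table, and an F below 1 goes with the large P-value in the third.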
Hmmm. It seems like we're on to something! Let's summarize.
The Basic Idea Behind Analysis of Variance
Analysis of variance involves dividing the overall variability in observed data values so that we can draw conclusions about the equality, or lack thereof, of the means of the populations from which the data came. The overall (or "total") variability is divided into two components:
- the variability "between" groups
- the variability "within" groups
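As a quick check, the SS columns of the three tables on this page already show this division. Writing the "between" component as the Brand or Factor row and the "within" component as the Error row (the \(SS(\cdot)\) notation is ours, used only for this check), the two components add up to the total:

\(SS(\text{Total}) = SS(\text{Between}) + SS(\text{Within})\)

\(2836.5 = 1174.8 + 1661.7\) (tire brands)

\(2671.7 = 2510.5 + 161.2\) (first learning-method study)

\(1130.9 = 80.1 + 1050.8\) (third learning-method study)

The degrees of freedom divide the same way: \(49 = 4 + 45\) for the tire data and \(14 = 2 + 12\) for each learning-method study.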
We summarize the division of the variability in an "analysis of variance table", which is often shortened and called an "ANOVA table." Without knowing what we were really looking at, we looked at a few examples of ANOVA tables here on this page. Let's now go take an in-depth look at the content of ANOVA tables.