If (and only if) we reject the Null Hypothesis, we conclude that at least one group differs from at least one other (importantly, we do NOT conclude that all the groups are different). Having rejected the null, we will want to know WHICH group or groups are different. In our example, it is not enough to know that at least one treatment level is different; we want to know where the difference is and the nature of the difference. To answer this question, we can follow up the ANOVA with a mean comparison procedure to find out which means differ from each other and which ones don’t.
You might think we could skip the ANOVA and simply run a series of t-tests to compare the groups. While that is intuitively simple, it inflates the type I error. How does this inflation happen? For a single test,
\(\alpha = 1 - (.95) = 0.05\)
The probability of committing at least one type I error (by random chance) in two simultaneous tests follows from the Multiplication Rule for independent events in Probability. Recall that, for two independent events A and B, the probability of A and B both occurring is P(A and B) = P(A) * P(B). The probability of making no type I error in either test is therefore (.95)*(.95), so for two tests we have
\(\alpha = 1 - ( (.95)*(.95) ) = 0.0975\)
which is now larger than the \(\alpha\) that we originally set. For our example, we have 6 comparisons, so
\(\alpha = 1 - (.95^6) = 0.2649\) which is a much larger (inflated) probability of committing a type I error than we originally set.
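The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `familywise_error` is made up for illustration, and it assumes, as the text does, that the tests are independent, so it should be read as an approximation for pairwise comparisons that share the same data.

```python
from math import comb


def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05
n_groups = 4                   # treatment levels in the greenhouse example
n_pairs = comb(n_groups, 2)    # number of pairwise comparisons among 4 means

for k in (1, 2, n_pairs):
    print(f"{k} test(s): {familywise_error(alpha, k):.4f}")
```

With four treatment levels there are `comb(4, 2)` = 6 pairwise comparisons, which is where the exponent of 6 in the last equation comes from.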
The multiple comparison procedures compensate for the type I error inflation (although each does so in a slightly different way).
There are several comparison procedures which can be employed, but we will start with the one most commonly used, the Tukey procedure. In the Tukey procedure, we compute a ‘yardstick’ value based on the \(MS_{\text{Error}}\) and the number of means being compared. If any two means differ by more than the Tukey w value, then they are significantly different.
 Step 1: Compute Tukey’s w value
\(w=q_{\alpha(p, df_{Error})}\cdot s_{\bar{Y}}\)
where \(q_{\alpha}\) is obtained from a table of Tukey q values,
p = the number of treatment levels,
\(s_{\bar{Y}}\) = standard error of a treatment mean = \(\sqrt{MS_{Error}/r}\), and
r = number of replications.
For our greenhouse example we get:
\(w=q_{.05(4,20)}\sqrt{3.052/6}=3.96(0.7132)=2.824\)
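As a quick check of Step 1, the same calculation can be sketched in Python. The helper name `tukey_w` is hypothetical, and the critical value 3.96 is simply the table value quoted above rather than something computed here.

```python
from math import sqrt


def tukey_w(q_crit: float, ms_error: float, r: int) -> float:
    """Tukey's yardstick: critical q times the standard error of a treatment mean."""
    return q_crit * sqrt(ms_error / r)


q_crit = 3.96      # table value for alpha = .05, p = 4 means, 20 error df
ms_error = 3.052   # MS Error from the greenhouse ANOVA
r = 6              # replications per treatment level

std_error = sqrt(ms_error / r)
w = tukey_w(q_crit, ms_error, r)

print(f"standard error of a mean = {std_error:.4f}")
print(f"w = {w:.3f}")
```

If SciPy 1.7 or later happens to be available, `scipy.stats.studentized_range.ppf(0.95, 4, 20)` should give approximately the same critical value without a printed table.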
 Step 2: Rank the means, calculate differences
For the greenhouse example, we rank the means as:
29.20   28.60   25.87   21.00
Start with the largest and second-largest means and calculate the difference:
29.20 – 28.60 = 0.60, which is less than our w of 2.824, so we indicate there is no significant difference between these two means by placing the letter “a” under each:
29.20   28.60   25.87   21.00
a       a
Then calculate the difference between the largest and third-largest means:
29.20 – 25.87 = 3.33, which exceeds the critical w of 2.824, so we can label with a “b” to show this difference is significant:
29.20   28.60   25.87   21.00
a       a       b
Now we have to consider whether or not the second-largest and third-largest differ significantly. This is a step that sets up a ‘back and forth’ process. Here
28.60 – 25.87 = 2.73, less than the critical w of 2.824, so these two means do not differ significantly. We need to add a “b” to show this:
29.20   28.60   25.87   21.00
a       ab      b
Continuing down the line, we now calculate the next difference:
28.60 – 21.00 = 7.60, exceeding the critical w, so we now add a “c”:
29.20   28.60   25.87   21.00
a       ab      b       c
Again, we need to go back and check to see if the third-largest also differs from the smallest:
25.87 – 21.00 = 4.87, which it does. So we are done.
These letters can be added to figures showing the means in bar charts to summarize the results of the ANOVA.
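The back-and-forth lettering in Step 2 can also be sketched in code. The function below, `tukey_letters`, is a hypothetical helper written for this illustration, not part of any statistics package. It assumes equal replication, so that a single w applies to every pair, and it treats a difference of w or less as not significant.

```python
from string import ascii_lowercase


def tukey_letters(means, w):
    """Return (mean, letters) pairs; means sharing a letter differ by no more than w."""
    ordered = sorted(means, reverse=True)

    # For each mean, extend a group downward while the gap to the top of the
    # group stays within w. Keep a group only if it reaches further down the
    # ranking than every group found so far (otherwise it is redundant).
    groups = []
    last_end = -1
    for start, top in enumerate(ordered):
        end = start
        while end + 1 < len(ordered) and top - ordered[end + 1] <= w:
            end += 1
        if end > last_end:
            groups.append((start, end))
            last_end = end

    # Give each retained group the next letter of the alphabet.
    labels = [""] * len(ordered)
    for letter, (start, end) in zip(ascii_lowercase, groups):
        for i in range(start, end + 1):
            labels[i] += letter
    return list(zip(ordered, labels))


greenhouse_means = [29.20, 28.60, 25.87, 21.00]
for mean, letters in tukey_letters(greenhouse_means, w=2.824):
    print(f"{mean:6.2f}  {letters}")
```

For the greenhouse means this reproduces the lettering a, ab, b, c derived by hand above.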