3.2 - Assumptions and Diagnostics

We are getting a bit ahead of ourselves in discussing the significance of the group effect. Before we draw any conclusions about the significance of the model, we need to make sure we have a "valid" model. Like any other statistical test, the ANOVA has assumptions that must be met. Failure to meet these assumptions means any conclusions drawn from the model are not to be trusted. For those of you who read the results of statistical analyses, think about how many articles you have read that DO NOT address these assumptions! How much of our applied science is based on models that are not valid? (This is obviously food for thought, but we want you to pay attention the next time you read some literature.)

So what assumptions are being made to employ the ANOVA model? Well, if you consider that the error term is what is left over after our group effects, then any issue with the data OTHER than the group effect will show up in the residuals. So to test the assumptions, we look at the residuals, which serve as estimates of the errors (\(\epsilon_{ij}\)). The errors are assumed to:

  1. be normally distributed
  2. have a mean of 0
  3. be independent
  4. have equal variance among treatment levels (homogeneity)
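
To make the idea of a residual concrete, here is a minimal sketch in Python (the course itself uses SAS and Minitab, so the language and the function name `one_way_residuals` are our own choices for illustration). It uses the small simulated data set from the Minitab example later in this section and computes each residual as the observation minus its treatment mean.

```python
from statistics import mean, variance

# Simulated stacked data from the Minitab example later in this section.
data = {
    "A": [12, 23, 34],
    "B": [45, 56, 67],
    "C": [14, 25, 36],
}

def one_way_residuals(groups):
    """Return {level: residuals}; each residual is y_ij minus its group mean."""
    residuals = {}
    for level, values in groups.items():
        group_mean = mean(values)
        residuals[level] = [y - group_mean for y in values]
    return residuals

residuals = one_way_residuals(data)
all_residuals = [e for res in residuals.values() for e in res]

# Residuals from fitted group means always average to zero, so this is a
# check of the computation, not evidence about the true errors.
overall_mean = mean(all_residuals)

# Sample variance of the residuals within each treatment level; these are
# the quantities the equal-variance assumption is about.
group_variances = {level: variance(res) for level, res in residuals.items()}

print(overall_mean)
print(group_variances)
```

Comparing the per-level variances is only a rough numerical companion to the residual plots, not a replacement for them.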

To evaluate the model fit and these assumptions, we employ some diagnostic tests. The first and most important of these is to plot the residuals. Residuals are often plotted against the predicted values of Y, or plotted against the treatment levels (which I strongly recommend also doing). The normality assumption can be addressed by producing a normal probability plot. Sections 18.1-18.3 in the text discuss various diagnostic procedures in detail.
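
As a rough illustration of what a normal probability plot is built from, the sketch below pairs each sorted residual with a theoretical standard normal quantile. It is written in Python with only the standard library; the function names and the choice of Blom plotting positions are assumptions made for this example (software packages use slightly different conventions). The residuals are those of the simulated data set used in the Minitab example later in this section.

```python
from statistics import NormalDist, mean

def normal_plot_points(residuals):
    """Pair each sorted residual with a theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Blom plotting positions (i - 0.375) / (n + 0.25); an assumed convention.
    quantiles = [
        NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))

def correlation(points):
    """Pearson correlation between the x and y coordinates of the points."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Residuals (observation minus treatment mean) for the simulated data.
residuals = [-11, 0, 11, -11, 0, 11, -11, 0, 11]
points = normal_plot_points(residuals)
r = correlation(points)

for quantile, residual in points:
    print(f"{quantile:8.3f} {residual:6.1f}")
print(f"correlation of the plotted points: {r:.3f}")
```

Plotting the second column against the first gives the normal probability plot; points falling close to a straight line are consistent with normally distributed errors.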

There are various statistical tests for some of these assumptions, but these methods (e.g., Bartlett’s test for homogeneity, which is also sensitive to departures from normality) can be too sensitive, indicating that problems exist when they really don’t. It turns out that the ANOVA is quite robust and is not badly affected by small violations of these assumptions. In practice, we use a certain amount of common sense and visual inspection of the residual plots to determine whether we have problems.
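
For readers curious about what such a test computes, here is a small sketch of Bartlett's statistic in Python, using the standard textbook formula. The function name and the second (unequal-variance) data set are hypothetical and exist only for this illustration. The statistic is referred to a chi-square distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of treatment levels.

```python
from math import log
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance: within-group variances weighted by degrees of freedom.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction

# Simulated data from this section: every level has the same sample variance.
equal_spread = [[12, 23, 34], [45, 56, 67], [14, 25, 36]]
# Hypothetical data in which the third level is much less variable.
unequal_spread = [[12, 23, 34], [45, 56, 67], [24, 25, 26]]

print(bartlett_statistic(equal_spread))
print(bartlett_statistic(unequal_spread))
```

A larger statistic means stronger evidence against equal variances, but as noted above, a formal test like this should not replace a look at the residual plots.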

Generating residuals and working with them for diagnostics is done using statistical software. In this lesson, we will be looking at residuals and associated diagnostics generated in both SAS and Minitab. Residual plots can help identify potential outliers, and the pattern of residuals vs. fitted values or treatments may suggest a transformation of the response variable.

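[Figure: four residual plots, each with a horizontal reference line at 0. Panels (a), (b), and (c) show residuals (e) plotted against X; panel (d) shows residuals (e) plotted against Time.]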

A common problem encountered in ANOVA is unequal variance among treatment levels (heterogeneity of variance). If the variability increases with the mean (panel (c) in the figure above), specifically when the standard deviation is roughly proportional to the mean, a logarithmic transformation of Y can 'stabilize' the variances. If the plot of residuals vs. fitted values instead shows a curvilinear trend (panel (b) in the figure above), then a quadratic or other transformation may help. Since finding the correct transformation can be challenging, one helpful method, the Box-Cox method, is often used to identify transformations. In this method, we find a power \(\lambda\) such that the transformed response \(Y^\lambda\) best satisfies the assumptions. Common values of \(\lambda\) and the corresponding transformations are listed below.

| \(\lambda\) | \(Y^\lambda\) | Transformation |
|---|---|---|
| 2 | \(Y^2\) | Square |
| 1 | \(Y^1\) | Original (No transform) |
| 1/2 | \(\sqrt{Y}\) | Square Root |
| 0 | \(\log(Y)\) | Logarithm |
| -1/2 | \(\frac{1}{\sqrt{Y}}\) | Reciprocal Square Root |
| -1 | \(\frac{1}{Y}\) | Reciprocal |
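
The idea behind choosing \(\lambda\) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of what Minitab computes internally: it uses the standard Box-Cox form \((Y^\lambda - 1)/\lambda\) (with \(\log Y\) at \(\lambda = 0\)), which is a linear rescaling of \(Y^\lambda\), and it scores each candidate \(\lambda\) from the table by the usual profile log-likelihood for a one-way layout. The function names are our own.

```python
from math import log

def box_cox(y, lam):
    """Standard Box-Cox transform of a single positive value."""
    return log(y) if lam == 0 else (y ** lam - 1) / lam

def profile_log_likelihood(groups, lam):
    """Profile log-likelihood of lambda for a one-way layout, up to a constant."""
    n_total = sum(len(g) for g in groups)
    sse = 0.0
    for g in groups:
        transformed = [box_cox(y, lam) for y in g]
        group_mean = sum(transformed) / len(transformed)
        # Residual sum of squares around the treatment mean, transformed scale.
        sse += sum((t - group_mean) ** 2 for t in transformed)
    # Jacobian term that makes different values of lambda comparable.
    log_jacobian = (lam - 1) * sum(log(y) for g in groups for y in g)
    return -0.5 * n_total * log(sse / n_total) + log_jacobian

# Candidate values of lambda taken from the table above.
CANDIDATES = [2, 1, 0.5, 0, -0.5, -1]

def best_lambda(groups, candidates=CANDIDATES):
    """Return the candidate lambda with the largest profile log-likelihood."""
    return max(candidates, key=lambda lam: profile_log_likelihood(groups, lam))

# Simulated data used in the Minitab example in this section.
data = [[12, 23, 34], [45, 56, 67], [14, 25, 36]]
for lam in CANDIDATES:
    print(lam, round(profile_log_likelihood(data, lam), 3))
print("selected lambda:", best_lambda(data))
```

Restricting the search to the tabled values mirrors the common practice of rounding \(\lambda\) to an interpretable transformation; all responses must be positive for the transform to be defined.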
Minitab 18

Minitab®  –  Box-Cox Procedures

To run the Box-Cox procedure in Minitab,

Set up the data (Simulated Data) in a stacked format: one column with the treatment (or treatment combination) levels and a second column with the response variable.

| Treatment | Response Variable |
|---|---|
| A | 12 |
| A | 23 |
| A | 34 |
| B | 45 |
| B | 56 |
| B | 67 |
| C | 14 |
| C | 25 |
| C | 36 |

  1. On the Minitab toolbar, choose

    Stat > Control Charts > Box-Cox Transformation

  2. You will get the following window:

    [Figure: Box-Cox Transformation dialog box in Minitab]

  3. Click OK to finish.

You will get an output like this:

[Figure: Box-Cox plot]

In the upper right-hand box, you see a rounded value for \(\lambda\). From this value you can determine what transformation of the response variable to use. Here, with a \(\lambda\) of 1, no transformation is recommended.

The Box-Cox procedure in SAS is more complicated. It is done through the TRANSREG procedure, but it first requires coding the treatment levels (with effect coding) and obtaining the ANOVA solution with regression. We will use Minitab instead for this task.
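
For reference, effect coding replaces a treatment factor with \(k\) levels by \(k - 1\) numeric columns. The Python sketch below shows one common version of the scheme; the function name is hypothetical, and treating the last level (alphabetically) as the reference level is an assumption made for this example, not a description of the SAS code.

```python
def effect_code(levels):
    """Map each treatment level to k - 1 effect-coded columns.

    The last level in sorted order is the reference: it is coded -1 in
    every column. Each other level is coded 1 in its own column, else 0.
    """
    distinct = sorted(set(levels))
    reference = distinct[-1]
    columns = distinct[:-1]
    coded = []
    for level in levels:
        if level == reference:
            coded.append([-1] * len(columns))
        else:
            coded.append([1 if level == c else 0 for c in columns])
    return coded

# Treatment column of the stacked simulated data used in this section.
treatments = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
design = effect_code(treatments)

for level, row in zip(treatments, design):
    print(level, row)
```

With balanced data each coded column sums to zero, so the regression intercept estimates the overall mean and each coefficient estimates a treatment effect as a deviation from that mean.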