Lesson 3a: 'Behind the Curtains' - How is ANOVA Calculated?
In the lessons so far we developed the ANOVA conceptually in terms of deviations from means. In the case of the treatment SS, we calculated the deviations of treatment level means from the overall mean, and for the residual error SS we calculated the deviations of individual observations from treatment level means. In practice, however, computing the SS for ANOVA is done differently. To understand this we can use the fundamental equivalence of the following two expressions:
\(SS=\Sigma(Y_i-\bar{Y})^2=\Sigma Y_{i}^{2}-\frac{(\Sigma Y_i)^2}{n}\)
The latter expression is often referred to as the ‘working formula’ or ‘machine formula’ and is used by computer software to calculate the various SS in the ANOVA.
The first part of the working formula is simply squaring and summing the observations. For computing the SS for the total variance, the formula above would be used as written, but in the case of computing the SS for the treatment, the first part is modified as follows:
\(SS_{treatment} = \sum_{i=1}^{k}\frac{\left( \sum_{j=1}^{n_i}Y_{ij} \right)^2}{n_i}\)
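The working formulas are easy to check numerically. The sketch below is a plain-Python illustration (not the SAS code used later in this lesson), using the example data that appears further down: y = 2, 1, 3, 4, 6, 5 in three treatment levels of two observations each.

```python
# Sketch: verifying the 'working formula' for SS on the lesson's example data.
y = [2, 1, 3, 4, 6, 5]
groups = [[2, 1], [3, 4], [6, 5]]   # three treatment levels, n_i = 2 each
n = len(y)

# Definitional form: sum of squared deviations from the overall mean
mean = sum(y) / n
ss_def = sum((yi - mean) ** 2 for yi in y)

# Working (machine) formula: sum(Y^2) minus (sum(Y))^2 / n
cf = sum(y) ** 2 / n                       # the (Sigma Y)^2 / n term
ss_work = sum(yi ** 2 for yi in y) - cf

# Treatment SS: sum over levels of (level total)^2 / n_i, minus the same term
ss_trt = sum(sum(g) ** 2 / len(g) for g in groups) - cf

print(ss_def, ss_work, ss_trt)   # 17.5 17.5 16.0
```

Both forms of the total SS agree, and the treatment SS matches the value the IML runs below will produce.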
The second part of the working formula, \(\dfrac{(\Sigma Y_i)^2}{n}\) is referred to as a ‘correction factor’, abbreviated as CF. Employing this formulation is readily accomplished by using matrix algebra as we will see below. To gain a view into the process that software is using, we will use a procedure in SAS, Proc IML (the Interactive Matrix Language). The interactive matrix language in SAS provides an ideal setting in which to see how the various ANOVA models discussed in the online lesson notes and in our textbook are actually computed. The following 4 models will be set up and run here in this lesson:
Model 1 - The Overall Mean Model
See Textbook: Equation 16.58
 \(Y_{ij}=\mu_.+\epsilon_{ij}\)

which simply fits an overall or ‘grand’ mean. This model reflects the situation where \(H_0\) is true and \(\mu_1=\mu_2=\cdots=\mu_r\).
Model 2 - The Cell Means Model
See Textbook: Equation 16.57
 \(Y_{ij}=\mu_i+\epsilon_{ij}\)

where \(\mu_i\) are the factor level means. Note that in this model there is no overall mean being fitted.
Model 3 - Dummy Variable Regression (No Textbook Equation)
\(Y_{ij}=\mu_.+\mu_i+\epsilon_{ij}\), fitted as \(Y_{ij}=\beta_0+\beta_{Level1}+\beta_{Level2}+\cdots+\beta_{Level(r-1)}+\epsilon_{ij}\)

where the \(\beta_i\) are regression coefficients for r - 1 indicator-coded regression ‘dummy’ variables that are coded to indicate the r categorical factor levels. The rth factor level mean is given by the regression intercept \(\beta_0\).
Model 4 - The Effects Model
See Textbook: Equation 16.62
 \(Y_{ij}=\mu_.+\tau_i+\epsilon_{ij}\)

where \(\tau_i\) are the deviations of each factor level mean from the overall mean.
Running the different models above is accomplished by simply changing the design matrix \(\mathbf{X}\) in the general linear model (GLM): \(\mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\).
3a.1  The Overall Mean Model
The model matrices for the overall mean model are:
\(\mathbf{Y}=\begin{bmatrix}
2\\
1\\
3\\
4\\
6\\
5
\end{bmatrix}
\mathbf{X}=\begin{bmatrix}
1\\
1\\
1\\
1\\
1\\
1
\end{bmatrix}
\boldsymbol{\beta}=(\mu) \;\;
\boldsymbol{\epsilon}=\begin{bmatrix}
\epsilon_1\\
\epsilon_2\\
\epsilon_3\\
\epsilon_4\\
\epsilon_5\\
\epsilon_6
\end{bmatrix}\)
The estimates of the parameters in \(\boldsymbol{\beta}\) are obtained as:
\(\mathbf{b}= (\mathbf{X}^{'} \mathbf{X} ) ^{-1} \mathbf{X}^{' } \mathbf{Y}\)
In this simplest of cases, we can see how this works. The term \(\mathbf{X}^{'} \mathbf{X}\) would be:
\(\begin{bmatrix}
1 & 1 & 1 & 1 &1 & 1
\end{bmatrix}\ast \begin{bmatrix}
1\\
1\\
1\\
1\\
1\\
1
\end{bmatrix}=1+1+1+1+1+1=6 = n\)
In the formula for \(\mathbf{b}\), the \(-1\) exponent indicates the matrix inverse, meaning that in effect we will be dividing by \(\mathbf{X}^{'} \mathbf{X}\).
The term \(\mathbf{X}^{'} \mathbf{Y}\) would be:
\(\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}\ast \begin{bmatrix}
2\\
1\\
3\\
4\\
6\\
5
\end{bmatrix}=2+1+3+4+6+5=21 = \Sigma Y_i\)
So in this case, the estimate \(\mathbf{b}\) is simply the overall mean = \(\mu_{.}= 21/6 = 3.5\)
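Because the design matrix is a single column of ones, the whole matrix calculation collapses to ordinary arithmetic. A minimal Python sketch (illustrative only; the lesson's actual computations are done in SAS IML below):

```python
# Sketch: b = (X'X)^{-1} X'Y for the overall-mean model, where X is a
# single column of ones, so no matrix library is needed.
y = [2, 1, 3, 4, 6, 5]
x = [1, 1, 1, 1, 1, 1]                       # design matrix: column of ones

xtx = sum(xi * xi for xi in x)               # X'X = n = 6
xty = sum(xi * yi for xi, yi in zip(x, y))   # X'Y = sum(Y) = 21
b = xty / xtx                                # (X'X)^{-1} X'Y = 21/6
print(b)                                     # 3.5
```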
To perform these matrix operations in SAS IML, we will open a regular SAS editor window, and then copy and paste three components from the file (IML Design Matrices):
SAS: Overall Mean Model
 Step 1
Procedure initiation, and specification of the dependent variable vector, \(\mathbf{Y}\).
For our example we have:
/* Initiate IML, define response variable */
proc iml;
y={2, 1, 3, 4, 6, 5};
 Step 2
We then enter a design matrix (\(\mathbf{X}\)). For the Overall Mean model and our example data, we have:
x={ 1, 1, 1, 1, 1, 1};
 Step 3
We can now copy and paste a program for the matrix computations to generate results (regression coefficients and ANOVA output):
beta=inv(x`*x)*x`*y;
beta_label={"Beta_0","Beta_1","Beta_2","Beta_3"};
print beta [label="Regression Coefficients" rowname=beta_label];
n=nrow(y);
p=ncol(x);
j=j(n,n,1);
ss_tot = (y`*y) - (1/n)*(y`*j)*y;
ss_trt = (beta`*(x`*y)) - (1/n)*(y`*j)*y;
ss_error = ss_tot - ss_trt;
total_df=n-1;
trt_df=p-1;
error_df=n-p;
MS_trt = ss_trt/(p-1);
MS_error = ss_error / error_df;
F=ms_trt/ms_error;
empty={.};
row_label= {"Treatment", "Error", "Total"};
col_label={"df" "SS" "MS" "F"};
trt_row= trt_df || ss_trt || ms_trt || F;
error_row= error_df || ss_error || ms_error || empty;
tot_row=total_df || ss_tot || empty || empty;
aov = trt_row // error_row // tot_row;
print aov [label="ANOVA" rowname=row_label colname=col_label];
The program can then be run to produce the following output:
Regression Coefficients
Beta_0    3.5

ANOVA
             df     SS     MS     F
Treatment     0      0
Error         5   17.5    3.5
Total         5   17.5
We see the estimate of the regression coefficient for \(\beta_0\) is indeed the overall mean of the response variable. In this simple case, where the treatment factor has not entered the model, the only item of interest from the ANOVA table would be the \(SS_{Error}\) for later use in the General Linear F test.
To check the internal calculations, we can add the following few lines to the end of the calculation code.
/* (Optional) Intermediates in the matrix computations */
xprimex=x`*x; print xprimex;
xprimey=x`*y; print xprimey;
xprimexinv=inv(x`*x); print xprimexinv;
check=xprimexinv*xprimex; print check;
SumY2=beta`*(x`*y); print SumY2;
CF=(1/n)*(y`*j)*y; print CF;
This additional code produces the following output:
xprimex
6

xprimey
21

xprimexinv
0.1666667

check
1

SumY2
73.5

CF
73.5
From this we can verify the computation of the treatment SS for this model: \(SS_{treatment} = \text{SumY2} - CF = 73.5 - 73.5 = 0\).
The ‘check’ calculation is confirming that \((\mathbf{X}^{'}\mathbf{X})^{-1} (\mathbf{X}^{'}\mathbf{X}) = 1\), which in fact defines the matrix division operation. In this simple case, it amounts to simple division by n, but in other models that we will work with, the matrix division process is more complicated. The inverse of a matrix \(\mathbf{A}\), written \(\mathbf{A}^{-1}\), is defined by the product \(\mathbf{A}^{-1} \mathbf{A} = \mathbf{I}\), where \(\mathbf{I}\) is the identity matrix (a diagonal matrix of 1s). Here in this simple case there is only a single diagonal element.
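As a small numerical illustration of the ‘check’ idea (a Python sketch, not part of the IML code): the \(\mathbf{X}^{'}\mathbf{X}\) matrices for this model and the next are diagonal, so the inverse simply reciprocates each diagonal entry, and multiplying back recovers a diagonal of 1s.

```python
# Sketch: for a diagonal matrix, the inverse reciprocates each diagonal
# entry; inverse * original should recover the identity diagonal of 1s.
def diag_inverse_check(diag):
    inv = [1 / d for d in diag]                 # inverse of a diagonal matrix
    return [i * d for i, d in zip(inv, diag)]   # should be all 1s

print(diag_inverse_check([6]))        # overall-mean model X'X: [1.0]
print(diag_inverse_check([2, 2, 2]))  # cell-means model X'X diagonal
```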
3a.2  Cell Means Model
To run the cell means model, wherein we don’t fit an overall mean, but instead fit an individual mean for each of the treatment levels, we simply replace the design matrix in the IML code with:
/* The Cell Means Model */
x={
1 0 0,
1 0 0,
0 1 0,
0 1 0,
0 0 1,
0 0 1};
Each column uses indicator coding: a ‘1’ for observations in that treatment level, and a ‘0’ otherwise. So column 1 will generate the mean for treatment level 1, column 2 for treatment level 2, and column 3 for treatment level 3.
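Because the indicator columns do not overlap, \(\mathbf{X}^{'}\mathbf{X}\) is diagonal with entries \(n_i\), so each coefficient is just a group mean. A plain-Python sketch of this fit (illustrative only; variable names are my own):

```python
# Sketch: cell-means fit with indicator coding. X'X is diagonal (entries n_i),
# so beta_j = (group j total) / n_j, i.e., the group means.
y = [2, 1, 3, 4, 6, 5]
X = [[1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0],
     [0, 0, 1], [0, 0, 1]]

# For 0/1 columns, each X'X diagonal entry equals the column sum (= n_i)
xtx_diag = [sum(row[j] for row in X) for j in range(3)]
xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(3)]  # group totals
beta = [t / ni for t, ni in zip(xty, xtx_diag)]   # group means
print(beta)          # [1.5, 3.5, 5.5]

# Residual SS around the fitted group means
fitted = [sum(b * xj for b, xj in zip(beta, row)) for row in X]
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
print(ss_error)      # 1.5
```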
We then rerun the program with the new design matrix to get the following output:
Regression Coefficients
Beta_0    1.5
Beta_1    3.5
Beta_2    5.5

ANOVA
             df     SS     MS     F
Treatment     2     16      8    16
Error         3    1.5    0.5
Total         5   17.5
We can see that the regression coefficients are now the means for each treatment level, and in the ANOVA table we see that the \(SS_{Error}\) is 1.5. This reduction in the \(SS_{Error}\) (from 17.5 to 1.5) is the \(SS_{treatment}\).
Internally, we have:
xprimex
2  0  0
0  2  0
0  0  2

check
1  0  0
0  1  0
0  0  1

xprimey
3
7
11

SumY2
89.5

CF
73.5

xprimexinv
0.5    0      0
0      0.5    0
0      0      0.5
Here we can see that \(\mathbf{X}^{'}\mathbf{X}\) now contains diagonal elements that are the \(n_i\), the number of observations for each treatment level mean being computed. In addition, we can verify the ‘working formula’: \(SS_{treatment} = \Sigma Y^2 - CF = 89.5 - 73.5 = 16\), the treatment SS.
We can now test for the significance of the treatment by using the General Linear F test:
See Textbook: Section 16.7
\(F=\dfrac{(SSE_{reduced} - SSE_{full})/(dfE_{reduced} - dfE_{full})}{SSE_{full} / dfE_{full}}\)
The Overall Mean model is the ‘Reduced’ model, and the Cell Means model is the ‘Full’ model. From the ANOVA tables, we get:
\(F=\dfrac{(17.5-1.5)/(5-3)}{1.5/3}=16\)
which can be compared to \(F_{.05,2,3}=9.55\).
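The General Linear F arithmetic can be sketched directly from the two ANOVA tables above (plain Python, for illustration):

```python
# Sketch: General Linear F test comparing the overall-mean (reduced) model
# with the cell-means (full) model, using the SSE and df values above.
sse_reduced, df_reduced = 17.5, 5    # overall-mean model
sse_full, df_full = 1.5, 3           # cell-means model

F = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(F)   # 16.0
```

Since 16.0 exceeds the critical value \(F_{.05,2,3}=9.55\), the treatment effect is significant at the 0.05 level.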
3a.3  Dummy Variable Regression
The GLM can be viewed from the regression perspective as an ordinary multiple linear regression (MLR) with ‘dummy’ coding (actually indicator coding) for the categorical treatment levels. Typically, software performing the MLR will automatically include an intercept, which complicates the interpretation of the regression coefficients.
In IML, we now replace the design matrix with:
/* Dummy Variable Regression Model */
x = {
1 1 0,
1 1 0,
1 0 1,
1 0 1,
1 0 0,
1 0 0};
We can see something strange with this design matrix. Although we have three treatment levels in this example, we have a column of ‘1’s for the intercept and only two columns that have been indicator coded for treatment levels. The reason for this is that the complete matrix
\(\begin{bmatrix}
1 & 1 & 0 & 0\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1
\end{bmatrix}\)
has the property that the sum of columns 2-4 equals the first column, for the intercept. As a result, a condition called singularity is created and the matrix computations will not run. So one of the treatment levels is omitted from the coding in our design matrix above for IML. It doesn’t matter which one we eliminate, although some refer to the eliminated level as a ‘reference’ level. Unlike logistic regression, the treatment level that we eliminate from the coding here does not act as a reference level for comparing other treatment levels. It’s simply an algebraic manipulation. In our design matrix for IML we have eliminated the indicator coding for treatment level 3.
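The singularity and the fix can be sketched numerically. The small Gaussian-elimination solver below is purely illustrative (it is not what SAS uses internally); it first confirms the column dependence, then solves the normal equations after dropping the level-3 column.

```python
# Sketch: the 4-column dummy matrix is singular (columns 2-4 sum to the
# intercept column); dropping one column makes the normal equations solvable.
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

y = [2, 1, 3, 4, 6, 5]
X_full = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
          [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1]]

# Columns 2-4 sum to column 1 in every row, hence the singularity:
assert all(row[1] + row[2] + row[3] == row[0] for row in X_full)

# Drop the level-3 indicator column and solve X'X b = X'Y:
X = [row[:3] for row in X_full]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
beta = solve(xtx, xty)
print(beta)   # approximately [5.5, -4.0, -2.0]
```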
Rerunning IML, we now get the following output:
Regression Coefficients
Beta_0    5.5
Beta_1   -4
Beta_2   -2
The coefficient for \(\beta_0\) is the mean for treatment level3. The mean for treatment level1 is then calculated from \(\beta_0+\beta_1=1.5\). Likewise, the mean for treatment level2 is calculated as \(\beta_0+\beta_2=3.5\).
Notice that the F statistic calculated from this model is the same as that produced from the Cell Means model.
ANOVA
             df     SS     MS     F
Treatment     2     16      8    16
Error         3    1.5    0.5
Total         5   17.5
Using Minitab
We can confirm our ANOVA table now by running the analysis in ordinary software such as Minitab, given that we can set the coding that the software uses. In Minitab, under Stat > ANOVA > General Linear Model, we control this by specifying Indicator (1, 0) coding.
This produces the regular ANOVA output:
Analysis of Variance

Source    DF   Adj SS   Adj MS   F-Value   P-Value
trt        2   16.000   8.0000     16.00     0.025
Error      3    1.500   0.5000
Total      5   17.500
And also the Regression Equation
Regression Equation
y = 5.500 - 4.000 trt_level1 - 2.000 trt_level2 + 0.0 trt_level3
Using SAS
In SAS, the default coding is indicator coding, so when you specify the option
model y=trt / solution;
you get the regression coefficients:
Solution for Fixed Effects

Effect      trt      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept             5.5000        0.5000        3     11.00     0.0016
trt         level1   -4.0000        0.7071        3     -5.66     0.0109
trt         level2   -2.0000        0.7071        3     -2.83     0.0663
trt         level3    0
And the same ANOVA table:
Type 3 Analysis of Variance

Source     DF   Sum of Squares   Mean Square   Expected Mean Square   Error Term     Error DF   F Value   Pr > F
trt         2        16.000000      8.000000   Var(Residual)+Q(trt)   MS(Residual)          3     16.00   0.0251
Residual    3         1.500000      0.500000   Var(Residual)
The intermediate calculations for this model are:
xprimex
6  2  2
2  2  0
2  0  2

check
1          2.22E-16   0
3.331E-16  1          0
0          0          1

xprimey
21
3
7

SumY2
89.5

CF
73.5

xprimexinv
 0.5  -0.5  -0.5
-0.5   1     0.5
-0.5   0.5   1
3a.4  The Effects Model
In the effects model we are no longer estimating means, but instead are estimating the deviations of the treatment means from the overall mean (the \(\tau_i\)). The model must include the intercept, so we have the following design matrix for IML:
/* The Effects Model */
x={
1 1 0,
1 1 0,
1 0 1,
1 0 1,
1 -1 -1,
1 -1 -1};
Here we have another omission of a treatment level, but for a different reason. In the effects model, we have the constraint \(\Sigma \tau_i =0\). As a result, one of the \(\tau_i\) is redundant and we remove one treatment level from the coding. Again, the coding for treatment level 3 has been omitted; its rows are coded with \(-1\)s.
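A plain-Python sketch of this fit (illustrative only): with the \(\pm 1\) coding, the intercept column is orthogonal to the other two, so \(\mathbf{X}^{'}\mathbf{X}\) splits into a scalar and a 2x2 block whose inverse can be written in closed form.

```python
# Sketch: effects-model fit with level 3 coded as (-1, -1). Here
# X'X = [[6,0,0],[0,4,2],[0,2,4]]; the 2x2 block [[4,2],[2,4]] has
# inverse (1/12)[[4,-2],[-2,4]], used below in closed form.
y = [2, 1, 3, 4, 6, 5]
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1],
     [1, 0, 1], [1, -1, -1], [1, -1, -1]]

xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(3)]  # [21, -8, -4]

beta0 = xty[0] / 6                        # overall mean
tau1 = (4 * xty[1] - 2 * xty[2]) / 12     # tau_1
tau2 = (-2 * xty[1] + 4 * xty[2]) / 12    # tau_2
tau3 = -tau1 - tau2                       # recovered from the constraint
print(beta0, tau1, tau2, tau3)            # 3.5 -2.0 0.0 2.0
```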
Rerunning with this design matrix yields:
Regression Coefficients
Beta_0    3.5
Beta_1   -2
Beta_2    0

ANOVA
             df     SS     MS     F
Treatment     2     16      8    16
Error         3    1.5    0.5
Total         5   17.5
The regression coefficient \(\beta_0\) is the overall mean, and the coefficients \(\beta_1\) and \(\beta_2\) are \(\tau_1\) and \(\tau_2\), respectively. The estimate for \(\tau_3\) is obtained as \(-\tau_1 - \tau_2 = 2.0\).
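As a quick check (a Python sketch using the coefficient values from the output above), adding each \(\tau_i\) back to the overall mean reproduces the treatment level means:

```python
# Sketch: effects-model coefficients reproduce the level means
# mu_i = mu_. + tau_i; tau_3 comes from the sum-to-zero constraint.
overall_mean = 3.5
tau = {"level1": -2.0, "level2": 0.0, "level3": 2.0}

means = {lvl: overall_mean + t for lvl, t in tau.items()}
print(means)   # {'level1': 1.5, 'level2': 3.5, 'level3': 5.5}
```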
If we change the coding in Minitab now to be Effect coding (-1, 0, +1), the default setting, we get the following:
Regression Equation
y = 3.500 - 2.000 trt_level1 - 0.000 trt_level2 + 2.000 trt_level3
The ANOVA table is the same as for the dummyvariable regression model above.
The intermediates were:
xprimex
6  0  0
0  4  2
0  2  4

check
1  0  0
0  1  0
0  0  1

xprimey
21
-8
-4

SumY2
89.5

CF
73.5

xprimexinv
0.1666667   0            0
0           0.3333333   -0.166667
0          -0.166667     0.3333333
3a.5  Summary
By coding treatment or factor levels into numerical terms, we can use regression methods to perform the ANOVA.
See Textbook: Section 16.8
To state the ANOVA model
\(Y_{ij}=\mu_. + \tau_i + \epsilon_{ij}\)
as a regression model, we need to represent the parameters \(\mu_., \tau_1, \ldots, \tau_r\) in the model. However, constraint 16.64 (from text) for the case of equal weightings:
\(\sum_{i=1}^{r} \tau_i =0\)
implies that one of the r parameters \(\tau_i\) is not needed since it can be expressed in terms of the other r - 1 parameters. We shall drop the parameter \(\tau_r\), which according to constraint 16.64 (from text) can be expressed in terms of the other r - 1 parameters \(\tau_i\) as follows:
\(\tau_r = -\tau_1 - \tau_2 - \cdots - \tau_{r-1}\)
Thus, we shall use only the parameters \(\mu_., \tau_1 , ... , \tau_{r-1}\) for the linear model.
To illustrate how a linear model is developed with this approach, consider a singlefactor study with r = 3 factor levels when \(n_1=n_2=n_3=2\). The Y, X, \(\boldsymbol{\beta}\) and \(\boldsymbol{\epsilon}\) matrices for this case are as follows:
\(\mathbf{Y} = \begin{bmatrix}Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{22}\\ Y_{31}\\ Y_{32}\end{bmatrix} \mathbf{X} = \begin{bmatrix}1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & -1 & -1\\ 1 & -1 & -1\end{bmatrix} \boldsymbol{\beta} = \begin{bmatrix}\beta_{0}\\ \beta_{1}\\ \beta_{2}\end{bmatrix} \boldsymbol{\epsilon} = \begin{bmatrix}\epsilon_{11}\\ \epsilon_{12}\\ \epsilon_{21}\\ \epsilon_{22}\\ \epsilon_{31}\\ \epsilon_{32}\end{bmatrix}\)
Note that the vector of expected values \(\mathbf{E}\{\mathbf{Y}\} =\mathbf{X}\boldsymbol{\beta}\), yields the following:
\(\mathbf{E}\{\mathbf{Y}\} = \begin{bmatrix}E\{Y_{11}\}\\ E\{Y_{12}\}\\ E\{Y_{21}\}\\ E\{Y_{22}\}\\E\{Y_{31}\}\\E\{Y_{32}\}\end{bmatrix} =\mathbf{X}\boldsymbol{\beta}= \begin{bmatrix}1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & -1 & -1\\ 1 & -1 & -1\end{bmatrix}\begin{bmatrix}\beta_{0}\\ \beta_{1}\\ \beta_{2}\end{bmatrix}=\begin{bmatrix}\mu_{.}+\tau_{1}\\ \mu_{.}+\tau_{1}\\ \mu_{.}+\tau_{2}\\ \mu_{.}+\tau_{2}\\ \mu_{.}-\tau_{1}-\tau_{2}\\ \mu_{.}-\tau_{1}-\tau_{2}\end{bmatrix}\)
Since \(\tau_3=-\tau_{1}-\tau_{2}\), see above, we see that \(E\{Y_{31}\} =E\{Y_{32}\}=\mu_{.}+\tau_{3}\). Thus, the above X matrix and \(\boldsymbol{\beta}\) vector representation provides in all cases the appropriate expected values:
\(E\{Y_{ij}\} =\mu_{.}+\tau_{i}\)