9.1 - ANCOVA in the GLM Setting: The Covariate as a Regression Variable

By definition, ANCOVA is a general linear model (GLM) that includes both ANOVA (categorical) predictors and regression (continuous) predictors. Recall the simple linear regression model:

\(Y_i=\beta_0+\beta_1 X_i+ \epsilon_i\)

Here, \(\beta_0\) and \(\beta_1\) are the intercept and the slope of the line, respectively. Testing the significance of the regression amounts to testing \(H_0 \colon \beta_1=0 \text{ vs. } H_1\colon \beta_1 \neq 0\) using the statistic \(F = \frac{MS(Regr)}{MSE}\), where \(MS(Regr)\) is the mean square for regression and \(MSE\) is the mean squared error. In simple linear regression, this F-test is equivalent to the t-test of \(H_0 \colon \beta_1=0\): the F-statistic (with 1 and \(n-2\) degrees of freedom) is the square of the t-statistic, \(F = t^2\).
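As a quick numerical check of the F and t equivalence, here is a minimal sketch in Python with NumPy (our choice of tool; the notes do not prescribe software). The function name `slr_f_and_t` and the data values are hypothetical, made up for illustration.

```python
import numpy as np

def slr_f_and_t(x, y):
    """Fit Y = b0 + b1*X by least squares and return (F, t) for H0: beta1 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx   # slope estimate
    b0 = ybar - b1 * xbar                        # intercept estimate
    resid = y - (b0 + b1 * x)
    mse = np.sum(resid ** 2) / (n - 2)           # MSE on n - 2 df
    ms_regr = b1 ** 2 * sxx                      # SS(Regr) has 1 df, so MS(Regr) = SS(Regr)
    f_stat = ms_regr / mse
    t_stat = b1 / np.sqrt(mse / sxx)             # t = b1 / SE(b1)
    return f_stat, t_stat

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
f_stat, t_stat = slr_f_and_t(x, y)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")  # the two values agree
```

Squaring the t-statistic reproduces F exactly, since \(t^2 = b_1^2 S_{xx}/MSE = SS(Regr)/MSE\).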

Now, when we add the regression variable to our one-way ANOVA model, a notational problem arises. In the balanced one-way ANOVA model we have the grand mean (\(\mu\)), but the regression model brings its own intercept \(\beta_0\), so the combined model would contain two overlapping constant terms. To get around this, we center the covariate at its grand mean:

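\(X^*_{ij}=X_{ij}-\bar{X}_{..}\)

and write the covariance model as:

\(Y_{ij}=\mu+\tau_i+\gamma X^*_{ij}+\epsilon_{ij}\)

Because the centered covariate has mean zero, \(\mu\) takes the place of the intercept, and \(\gamma\) is the common slope of the regression of \(Y\) on the covariate (the role played by \(\beta_1\) above).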

Note that in a GLM, the Type III (model fit) sums of squares for the treatment are corrected (or adjusted) for the regression relationship. This puts the treatment levels ‘on the same playing field’: their means are compared at the mean value of the covariate, \(\bar{X}_{..}\). The adjustment removes variation due to the covariate that might otherwise be attributed to differences among treatment levels.
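The adjustment can be made concrete with a small sketch (Python/NumPy, our choice). For the additive model above, the Type III sum of squares for treatment equals the extra sum of squares \(SSE(\text{reduced}) - SSE(\text{full})\), where the reduced model drops \(\tau_i\) but keeps the covariate. The function names and data below are hypothetical, and the unadjusted one-way ANOVA F is included only for comparison.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares from the least-squares fit of y on the design columns."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def adjusted_treatment_f(groups, x, y):
    """F for treatment after adjusting for the covariate: full vs. reduced model."""
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    levels = np.unique(groups)
    n, a = len(y), len(levels)
    x_star = x - x.mean()
    # One indicator per level spans the same space as mu + tau_i
    indicators = np.column_stack([(groups == lev).astype(float) for lev in levels])
    full = np.column_stack([indicators, x_star])        # mu + tau_i + gamma * X*
    reduced = np.column_stack([np.ones(n), x_star])     # mu + gamma * X* (no treatment)
    sse_full = sse(full, y)
    ss_trt_adj = sse(reduced, y) - sse_full             # treatment SS adjusted for X
    df_trt, df_err = a - 1, n - a - 1                   # one error df spent on gamma
    f_stat = (ss_trt_adj / df_trt) / (sse_full / df_err)
    return f_stat, df_trt, df_err

def unadjusted_treatment_f(groups, y):
    """Ordinary one-way ANOVA F that ignores the covariate, for comparison."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    levels = np.unique(groups)
    n, a = len(y), len(levels)
    indicators = np.column_stack([(groups == lev).astype(float) for lev in levels])
    sse_full = sse(indicators, y)
    ss_trt = sse(np.ones((n, 1)), y) - sse_full
    return (ss_trt / (a - 1)) / (sse_full / (n - a)), a - 1, n - a

# Hypothetical balanced data: 3 treatments, 4 units each
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
x = [3, 5, 6, 8, 4, 5, 7, 9, 2, 4, 5, 7]
y = [10, 13, 14, 17, 14, 15, 18, 21, 9, 12, 13, 16]
f_adj, df_trt, df_err = adjusted_treatment_f(groups, x, y)
f_unadj, _, _ = unadjusted_treatment_f(groups, y)
print(f"adjusted F({df_trt}, {df_err}) = {f_adj:.3f}; unadjusted F = {f_unadj:.3f}")
```

Comparing the two F-statistics shows how much of the apparent treatment effect, or of the error variation, is accounted for by the covariate.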