4.3 - Cell Means Model

Model 2 - The Cell Means Model

 

\(Y_{ij}=\mu_i+\epsilon_{ij}\)

where \(\mu_i\), \(i = 1,2, ...,T\) are the factor level means. Note that in this model, there is no overall mean being fitted.

The cell means model does not fit an overall mean, but instead fits an individual mean for each of the treatment levels. Let us examine this model for the same toy dataset assuming that a pair of observations arise from \(T = 3\) treatment levels. Each column will represent a specific treatment level using indicator coding, a ‘1’ for the rows corresponding to the observations receiving the specified treatment level, and a ‘0’ for the other rows.

To write the cell means model as a GLM, let \(\mathbf{X} = \begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^{'} \\ \mathbf{x}_2^{'} \\ \mathbf{x}_3^{'} \\ \mathbf{x}_4^{'} \\ \mathbf{x}_5^{'} \\  \mathbf{x}_6^{'} \end{bmatrix}\), where \(\mathbf{x}_i^{'}\) is the \(i^{th}\) row vector of the design matrix. It can be seen that \(r=2\) is the number of replicates for each treatment level. Observe that column 1 generates the mean for treatment level 1, column 2 for treatment level 2, and column 3 for treatment level 3.

The parameter vector \(\boldsymbol{\beta}\) is a 3-dimensional column vector and is defined by,

\(\boldsymbol{\beta}= \begin{bmatrix}
\beta_0\\
\beta_1\\
\beta_2
\end{bmatrix} = \begin{bmatrix}
\mu_1\\
\mu_2\\
\mu_3
\end{bmatrix}.\)

The parameter estimates \(\hat{\boldsymbol{\beta}}\) can again be found using the least squares method. One can verify that \(\mu_{i} = \bar{y}_i\), the \(i^{th}\) treatment mean, for \(i=1,2,3\). Using this estimate, the resulting estimated regression equation for the cell means model is,

\(\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}\)

which produces \(\hat{\mathbf{y}}_i = \mathbf{x}_i^{'}\begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix}\).

Using Technology Section

Begin by replacing the design matrix in the IML code of Section 4.2 Step 2 with:

/* The Cell Means Model */
x={
1	0	0,
1	0	0,
0	1	0,
0	1	0,
0	0	1,
0	0	1};
 

We then re-run the program with the new design matrix to get the following output:

Regression Coefficients
Beta_0 1.5
Beta_1 3.5
Beta_2 5.5
ANOVA
Treatment dF SS MS F
2 16 8 16
Error 3 1.5 0.5  
Total 5 17.5    

The regression coefficients Beta_0, Beta_1, and Beta_2 are now the means for each treatment level, and in the ANOVA table, we see that the \(SS_{Error}\) (SSE) is 1.5. This reduction in the SSE is the \(SS_{trt}\). Notice that the error SS of the 'Overall Mean' model is the sum of the SS values for Treatment and Error term in this model, which means that by not including the treatment effect in that model, its error SS has been unduly inflated.

Adding the optional code given in Section 4.2 to compute additional internal computations, we can obtain:

xprimex
2 0 0
0 2 0
0 0 2
check
1 0 0
0 1 0
0 0 1
xprimey
3
7
11
SumY2
89.5
CF
73.5
xprimexinv
0.5 0 0
0 0.5 0
0 0 0.5

Here we can see that \(\mathbf{X}^{'}\mathbf{X}\) now contains diagonal elements that are the \(n_i\) = number of observations for each treatment level mean being computed. In addition, we can use the ‘working formula’ to verify the treatment SS is \(89.5 - 73.5 = 16\).

We can now test for the significance of the treatment by using the General Linear F-test:

\(F=\dfrac{(SSE_{reduced} - SSE_{full})/(dfE_{reduced} - dfE_{full})}{SSE_{full} / dfE_{full}}\)

The 'Overall Mean' model is the 'Reduced' model, and the 'Cell Means' model is the ‘Full’ model. From the ANOVA tables, we get:

\(F=\dfrac{17.5-1.5⁄5-3}{1.5⁄3}=16\)

which can be compared to \(F_{0.05,2,3}=9.55\).

First, we input the response variable and design matrix.

y = matrix(c(2,1,3,4,6,5), ncol=1)
x = matrix(c(1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1),6,3,byrow=TRUE)
x
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    1    0    0
[3,]    0    1    0
[4,]    0    1    0
[5,]    0    0    1
[6,]    0    0    1

Applying the same code as in Section 4.2, we can then calculate the regression coefficients and create the ANOVA table.

beta = solve(t(x)%*%x)%*%(t(x)%*%y) 
beta
     [,1]
[1,]  1.5
[2,]  3.5
[3,]  5.5

n = nrow(y)
p = ncol(x)
J = matrix(1,n,n)
ss_tot = (t(y)%*%y) - (1/n)*(t(y)%*%J)%*%y 
ss_trt = t(beta)%*%(t(x)%*%y) - (1/n)*(t(y)%*%J)%*%y 
ss_error = ss_tot - ss_trt 
total_df = n-1 
trt_df = p-1 
error_df = n-p 
MS_trt = ss_trt/(p-1)
MS_error = ss_error / error_df 
Fstat = MS_trt/MS_error

ANOVA <- data.frame(
  c ("","Treatment","Error", "Total"),
  c("DF", trt_df,error_df,total_df),
  c("SS", ss_trt, ss_error, ss_tot),
  c("MS", MS_trt, MS_error, ""),
  c("F",Fstat,"",""),
  stringsAsFactors = FALSE)
names(ANOVA) <- c(" ", "  ", " ","","")
print(ANOVA)
1           DF   SS  MS  F
2 Treatment  2   16   8 16
3     Error  3  1.5 0.5   
4     Total  5 17.5  

The intermediate matrix computations can again be calculated with the following commands (outputs not shown).

xprimex = t(x)%*%x 
xprimex
xprimey = t(x)%*%y 
xprimey
xprimexinv = solve(t(x)%*%x) 
xprimexinv
check = xprimexinv*xprimex 
check
SumY2 = t(beta)%*%(t(x)%*%y) 
SumY2
CF = (1/n)*(t(y)%*%J)%*%y 
CF

Creating a categorical variable with the levels, this model can be run with built in R functions as follows:

levels = c(1,1,2,2,3,3)
levels = as.factor(levels)
lm1 = lm(y~ -1 + levels)
beta = lm1$coefficients
beta
levels1 levels2 levels3 
    1.5     3.5     5.5  

To obtain the ANOVA table, use the following commands.

options(contrasts=c("contr.sum","contr.poly"))
library(car)
aov3 = Anova(lm(y~levels),type = 3) 
aov3
Anova Table (Type III tests)

Response: y
            Sum Sq Df F value   Pr(>F)   
(Intercept)   73.5  1     147 0.001208 **
levels        16.0  2      16 0.025095 * 
Residuals      1.5  3                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, the Type 3 ANOVA table is equivalent to the Type 1 table for this model.

aov1 = aov(lm(y~levels))
summary(aov1)
          Df Sum Sq Mean Sq F value Pr(>F)   
levels     2   16.5     8.0      16 0.0251 *
Residuals  3    1.5     0.5                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1