6.6 - Mixed Model: School Example

Consider the experimental setting in which the investigators are interested in comparing the classroom self-ratings of teachers. They created a tool that can be used to self-rate the classrooms. The investigators are interested in comparing the Eastern vs. Western US regions, and the type of school (Public vs. Private). Investigators chose 2 teachers randomly from each combination and each teacher submits scores from 2 classes that they teach.

You can download the data at Schools Data.

If we carefully consider the information in the setup, we see that the US region makes sense as a fixed effect, and so does the type of school. However, the investigators are probably not interested in testing for significant differences among individual teachers they recruited for the study, but more realistically they would be interested in how much variation there is among teachers (a random effect).

For this example, we can use a mixed model in which we model teacher as a random effect nested within the factorial fixed treatment combinations of Region and School type.

Using Technology Section

In SAS we would set up the ANOVA as:

proc mixed data=school covtest method=type3;
class Region SchoolType Teacher Class;
model sr_score = Region SchoolType Region*SchoolType;
random Teacher(Region*SchoolType);
store out_school;
run;

In SAS proc mixed, we see that the fixed effects appear in the model statement, and the nested random effect appears in the random statement.

We get the following partial output:

 
Type 3 Analysis of Variance
Source DF Sum of Squares Mean Square Expected Mean Square Error Term Error DF F Value Pr > F
Region 1 564.062500 564.062500 Var(Residual) + 2 Var(Teach(Region*School)) + Q(Region,Region*SchoolType) MS(Teach(Region*School)) 4 24.07 0.0080
SchoolType 1 76.562500 76.562500 Var(Residual) + 2 Var(Teach(Region*School)) + Q(SchoolType,Region*SchoolType) MS(Teach(Region*School)) 4 3.27 0.1450
Region*SchoolType 1 264.062500 264.062500 Var(Residual) + 2 Var(Teach(Region*School)) + Q(Region*SchoolType) MS(Teach(Region*School)) 4 11.27 0.0284
Teach(Region*School) 4 93.750000 23.437500 Var(Residual) + 2 Var(Teach(Region*School)) MS(Residual) 8 5.00 0.0257
Residual 8 37.500000 4.687500 Var(Residual) . . . .

The results for hypothesis tests for the fixed effects appear as:

 
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
Region 1 4 24.07 0.0080
SchoolType 1 4 3.27 0.1450
Region*SchoolType 1 4 11.27 0.0284

 

Given that the Region*SchoolType interaction is significant, the PLM procedure along with the lsmeans statement can be used to generate the Tukey mean comparisons and produce the groupings chart and the plots to identify what means differ significantly.

ods graphics on;
proc plm restore=out_school;
lsmeans Region*SchoolType / adjust=tukey plot=meanplot cl  lines;
run; 

 
Region*SchoolType Least Squares Means
Region SchoolType Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
EastUS Private 85.7500 2.4206 4 35.42 <.0001 0.05 79.0293 92.4707
EastUS Public 73.2500 2.4206 4 30.26 <.0001 0.05 66.5293 79.9707
WestUS Private 89.5000 2.4206 4 36.97 <.0001 0.05 82.7793 96.2207
WestUS Public 93.2500 2.4206 4 38.52 <.0001 0.05 86.5293 99.9707

Plot of Score least-squares means for Region*SchoolType. With 95% confidence limits.

 
Differences of Region*SchoolType Least Squares Means
Adjustment for Multiple Comparisons: Tukey
Region SchoolType _Region _SchoolType Estimate Standard Error DF t Value Pr > |t| Adj P Alpha Lower Upper Adj Lower Adj Upper
EastUS Private EastUS Public 12.5000 3.4233 4 3.65 0.0217 0.0703 0.05 2.9955 22.0045 -1.4356 26.4356
EastUS Private WestUS Private -3.7500 3.4233 4 -1.10 0.3349 0.7109 0.05 -13.2545 5.7545 -17.6856 10.1856
EastUS Private WestUS Public -7.5000 3.4233 4 -2.19 0.0936 0.2677 0.05 -17.0045 2.0045 -21.4356 6.4356
EastUS Public WestUS Private -16.2500 3.4233 4 -4.75 0.0090 0.0301 0.05 -25.7545 -6.7455 -30.1856 -2.3144
EastUS Public WestUS Public -20.0000 3.4233 4 -5.84 0.0043 0.0146 0.05 -29.5045 -10.4955 -33.9356 -6.0644
WestUS Private WestUS Public -3.7500 3.4233 4 -1.10 0.3349 0.7109 0.05 -13.2545 5.7545 -17.6856 10.1856
Plot of all pairwise Score least-squares means differences for Region*SchoolType with Tukey adjustment at significance level 0.05. Score Comparisons for Region*SchoolType Significant Not significant Differences for alpha=0.05 (Tukey Adjustment) EastUS Private EastUS Public WestUS Private WestUS Public EastUS Private EastUS Public WestUS Private WestUS Public 70 80 90 100 70 80 90 100
WestUS WestUS EastUS EastUS Public Private Private Public 93.2500 89.5000 85.7500 73.2500 Region SchoolType Estimate Score Tukey Grouping for LS-Means of Region*SchoolType (Alpha = 0.05) LS-means covered by the same bar are not significantly different.

 

From the results, it is clear that the mean self-rating scores are highest for the public school in the west region. The difference mean scores for schools in the west region are significantly different from the mean scores for public schools in the east region.

The covtest option produces the results needed to test the significance of the random effect, Teach(Region*SchoolType) in terms of the following null and alternative hypothesis:

\(H_0: \sigma_{teacher}^2=0 \ \text{vs.}  H_a: \sigma_{teacher}^2\ > 0\)

However, as the following display shows, covtest option uses the Wald Z-test, which is based on the z-score of the sample statistic and hence is appropriate only for large samples - specifically when the number of random effect levels is sufficiently large. Otherwise, this test may not be reliable.

 
Covariance Parameter Estimates
Cov Parm Estimate Standard
Error
Z Value Pr Z
Teach(Region*School) 9.3750 8.3689 1.12 0.2626
Residual 4.6875 2.3438 2.00 0.0228

Therefore, in this case, as the number of teachers employed is few, Wald's test may not be valid. It is more appropriate to use the ANOVA F-test for Teacher(Region*SchoolType). Note that the results from the ANOVA table suggest that the effects of the teacher within the region and school type are significant (Pr > F = 0.0257), whereas the results based on Wald's test suggest otherwise (since the p-value is 0.2626).

In Minitab, specifying the mixed model is a little different.

In Stat > ANOVA > General Linear Model > Fit General Linear Model

we complete the dialog box:

Minitab General Linear Model selection window 1

 

We can create interaction terms under Model… by selecting “region” and “school_type” and clicking Add.

Minitab General Linear Model window for selecting interaction terms

 

Finally, we create nested terms and effects are random under Random/Nest…:

Minitab General Linear Model Random/Nest window for selecting nested terms

 

Minitab Output for the mixed model:

Factor Information

Factor Type Levels Values
region Fixed 2 EastUS, WestUS
school_type Fixed 2 Private, Public
teacher(region school_type) Random 8 1(EastUS,Private), 2(EastUS,Private,)
1(EastUS,Public), 2(EastUS, Public),
1(WestUS, Private), 2(WestUS, Private),
1(WestUS,Public), 2(WestUS,Public)

Analysis of Variance

Source DF Seq SS Adj SS Adj MS F-Value P-Value
region 1 564.06 564.06 564.06 24.07 0.008
school_type 1 76.56 76.56 76.56 3.27 0.145
region*school_type 1 264.06 264.06 264.06 11.27 0.028
teacher(region schoo_type) 4 93.75 93.75 23.44 5.00 0.026
Error 8 37.50 37.50 4.69    
Total 15 1035.94        

Model Summary

S R-sq R-sq(adj) R-sq(pred)
2.16506 96.38% 93.21% 85.52%

Minitab’s results are in agreement with SAS Proc Mixed.

First, load and attach the schools data in R.

setwd("~/path-to-folder/")
schools_data<-read.table("schools_data.txt",header=T,sep ="\t")
attach(schools_data)

We then fit the mixed effects model and display the ANOVA table for the fixed effects and the LRT for the random effect.

library(lme4,lmerTest)
lm_rand<-lmer(SR_score ~ region + school_type + region:school_type + (1 | teacher:(region:school_type)))
anova(lm_rand)
Type III Analysis of Variance Table with Satterthwaite's method
                    Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)   
region             112.812 112.812     1     4 24.0667 0.008011 **
school_type         15.312  15.312     1     4  3.2667 0.144986   
region:school_type  52.812  52.812     1     4 11.2667 0.028395 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ranova(lm_rand)
ANOVA-like table for random-effects: Single term deletions

Model:
SR_score ~ region + school_type + (1 | teacher:(region:school_type)) + region:school_type
                                   npar  logLik    AIC    LRT Df Pr(>Chisq)  
                                      6 -32.288 76.576                       
(1 | teacher:(region:school_type))    5 -34.153 78.306 3.7298  1    0.05345 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can also examine the variance estimates and their 95% CIs.

vc = VarCorr(lm_rand)
print(vc,comp=c("Variance"))
 Groups                       Name        Variance
 teacher:(region:school_type) (Intercept) 9.3750  
 Residual                                 4.6875  
ci_sig = confint(lm_rand, parm="theta_", oldNames = FALSE)
ci_sig2 = ci_sig^2
row.names(ci_sig2) = sapply(row.names(ci_sig2),function(x){gsub("sd_","var_",x)},USE.NAMES=FALSE)
row.names(ci_sig2)[length(row.names(ci_sig2))] = "sigma2"
ci_sig2
                                                2.5 %   97.5 %
var_(Intercept)|teacher:(region:school_type) 0.000000 16.62054
sigma2                                       2.017642 14.61089

Notice, the results of the LRT as well as the CIs suggest the random effect is not significant (different from the results of the F-test in SAS). However, the region and school type interaction is significant so we produce a Tukey pairwise analysis.

ls_means(lm_rand, "region:school_type", pairwise = TRUE)
lsmeans = emmeans::emmeans(lm_rand, pairwise~region:school_type, adjust="tukey") 
ci = confint(lsmeans)
ci$contrasts
 contrast                        estimate   SE df lower.CL upper.CL
 EastUS Private - WestUS Private    -3.75 3.42  4   -17.69    10.19
 EastUS Private - EastUS Public     12.50 3.42  4    -1.44    26.44
 EastUS Private - WestUS Public     -7.50 3.42  4   -21.44     6.44
 WestUS Private - EastUS Public     16.25 3.42  4     2.31    30.19
 WestUS Private - WestUS Public     -3.75 3.42  4   -17.69    10.19
 EastUS Public - WestUS Public     -20.00 3.42  4   -33.94    -6.06