5.5 - Nested Model: Exercise Example

Here is the SAS code to run the ANOVA model for the hours of exercise for high school students example discussed in lesson 5.2:

data Nested_Example_data;
infile datalines delimiter=','; 
input Region $ City $ ExHours;
datalines;
NE,NY,30
NE,NY,35
NE,Pittsburgh,18
NE,Pittsburgh,20
MW,Chicago,10
MW,Chicago,9
MW,Detroit,20
MW,Detroit,22
W,LA,18
W,LA,19
W,Seattle,4
W,Seattle,6
; 
/*to run the nested ANOVA model*/
proc mixed data=Nested_Example_data method=type3;
class Region City;
model ExHours = Region City(Region);
store nested1;
run;
/*to obtain the resulting multiple comparison results*/
ods graphics on;
proc plm restore=nested1;
lsmeans Region / adjust=tukey plot=meanplot cl lines;
lsmeans City(Region) / adjust=tukey plot=meanplot cl lines;
run;

When we run this SAS program, here is the output that we are interested in:

Type 3 Analysis of Variance
Source DF Sum of Squares Mean Square Expected Mean Square Error Term Error DF F Value Pr > F
Region 2 424.666667 212.333333 Var(Residual)+Q(Region, City(Region)) MS(Residual) 6 65.33 <.0001
City(Region) 3 496.750000 165.583333 Var(Residual)+Q(City(Region)) MS(Residual) 6 50.95 0.0001
Residual 6 19.500000 3.250000 Var(Residual)        
Type 3 Test of Fixed Effects
Effect Num DF Den DF F Value Pr>F
Region 2 6 65.33 <.0001
City(Region) 3 6 50.95 0.0001

The p-values above indicate that both Region and City(Region) are statistically significant. The plots and charts below obtained from the Tukey option specify the means which are significantly different.

Figure 5.2.a: Mean hours of exercise by Region with 95% CIs
Plot of ExHours least-squares means for Region. With 95% confidence limits. 10 15 20 25 ExHours LS-Mean MW NE W Region LS-Means for Region With 95% Confidence Limits
Figure 5.2.b: Diffogram for Mean Comparisons by Region
Plot of all pairwise ExHours least-squares means differences for Region with Tukey adjustment at significance level 0.05. ExHours Comparisons for Region Significant Not significant Differences for alpha=0.05 (Tukey Adjustment) MW NE W MW NE W 10 15 20 25 10 15 20 25
 
Figure 5.2.c: Mean hours of exercise by City(Region) with 95% CIs
Plot of ExHours least-squares means for City(Region). With 95% confidence limits. 0 10 20 30 ExHours LS-Mean Chicago MW Detroit MW NY NE Pittsbur NE LA W Seattle W City(Region) LS-Means for City(Region) With 95% Confidence Limits
Figure 5.2.d: Diffogram for Mean Comparisons by City(Region)
Plot of all pairwise ExHours least-squares means differences for City(Region) with Tukey adjustment at significance level 0.05. ExHours Comparisons for City(Region) Significant Not significant Differences for alpha=0.05 (Tukey Adjustment) Chicago MW Detroit MW NY NE Pittsbur NE LA W Seattle W Chicago MW Detroit MW NY NE Pittsbur NE LA W Seattle W 0 10 20 30 0 10 20 30

The exercise hours on average are statistically higher in the northeastern region compared to the midwest and the west, while the average exercise hours of these two regions are not significantly different.

Also, the comparison of the means between cities indicates that the high schoolers in New York City exercise significantly more than the other cities in the study. The exercise levels are similar among Detroit, Pittsburgh, and LA, while the exercise levels of high schoolers in Chicago and Seattle are similar but significantly lower than all other cities in the study.

These grouping observations are further confirmed by the line plots below.

Figure 5.2.e: Line plot for multiple comparisons of means for Regions.
NE MW W 25.7500 15.2500 11.7500 Region Estimate ExHours Tukey Grouping for LS-Means of Region (Alpha = 0.05) LS-means covered by the same bar are not significantly different.
Figure 5.2.f: Line plot for multiple comparisons of means for Cities.
NY Detroit Pittsbur LA Chicago Seattle NE MW NE W MW W 32.5000 21.0000 19.0000 18.5000 9.5000 5.0000 City Region Estimate ExHours Tukey Grouping for LS-Means of City (Region) (Alpha = 0.05) LS-means covered by the same bar are not significantly different.

In Minitab, for the following (Nested Example Data):

Stat > ANOVA > General Linear Model > Fit General Linear Model

Enter the factors, 'Region' and 'City' in the Factors box, then click on Random/Nest...Here is where we specify the nested effect of City in Region.

 

Minitab General Linear Model window with C1 city chosen

 

Minitab general linear model: random nest window with C2 Region selected

The output is shown below.

Factor Information

General Linear Model: response versus School, Instructor

Factor Type Levels Values
Region Fixed 3 MW, NE, W
City(Region) Fixed 6

Chicago(MW), Detroit(MW), NY (NE), Pittsburgh(NE), LA(W). Seattle(W)

Analysis of Variance

Source D Adj SS Adj MS F-Value P-Value
Region 2 424.67 212.333 65.33 0.000
City(Region) 3 496.75 165.583 50.95 0.000
Error 6 19.50 3.250    
Total 11 940.92      

Model Summary

S R-sq R-sq(adj) R-sq(pred)
1.80278 97.93% 96.20% 91.71%

Following the ANOVA run, you can generate the mean comparisons by

Stat > ANOVA > General Linear Model > Comparisons

Then specify “Region” and “City(Region)” for the comparisons by checking the boxes.

Minitab Comparisons window with Tukey method checked and terms region and city(region) checked

Then choose Graphs to get the following dialog box, where ‘Interval plot for the difference of means’ should be checked.

Minitab Comparisons: Graphs Window with Interval plot for differences of means checked

The outputs are as follows.

Tukey Pairwise Comparisons: Region

Grouping Information Using Tukey Method and 95% Confidence
Region N Mean Grouping
NE 4 25.75 A
MW 4 15.25        B
W 4 11.75        B

Means that do not share a letter are significantly different.

Minitab Tukey Simultaneous 95% CIs Differences of Means for Ex_Hours graph

Tukey Pairwise Comparisons: (City)Region

Grouping Information Using Tukey Method and 95% Confidence

City(Region) N Mean Grouping
NY(NE) 2 32.5 A      
Detroit(MW) 2 21.0   B    
Pittsburgh(NE) 2 19.0   B    
LA(W) 2 18.5   B    
Chicago(MW) 2 9.5     C  
Seattle(W) 2 5.0     C  

Means that do not share a letter are significantly different.

Minitab Tukey Simultaneous 95% CIs Differences of Means for Ex_hours graph

Note: Nested models in R are a bit temperamental. If we run the model with nested (city) labels which are unique to each nesting factor (region), the resulting model will contain ALL interaction effects with NAs for those not included in the nested model. To avoid this, we can create a new variable in the dataset which labels the nested factors the same within each nesting factor. See below for an example.

First, load and attach the exercise hours data in R. We create a city code for each of the cities within each region. The data is below printed to illustrate the new variable.

setwd("~/path-to-folder/")
exhours_data<-read.table("exhours_data.txt",header=T,sep ="\t")
exhours_data$CityCode = as.factor(rep(c(1,1,2,2),3))
attach(exhours_data)
exhours_data
   Region       City Ex_hours CityCode
1      NE         NY       30        1
2      NE         NY       35        1
3      NE Pittsburgh       18        2
4      NE Pittsburgh       20        2
5      MW    Chicago       10        1
6      MW    Chicago        9        1
7      MW    Detroit       20        2
8      MW    Detroit       22        2
9       W         LA       18        1
10      W         LA       19        1
11      W    Seattle        4        2
12      W    Seattle        6        2

We run the model with the coded nested variable and obtain the ANOVA table.

options(contrasts=c("contr.sum","contr.poly"))
lm1 = lm(Ex_hours ~ Region/CityCode) # equivalent to lm1 = lm(Ex_hours ~ Region + Region:CityCode)
aov3 = car::Anova(lm1, type=3) 
aov3
Anova Table (Type III tests)

Response: Ex_hours
                Sum Sq Df  F value    Pr(>F)     
(Intercept)     3710.1  1 1141.564 4.475e-08 ***
Region           424.7  2   65.333 8.462e-05 ***
Region:CityCode  496.7  3   50.949 0.0001162 ***
Residuals         19.5  6                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can obtain the Tukey comparison of the regions as follows.

aov1 = aov(lm1)
tpairs_Region = TukeyHSD(aov1,"Region")
tpairs_Region
plot(tpairs_Region)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm1)
$Region
       diff        lwr         upr     p adj
NE-MW  10.5   6.588702  14.4112978 0.0004243
W-MW   -3.5  -7.411298   0.4112978 0.0747598
W-NE  -14.0 -17.911298 -10.0887022 0.0000836
"Tukey multiple comparision of the regions"

If we were to use lm1 to obtain the Tukey comparison of the cities, our results would be labeled with the city codes, which is not easily interpreted. To obtain results with the city names, we will rerun the model with the variable City and omit the NAs from the Tukey results before displaying them.

lm2 = lm(Ex_hours ~ Region/City)
aov2 = aov(lm2)
tpairs_City = TukeyHSD(aov2,"Region:City")
tpairs_City$`Region:City` = na.omit(tpairs_City$`Region:City`)
tpairs_City
par(las = 1, mar=c(4.5,12,2,0)); plot(tpairs_City)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm2)
$`Region:City`
                          diff          lwr        upr     p adj
MW:Detroit-MW:Chicago     11.5   2.03420257  20.965797 0.0198222
W:LA-MW:Chicago            9.0  -0.46579743  18.465797 0.0626471
NE:NY-MW:Chicago          23.0  13.53420257  32.465797 0.0004610
NE:Pittsburgh-MW:Chicago   9.5   0.03420257  18.965797 0.0491884
W:Seattle-MW:Chicago      -4.5 -13.96579743   4.965797 0.5867601
W:LA-MW:Detroit           -2.5 -11.96579743   6.965797 0.9752059
NE:NY-MW:Detroit          11.5   2.03420257  20.965797 0.0198222
NE:Pittsburgh-MW:Detroit  -2.0 -11.46579743   7.465797 0.9960158
W:Seattle-MW:Detroit     -16.0 -25.46579743  -6.534203 0.0035460
NE:NY-W:LA                14.0   4.53420257  23.465797 0.0072412
NE:Pittsburgh-W:LA         0.5  -8.96579743   9.965797 1.0000000
W:Seattle-W:LA           -13.5 -22.96579743  -4.034203 0.0087623
NE:Pittsburgh-NE:NY      -13.5 -22.96579743  -4.034203 0.0087623
W:Seattle-NE:NY          -27.5 -36.96579743 -18.034203 0.0001762
W:Seattle-NE:Pittsburgh  -14.0 -23.46579743  -4.534203 0.0072412
means plot for exhours vs region