5.5 - Nested Model: Exercise Example

Here is the SAS code to run the ANOVA model for the hours of exercise for high school students example discussed in lesson 5.2:

data Nested_Example_data;
infile datalines delimiter=','; 
input Region $ City $ ExHours;
datalines;
NE,NY,30
NE,NY,35
NE,Pittsburgh,18
NE,Pittsburgh,20
MW,Chicago,10
MW,Chicago,9
MW,Detroit,20
MW,Detroit,22
W,LA,18
W,LA,19
W,Seattle,4
W,Seattle,6
; 
/*to run the nested ANOVA model*/
proc mixed data=Nested_Example_data method=type3;
class Region City;
model ExHours = Region City(Region);
store nested1;
run;
/*to obtain the resulting multiple comparison results*/
ods graphics on;
proc plm restore=nested1;
lsmeans Region / adjust=tukey plot=meanplot cl lines;
lsmeans City(Region) / adjust=tukey plot=meanplot cl lines;
run;

When we run this SAS program, here is the output that we are interested in:

Type 3 Analysis of Variance
Source	DF	Sum of Squares	Mean Square	Expected Mean Square	Error Term	Error DF	F Value	Pr > F
Region	2	424.666667	212.333333	Var(Residual)+Q(Region, City(Region))	MS(Residual)	6	65.33	<.0001
City(Region)	3	496.750000	165.583333	Var(Residual)+Q(City(Region))	MS(Residual)	6	50.95	0.0001
Residual	6	19.500000	3.250000	Var(Residual)

Type 3 Test of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr>F
Region	2	6	65.33	<.0001
City(Region)	3	6	50.95	0.0001

The p-values above indicate that both Region and City(Region) are statistically significant. The plots and charts below obtained from the Tukey option specify the means which are significantly different.

Figure 5.2.a: Mean hours of exercise by Region with 95% CIs

Figure 5.2.b: Diffogram for Mean Comparisons by Region

Figure 5.2.c: Mean hours of exercise by City(Region) with 95% CIs

Figure 5.2.d: Diffogram for Mean Comparisons by City(Region)

The exercise hours on average are statistically higher in the northeastern region compared to the midwest and the west, while the average exercise hours of these two regions are not significantly different.

Also, the comparison of the means between cities indicates that the high schoolers in New York City exercise significantly more than the other cities in the study. The exercise levels are similar among Detroit, Pittsburgh, and LA, while the exercise levels of high schoolers in Chicago and Seattle are similar but significantly lower than all other cities in the study.

These grouping observations are further confirmed by the line plots below.

Figure 5.2.e: Line plot for multiple comparisons of means for Regions.

Figure 5.2.f: Line plot for multiple comparisons of means for Cities.

In Minitab, for the following (Nested Example Data):

Stat > ANOVA > General Linear Model > Fit General Linear Model

Enter the factors, 'Region' and 'City' in the Factors box, then click on Random/Nest...Here is where we specify the nested effect of City in Region.

Minitab General Linear Model window with C1 city chosen

Minitab general linear model: random nest window with C2 Region selected

The output is shown below.

Factor Information

General Linear Model: response versus School, Instructor

Factor	Type	Levels	Values
Region	Fixed	3	MW, NE, W
City(Region)	Fixed	6	Chicago(MW), Detroit(MW), NY (NE), Pittsburgh(NE), LA(W). Seattle(W)

Analysis of Variance

Source	D	Adj SS	Adj MS	F-Value	P-Value
Region	2	424.67	212.333	65.33	0.000
City(Region)	3	496.75	165.583	50.95	0.000
Error	6	19.50	3.250
Total	11	940.92

Model Summary

S	R-sq	R-sq(adj)	R-sq(pred)
1.80278	97.93%	96.20%	91.71%

Following the ANOVA run, you can generate the mean comparisons by

Stat > ANOVA > General Linear Model > Comparisons

Then specify “Region” and “City(Region)” for the comparisons by checking the boxes.

Minitab Comparisons window with Tukey method checked and terms region and city(region) checked

Then choose Graphs to get the following dialog box, where ‘Interval plot for the difference of means’ should be checked.

Minitab Comparisons: Graphs Window with Interval plot for differences of means checked

The outputs are as follows.

Tukey Pairwise Comparisons: Region

Grouping Information Using Tukey Method and 95% Confidence

Region	N	Mean	Grouping
NE	4	25.75	A
MW	4	15.25	B
W	4	11.75	B

Means that do not share a letter are significantly different.

Minitab Tukey Simultaneous 95% CIs Differences of Means for Ex_Hours graph

Tukey Pairwise Comparisons: (City)Region

Grouping Information Using Tukey Method and 95% Confidence

City(Region)	N	Mean	Grouping
NY(NE)	2	32.5	A
Detroit(MW)	2	21.0		B
Pittsburgh(NE)	2	19.0		B
LA(W)	2	18.5		B
Chicago(MW)	2	9.5			C
Seattle(W)	2	5.0			C

Means that do not share a letter are significantly different.

Note: Nested models in R are a bit temperamental. If we run the model with nested (city) labels which are unique to each nesting factor (region), the resulting model will contain ALL interaction effects with NAs for those not included in the nested model. To avoid this, we can create a new variable in the dataset which labels the nested factors the same within each nesting factor. See below for an example.

First, load and attach the exercise hours data in R. We create a city code for each of the cities within each region. The data is below printed to illustrate the new variable.

setwd("~/path-to-folder/")
exhours_data<-read.table("exhours_data.txt",header=T,sep ="\t")
exhours_data$CityCode = as.factor(rep(c(1,1,2,2),3))
attach(exhours_data)
exhours_data

   Region       City Ex_hours CityCode
1      NE         NY       30        1
2      NE         NY       35        1
3      NE Pittsburgh       18        2
4      NE Pittsburgh       20        2
5      MW    Chicago       10        1
6      MW    Chicago        9        1
7      MW    Detroit       20        2
8      MW    Detroit       22        2
9       W         LA       18        1
10      W         LA       19        1
11      W    Seattle        4        2
12      W    Seattle        6        2

We run the model with the coded nested variable and obtain the ANOVA table.

options(contrasts=c("contr.sum","contr.poly"))
lm1 = lm(Ex_hours ~ Region/CityCode) # equivalent to lm1 = lm(Ex_hours ~ Region + Region:CityCode)
aov3 = car::Anova(lm1, type=3) 
aov3

Anova Table (Type III tests)

Response: Ex_hours
                Sum Sq Df  F value    Pr(>F)     
(Intercept)     3710.1  1 1141.564 4.475e-08 ***
Region           424.7  2   65.333 8.462e-05 ***
Region:CityCode  496.7  3   50.949 0.0001162 ***
Residuals         19.5  6                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can obtain the Tukey comparison of the regions as follows.

aov1 = aov(lm1)
tpairs_Region = TukeyHSD(aov1,"Region")
tpairs_Region
plot(tpairs_Region)

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm1)
$Region
       diff        lwr         upr     p adj
NE-MW  10.5   6.588702  14.4112978 0.0004243
W-MW   -3.5  -7.411298   0.4112978 0.0747598
W-NE  -14.0 -17.911298 -10.0887022 0.0000836

"Tukey multiple comparision of the regions"

If we were to use lm1 to obtain the Tukey comparison of the cities, our results would be labeled with the city codes, which is not easily interpreted. To obtain results with the city names, we will rerun the model with the variable City and omit the NAs from the Tukey results before displaying them.

lm2 = lm(Ex_hours ~ Region/City)
aov2 = aov(lm2)
tpairs_City = TukeyHSD(aov2,"Region:City")
tpairs_City$`Region:City` = na.omit(tpairs_City$`Region:City`)
tpairs_City
par(las = 1, mar=c(4.5,12,2,0)); plot(tpairs_City)

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm2)
$`Region:City`
                          diff          lwr        upr     p adj
MW:Detroit-MW:Chicago     11.5   2.03420257  20.965797 0.0198222
W:LA-MW:Chicago            9.0  -0.46579743  18.465797 0.0626471
NE:NY-MW:Chicago          23.0  13.53420257  32.465797 0.0004610
NE:Pittsburgh-MW:Chicago   9.5   0.03420257  18.965797 0.0491884
W:Seattle-MW:Chicago      -4.5 -13.96579743   4.965797 0.5867601
W:LA-MW:Detroit           -2.5 -11.96579743   6.965797 0.9752059
NE:NY-MW:Detroit          11.5   2.03420257  20.965797 0.0198222
NE:Pittsburgh-MW:Detroit  -2.0 -11.46579743   7.465797 0.9960158
W:Seattle-MW:Detroit     -16.0 -25.46579743  -6.534203 0.0035460
NE:NY-W:LA                14.0   4.53420257  23.465797 0.0072412
NE:Pittsburgh-W:LA         0.5  -8.96579743   9.965797 1.0000000
W:Seattle-W:LA           -13.5 -22.96579743  -4.034203 0.0087623
NE:Pittsburgh-NE:NY      -13.5 -22.96579743  -4.034203 0.0087623
W:Seattle-NE:NY          -27.5 -36.96579743 -18.034203 0.0001762
W:Seattle-NE:Pittsburgh  -14.0 -23.46579743  -4.534203 0.0072412