10.2.9  Model fit and lack-of-fit
Example: Job Satisfaction
Blue-collar workers' job satisfaction, from a large-scale investigation in Denmark in 1968 (Andersen, 1985). The three variables are (1) worker's (W) job satisfaction (low, high), (2) supervisor's (S) satisfaction (low, high), and (3) quality of management (M) (bad, good).
References: see Categorical Data Analysis Using SAS (2000), Sec. 16.3.5, and collar.sas and/or collar.R (you will need collart.txt for the R code).
Here is a table of observed and fitted values for selected models:
| Manage | Super | Worker | n_{ijk} | (M, S, W) | (MS, W) | (MS, MW) | (MS, MW, SW) |
|--------|-------|--------|---------|-----------|---------|----------|--------------|
| bad    | low   | low    | 103     | 50.15     | 71.78   | 97.16    | 102.26       |
| bad    | low   | high   | 87      | 82.59     | 118.22  | 92.84    | 87.74        |
| bad    | high  | low    | 32      | 49.59     | 27.96   | 37.84    | 32.74        |
| bad    | high  | high   | 42      | 81.67     | 46.04   | 36.16    | 41.26        |
| good   | low   | low    | 59      | 85.10     | 63.47   | 51.03    | 59.74        |
| good   | low   | high   | 109     | 140.15    | 104.53  | 116.97   | 108.26       |
| good   | high  | low    | 78      | 84.15     | 105.79  | 85.97    | 77.26        |
| good   | high  | high   | 205     | 138.59    | 174.21  | 197.03   | 205.74       |

Notice that as you add more parameters to your model, the fitted values get closer to the observed values; this is to be expected, since the saturated model gives a perfect fit. But remember, our goal is to find the simplest possible model that explains the most variability in the data, i.e., the parsimonious model.
Let's see how we can assess the fit of each model and compare it with the other models in order to reach our final decision.
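For decomposable models such as these, the ML fitted values have closed forms in terms of the observed margins; for the conditional independence model (MS, MW), for example, \(\hat{\mu}_{ijk}=n_{ij+}n_{i+k}/n_{i++}\). Here is a minimal sketch in Python (the counts are taken from the table above):

```python
# Observed counts n[i][j][k]: i = manage (bad, good),
# j = super (low, high), k = worker (low, high)
n = [[[103, 87], [32, 42]],
     [[59, 109], [78, 205]]]

# Model (MS, MW): S and W conditionally independent given M, so
#   mu[i][j][k] = n_{ij+} * n_{i+k} / n_{i++}
mu = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
for i in range(2):
    n_i = sum(n[i][j][k] for j in range(2) for k in range(2))  # M margin
    for j in range(2):
        n_ij = sum(n[i][j])                       # MS margin
        for k in range(2):
            n_ik = n[i][0][k] + n[i][1][k]        # MW margin
            mu[i][j][k] = n_ij * n_ik / n_i

print(round(mu[0][0][0], 2))  # 97.16, the first (MS, MW) fitted value
```

The same margin products give the remaining (MS, MW) entries. For non-decomposable models, such as the homogeneous association model (MS, MW, SW), no closed form exists and the fit requires iterative proportional fitting or a Poisson GLM.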
Chi-squared goodness-of-fit tests
For testing the overall fit of the model.
H_{0} : The fitted model (M_{0}) fits well
H_{A} : The saturated model (M_{A}) fits well

L_{0} = log-likelihood for the fitted model
L_{A} = log-likelihood for the saturated model

G^{2} = −2(L_{0} − L_{A})

df = df_{0} − df_{A} = # cells − # unique parameters in M_{0}
df = # unique parameters in M_{A} − # unique parameters in M_{0}.
| Model        | df | G^{2}  | p-value | X^{2}  | p-value |
|--------------|----|--------|---------|--------|---------|
| (M, S, W)    | 4  | 118.00 | < 0.001 | 128.09 | < 0.001 |
| (MS, W)      | 3  | 35.60  | < 0.001 | 35.72  | < 0.001 |
| (MW, S)      | 3  | 87.79  | < 0.001 | 85.02  | < 0.001 |
| (M, WS)      | 3  | 102.11 | < 0.001 | 99.09  | < 0.001 |
| (MW, SW)     | 2  | 71.90  | < 0.001 | 70.88  | < 0.001 |
| (MS, MW)     | 2  | 5.39   | 0.07    | 5.41   | 0.07    |
| (MS, WS)     | 2  | 19.71  | < 0.001 | 19.88  | < 0.001 |
| (MW, SW, MS) | 1  | 0.065  | 0.80    | 0.069  | 0.80    |
These are global tests. All the assumptions and properties about the chi-squared and deviance statistics that we have learned thus far still hold. Based on these values there are two reasonably well-fitting models: conditional independence of S and W given M, (MS, MW), and the homogeneous association model, (MS, MW, SW).
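The p-values above are upper-tail chi-squared probabilities of G^{2} (or X^{2}) on the model's residual df. A quick Python sketch, using the closed-form tail probabilities that exist for df = 1 and df = 2 (for a general df you would use a library routine such as scipy.stats.chi2.sf):

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 implemented here")

print(round(chi2_sf(5.39, 2), 2))   # (MS, MW): p is about 0.07
print(round(chi2_sf(0.065, 1), 2))  # (MW, SW, MS): p is about 0.80
```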
Residuals
The residuals can be used to assess how well the individual cells are fitted by the model. A good model should have small residuals.
As before, we can consider Pearson residuals, adjusted residuals, and deviance residuals; we defined these in Lesson 2 (recall that you can always use the course's search engine to find this information). If the model holds, the adjusted residuals, for example, should be close to zero.
There is evidence of lack of fit if:
- there are few cells (small N) and an adjusted residual is greater than 2 in absolute value, or
- the table is large (large N) and an adjusted residual is greater than 3 in absolute value.
Here is a table with observed values, fitted values for the two models of interest and their adjusted residuals.
| Manage | Super | Worker | n_{ijk} | \(\hat{\mu}_{ijk}\) (MS, MW) | adj res (MS, MW) | \(\hat{\mu}_{ijk}\) (MS, MW, SW) | adj res (MS, MW, SW) |
|--------|-------|--------|---------|------------------------------|------------------|----------------------------------|----------------------|
| bad    | low   | low    | 103     | 97.16                        | 1.60             | 102.26                           | 0.25                 |
| bad    | low   | high   | 87      | 92.84                        | -1.60            | 87.74                            | -0.25                |
| bad    | high  | low    | 32      | 37.84                        | -1.60            | 32.74                            | -0.25                |
| bad    | high  | high   | 42      | 36.16                        | 1.60             | 41.26                            | 0.25                 |
| good   | low   | low    | 59      | 51.03                        | 1.69             | 59.74                            | -0.25                |
| good   | low   | high   | 109     | 116.97                       | -1.69            | 108.26                           | 0.25                 |
| good   | high  | low    | 78      | 85.97                        | -1.69            | 77.26                            | 0.25                 |
| good   | high  | high   | 205     | 197.03                       | 1.69             | 205.74                           | -0.25                |

The reported residuals are all quite small, and both models look good. Is there additional information that could help us be more confident about our decision to accept or reject a particular model?
Recall that in SAS, the OBSTATS option on the MODEL statement produces various residuals, e.g., StReschi:
```
                        Observation Statistics

Observation  count  manager  super  worker        Pred       Xbeta        Std
                HessWgt       Lower       Upper      Resraw      Reschi
                 Resdev    StResdev    StReschi      Reslik

          1    103    bad    low    low       97.159091   4.5763497   0.094248
              97.159091   80.771732    116.8712   5.8409091   0.5925687
              0.5867754    1.585496   1.6011497   1.5990147
```
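The Reschi column in this output is the Pearson residual, \((n_{ijk}-\hat{\mu}_{ijk})/\sqrt{\hat{\mu}_{ijk}}\); the standardized versions (StReschi, StResdev) further divide by \(\sqrt{1-h_{ijk}}\), where \(h_{ijk}\) is the cell's leverage. A quick check of the first cell in Python, using the observed and fitted values from the output above:

```python
import math

n, mu = 103, 97.159091             # observed and fitted count, cell (bad, low, low)
reschi = (n - mu) / math.sqrt(mu)  # Pearson residual
print(round(reschi, 4))            # 0.5926, matching Reschi above
```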
Recall that in R you can use the function residuals(); more specifically, rstandard() gives standardized deviance residuals, and rstudent() gives likelihood residuals, e.g., rstandard(modelCI) from collar.R:

```
> rstandard(modelCI)
        1         2         3         4         5         6         7         8
 1.585496 -1.618395 -1.645241  1.560709  1.645956 -1.706941 -1.714344  1.676040
```
We can, for example, compare these two models to each other via the so-called partial association test.
Partial Association (PA) Tests
H_{0} : there is no partial association between two variables given the third, or
the conditional (partial) odds ratios equal 1, or
the two-way interaction term equals zero, e.g., \(\lambda_{jk}^{SW}=0\)
H_{A} : there is a partial association between two variables given the third, or
the conditional (partial) odds ratios are NOT equal to 1, or
the two-way interaction term is NOT equal to zero, e.g., \(\lambda_{jk}^{SW} \neq 0\)
To test this, we compare a model that does NOT contain the partial association of interest with a more complex model that includes this effect, by looking at the likelihood-ratio statistic, i.e., the difference of the deviances of the two models:
For example, let
M_{0} be the restricted model (MS, MW), and
M_{1} be the more complex model (MS, MW, SW). Then
\(\Delta G^2=G^2_0-G^2_1\)
Δdf = df_{0} − df_{1}
Notice the similarity of the partial association test to the previously discussed goodness-of-fit test for assessing the "global" fit of the model. Here we make a relative comparison of two models, whereas in the goodness-of-fit test we always compare the fitted model to the saturated model; that is, M_{1} here plays the role of M_{A} in the goodness-of-fit test.
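For example, testing \(\lambda_{jk}^{SW}=0\) by comparing (MS, MW) against (MS, MW, SW) works out as follows (a Python sketch; for df = 1 the chi-squared upper tail has the closed form erfc(\(\sqrt{x/2}\))):

```python
import math

G2_0, df_0 = 5.39, 2    # restricted model (MS, MW)
G2_1, df_1 = 0.065, 1   # larger model (MS, MW, SW)

dG2 = G2_0 - G2_1       # likelihood-ratio statistic
ddf = df_0 - df_1       # its degrees of freedom
p = math.erfc(math.sqrt(dG2 / 2))   # P(chi-squared_1 > dG2)
print(round(dG2, 3), ddf, round(p, 2))  # 5.325 1 0.02
```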
Here is a tabular summary that compares each of the conditional independence models to the homogeneous association model.

| Model        | df | G^{2}  | p       | H_{0}                     | Δdf | ΔG^{2} | p-value |
|--------------|----|--------|---------|---------------------------|-----|--------|---------|
| (MW, SW, MS) | 1  | 0.065  | 0.80    |                           |     |        |         |
| (MW, SW)     | 2  | 71.90  | < 0.001 | \(\lambda_{ij}^{MS}=0\)   | 1   | 71.835 | < 0.001 |
| (MS, MW)     | 2  | 5.39   | 0.07    | \(\lambda_{jk}^{SW}=0\)   | 1   | 5.325  | 0.02    |
| (MS, WS)     | 2  | 19.71  | < 0.001 | \(\lambda_{ik}^{MW}=0\)   | 1   | 19.645 | < 0.001 |
From the above table, we can conclude that each of these terms, when present, contributes significantly to a better fit of the model; thus the homogeneous association model fits better than any of the conditional independence models.
Here is an animated graphical representation summarizing the fits and partial tests for all models in this example:
It seems thus far that the homogeneous association model describes our data best. We typically carry out inference on the chosen model in terms of odds ratios.
Confidence Intervals for Odds Ratios
Recall that the odds ratios are direct functions of the loglinear parameters.
Consider estimating the partial odds ratio for SW in our example from the homogeneous association model.
Based on SAS/GENMOD (see collar.sas and collarDIndex.sas )
\(\text{log}(\hat{\theta}_{SW(i)})=0.3872\text{ and }\hat{\theta}_{SW(i)}=\text{exp}(0.3872)=1.47\)
Recall that this is a Wald statistic; thus a (1 − α) × 100% CI for \(\text{log}(\hat{\theta}_{SW(i)})\) is
\(\text{log}(\hat{\theta}_{SW(i)}) \pm z_{\alpha/2}(ASE)\)
E.g. 95% CI for \(\text{log}(\hat{\theta}_{SW(i)})\):
0.3872 ± 1.96(0.167) = (0.056, 0.709)
and 95% CI for \(\hat{\theta}_{SW(i)}\):
(exp(0.056), exp(0.709)) = (1.06, 2.03)
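The Wald interval above is easy to reproduce; here is a small Python sketch (the estimate 0.3872 and the ASE 0.167 are taken from the text, and small rounding differences in the last digit are expected):

```python
import math

log_or, ase, z = 0.3872, 0.167, 1.96   # log odds ratio, its ASE, z_{0.025}
lo, hi = log_or - z * ase, log_or + z * ase
print(round(math.exp(log_or), 2))      # 1.47, the estimated odds ratio
print(round(math.exp(lo), 2), round(math.exp(hi), 2))  # CI on the odds-ratio scale
```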
But the lower bound is very close to 1. This should raise the question: is this term really significant? Could we go, after all, with the simpler model (MS, MW)?
Effects of sample size and practical significance
Of the two models in our example, which one is the "better" model? How do we choose between the simpler, more parsimonious model and the one that is more complex?
| Criterion                                              | (MS, MW)                     | (MS, MW, SW)                  |
|--------------------------------------------------------|------------------------------|-------------------------------|
| Model goodness of fit                                  | G^{2} = 5.39, df = 2, p = 0.07 | G^{2} = 0.065, df = 1, p = 0.80 |
| Largest adjusted residual                              | 1.69                         | 0.25                          |
| Likelihood-ratio test of \(\lambda_{jk}^{SW}=0\)       |                              | G^{2} = 5.325, df = 1, p = 0.02 |
| Complexity                                             | simpler                      | more complex                  |
Do we really need the SW partial association?
It may be a weak effect, i.e., it may not be important in a practical sense, even though it appears statistically significant; this can be due to the large sample size (n = 715) relative to the small table (N = 2^{3} cells).
So how do we choose between these two? How do we weigh the trade-off between a simple, parsimonious model (practical significance) and a more complex model that is closer to reality (statistical significance)? This decision is often driven by the context of the problem and the analysis we have done above. But let's look at some more statistics that can aid us in choosing the "best" model.