# 10.2.9 - Model Fit and Lack-of-Fit

### Example - Job Satisfaction

These data on blue-collar workers' job satisfaction come from a large-scale investigation in Denmark in 1968 (Andersen, 1985). The three variables are (1) worker (W) job satisfaction (low, high), (2) supervisor (S) satisfaction (low, high), and (3) quality of management (M) (bad, good).

References: See Categorical Data Analysis Using SAS (2000), Sec. 16.3.5 and collar.sas and/or collar.R (you will need collart.txt for the R code).

Here is a table of observed and fitted values for selected models:

| Manage | Super | Worker | $n_{ijk}$ | (M, S, W) | (MS, W) | (MS, MW) | (MS, MW, SW) |
|---|---|---|---|---|---|---|---|
| bad | low | low | 103 | 50.15 | 71.78 | 97.16 | 102.26 |
| bad | low | high | 87 | 82.59 | 118.22 | 92.84 | 87.74 |
| bad | high | low | 32 | 49.59 | 27.96 | 37.84 | 32.74 |
| bad | high | high | 42 | 81.67 | 46.04 | 36.16 | 41.26 |
| good | low | low | 59 | 85.10 | 63.47 | 51.03 | 59.74 |
| good | low | high | 109 | 140.15 | 104.53 | 116.97 | 108.26 |
| good | high | low | 78 | 84.15 | 105.79 | 85.97 | 77.26 |
| good | high | high | 205 | 138.59 | 174.21 | 197.03 | 205.74 |

Notice that as you add more parameters to your model, the fitted values move closer to the observed values; this is to be expected, since the saturated model gives a perfect fit. But remember, our goal is to find the simplest possible model that explains the most variability in the data, i.e., the parsimonious model.

Let's see how we can assess the fit of each model and compare it to other models in order to render our final decision.

#### Chi-squared goodness-of-fit tests

For testing the overall fit of the model.

$H_0$: The fitted model ($M_0$) fits well
$H_A$: The saturated model ($M_A$) fits well

$L_0$ = log-likelihood for the fitted model
$L_A$ = log-likelihood for the saturated model

$G^2 = -2(L_0 - L_A)$
$df = df_0 - df_A$ = # cells − # unique parameters in $M_0$
$df$ = # unique parameters in $M_A$ − # unique parameters in $M_0$.
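For example, in this $2 \times 2 \times 2$ table there are $N = 8$ cells, and the model (MS, MW) has six unique parameters ($\lambda$, $\lambda_i^M$, $\lambda_j^S$, $\lambda_k^W$, $\lambda_{ij}^{MS}$, $\lambda_{ik}^{MW}$ — one free parameter each, since every variable is binary), so

$df = 8 - 6 = 2$,

which matches the degrees of freedom reported for (MS, MW) below.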

| Model | df | $G^2$ | p-value | $X^2$ | p-value |
|---|---|---|---|---|---|
| (M, S, W) | 4 | 118.00 | < 0.001 | 128.09 | < 0.001 |
| (MS, W) | 3 | 35.60 | < 0.001 | 35.72 | < 0.001 |
| (MW, S) | 3 | 87.79 | < 0.001 | 85.02 | < 0.001 |
| (M, WS) | 3 | 102.11 | < 0.001 | 99.09 | < 0.001 |
| (MW, SW) | 2 | 71.90 | < 0.001 | 70.88 | < 0.001 |
| (MS, MW) | 2 | 5.39 | 0.07 | 5.41 | 0.07 |
| (MS, WS) | 2 | 19.71 | < 0.001 | 19.88 | < 0.001 |
| (MW, SW, MS) | 1 | 0.065 | 0.80 | 0.069 | 0.80 |

These are global tests. All the assumptions and properties about the chi-squared and deviance statistics that we have learned thus far hold. Based on these values there are two reasonably well-fitting models: the conditional independence of S and W given M, (MS, MW), and the homogeneous association model, (MS, MW, SW).
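Because (MS, MW) is a decomposable model, its MLEs have the closed form $\hat{\mu}_{ijk} = n_{ij+} n_{i+k} / n_{i++}$ (S and W independent given M). Here is a minimal pure-Python sketch (the variable names are mine, not from collar.sas or collar.R) that reproduces the fitted values and the $G^2$ and $X^2$ statistics for this model; small differences from the table (e.g., $G^2 = 5.38$ vs. 5.39) are rounding:

```python
import math

# Observed counts, keyed by (management, supervisor, worker)
n = {
    ("bad", "low", "low"): 103, ("bad", "low", "high"): 87,
    ("bad", "high", "low"): 32, ("bad", "high", "high"): 42,
    ("good", "low", "low"): 59, ("good", "low", "high"): 109,
    ("good", "high", "low"): 78, ("good", "high", "high"): 205,
}

# Margins needed for (MS, MW): n_{ij+}, n_{i+k}, n_{i++}
n_ms, n_mw, n_m = {}, {}, {}
for (m, s, w), cnt in n.items():
    n_ms[m, s] = n_ms.get((m, s), 0) + cnt
    n_mw[m, w] = n_mw.get((m, w), 0) + cnt
    n_m[m] = n_m.get(m, 0) + cnt

# Closed-form fitted values: S and W conditionally independent given M
mu = {(m, s, w): n_ms[m, s] * n_mw[m, w] / n_m[m] for (m, s, w) in n}

# Deviance (G^2) and Pearson (X^2) goodness-of-fit statistics
G2 = 2 * sum(n[k] * math.log(n[k] / mu[k]) for k in n)
X2 = sum((n[k] - mu[k]) ** 2 / mu[k] for k in n)

# For df = 2, the chi-square survival function is exp(-x/2)
p_G2 = math.exp(-G2 / 2)

print(round(mu["bad", "low", "low"], 2))  # 97.16, as in the table
print(round(G2, 2), round(X2, 2), round(p_G2, 2))
```

In practice you would let SAS/GENMOD or R's `glm(..., family = poisson)` do this fit; the closed form only exists because this particular model is decomposable.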

#### Residuals

The residuals can be used to assess how well the individual cells are fitted by the model. A good model should have small residuals.

As before, we can consider Pearson residuals, adjusted residuals, and deviance residuals; we defined these in Lesson 2 -- recall that you can always use the course "Search Engine" to find this information. If the model holds, the adjusted residuals, for example, should be close to zero.

There is a lack of fit if:

1. there are only a few cells (small N) and the adjusted residuals are greater than 2 in absolute value, or
2. the table is large (large N) and the adjusted residuals are greater than 3 in absolute value.

Here is a table with observed values, fitted values for the two models of interest and their adjusted residuals.

| Manage | Super | Worker | $n_{ijk}$ | $\hat{\mu}_{ijk}$ (MS, MW) | adj. res. | $\hat{\mu}_{ijk}$ (MS, MW, SW) | adj. res. |
|---|---|---|---|---|---|---|---|
| bad | low | low | 103 | 97.16 | 1.60 | 102.26 | 0.25 |
| bad | low | high | 87 | 92.84 | -1.60 | 87.74 | -0.25 |
| bad | high | low | 32 | 37.84 | -1.60 | 32.74 | -0.25 |
| bad | high | high | 42 | 36.16 | 1.60 | 41.26 | 0.25 |
| good | low | low | 59 | 51.03 | 1.69 | 59.74 | -0.25 |
| good | low | high | 109 | 116.97 | -1.69 | 108.26 | 0.25 |
| good | high | low | 78 | 85.97 | -1.69 | 77.26 | 0.25 |
| good | high | high | 205 | 197.03 | 1.69 | 205.74 | -0.25 |

The reported residuals are all fairly small, and both models look good. Is there additional information that could make us a bit more confident about our decision to accept or reject a certain model? Recall that in SAS, the OBSTATS option in the MODEL statement produces various residuals, e.g., StReschi:

```
                       Observation Statistics

Observation  count  manager  super  worker       Pred      Xbeta        Std
                HessWgt      Lower      Upper     Resraw     Reschi
                 Resdev   StResdev   StReschi     Reslik

          1    103  bad      low    low     97.159091  4.5763497   0.094248
              97.159091  80.771732   116.8712  5.8409091  0.5925687
              0.5867754   1.585496  1.6011497  1.5990147
```

Recall that in R you can use the function residuals(). More specifically, rstandard() gives standardized deviance residuals and rstudent() gives likelihood residuals, e.g., rstandard(modelCI) from collar.R:
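As a quick check (a sketch, not output from collar.sas), the Reschi column above is the Pearson residual $(n_{ijk} - \hat{\mu}_{ijk})/\sqrt{\hat{\mu}_{ijk}}$ for each cell. For the first cell:

```python
import math

# Cell (bad, low, low) under the (MS, MW) model: observed count and fitted value
n_111, mu_111 = 103, 97.159091

# Pearson residual; adjusted residuals further divide by sqrt(1 - leverage)
pearson = (n_111 - mu_111) / math.sqrt(mu_111)
print(round(pearson, 4))  # 0.5926, matching the Reschi column above
```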

```
> rstandard(modelCI)
        1         2         3         4         5         6         7         8
 1.585496 -1.618395 -1.645241  1.560709  1.645956 -1.706941 -1.714344  1.676040
```

We can, for example, compare these two models to each other via the so-called partial association test.

#### Partial Association (PA) Tests

$H_0$: there is no partial association between two variables given the third, or

the conditional (partial) odds ratios equal 1, or

the two-way interaction term equals zero, e.g., $\lambda_{jk}^{SW}=0$

$H_A$: there is a partial association between two variables given the third, or

the conditional (partial) odds ratios do NOT equal 1, or

the two-way interaction term is NOT zero, e.g., $\lambda_{jk}^{SW} \neq 0$

To test this, we compare a model that does NOT contain the partial association of interest to a more complex model that includes this effect, by looking at the likelihood-ratio statistic, i.e., the difference of the deviances of the two models:

For example, let

$M_0$ be the restricted model (MS, MW), and
$M_1$ be the more complex model (MS, MW, SW). Then

$\Delta G^2 = G^2_0 - G^2_1$
$\Delta df = df_0 - df_1$

Notice the similarity of the partial association test to the previously discussed goodness-of-fit test for assessing the "global" fit of the model. Here we are doing a relative comparison of two models, whereas in the goodness-of-fit test we always compare the fitted model to the saturated model; that is, $M_1$ here plays the role of $M_A$ in the goodness-of-fit test.

Here is a tabular summary that compares each of conditional independence models to the homogeneous association model.

| Model | df | $G^2$ | p | $H_0$ | Δdf | $\Delta G^2$ | p-value |
|---|---|---|---|---|---|---|---|
| (MW, SW, MS) | 1 | 0.065 | 0.80 | - | - | - | - |
| (MW, SW) | 2 | 71.90 | < 0.001 | $\lambda_{ij}^{MS}=0$ | 1 | 71.835 | < 0.001 |
| (MS, MW) | 2 | 5.39 | 0.07 | $\lambda_{jk}^{SW}=0$ | 1 | 5.325 | 0.02 |
| (MS, WS) | 2 | 19.71 | < 0.001 | $\lambda_{ik}^{MW}=0$ | 1 | 19.645 | < 0.001 |
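The partial association test for $\lambda_{jk}^{SW}=0$ can be sketched in a few lines of Python (the $G^2$ values are taken from the table above; for df = 1 the chi-square tail probability is $\text{erfc}(\sqrt{x/2})$):

```python
import math

g2_restricted, g2_full = 5.39, 0.065  # G^2 for (MS, MW) and (MS, MW, SW)
delta_g2 = g2_restricted - g2_full    # 5.325
delta_df = 2 - 1                      # 1

# chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(delta_g2 / 2))
print(round(delta_g2, 3), round(p_value, 3))  # 5.325, ~0.021
```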

From the above table, we can conclude that each term significantly improves the fit of the model when it is present; thus the homogeneous association model fits better than any of the conditional independence models.


Thus far, it seems that the homogeneous association model describes our data best. We typically do inference on the chosen model in terms of odds ratios.

#### Confidence Intervals for Odds Ratios

Recall that the odds ratios can be estimated directly from the log-linear parameters.

Consider estimating the partial odds ratio for SW in our example from the homogeneous association model. Based on SAS/GENMOD (see collar.sas and collarDIndex.sas):

$\text{log}(\hat{\theta}_{SW(i)})=0.3872\text{ and }\hat{\theta}_{SW(i)}=\text{exp}(0.3872)=1.47$

Recall that this is a Wald statistic, thus a

(1 − α) × 100% CI for $\text{log}(\hat{\theta}_{SW(i)})$

$\text{log}(\hat{\theta}_{SW(i)}) \pm z_{\alpha/2}(ASE)$

E.g. 95% CI for $\text{log}(\hat{\theta}_{SW(i)})$:

0.3872 ± 1.96(0.167) = (0.056, 0.709)

and 95% CI for $\hat{\theta}_{SW(i)}$:

(exp(0.056), exp(0.709)) = (1.06, 2.03)
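Using the rounded estimate and ASE reported above, this interval is straightforward to compute by hand; a small sketch (the slight difference in the upper limit versus 2.03 comes from rounding the ASE):

```python
import math

log_or, ase, z = 0.3872, 0.167, 1.96  # estimate and ASE from SAS/GENMOD, z for 95%

lo = log_or - z * ase
hi = log_or + z * ase
or_hat = math.exp(log_or)
ci = (math.exp(lo), math.exp(hi))

print(round(or_hat, 2))                  # 1.47
print(round(ci[0], 2), round(ci[1], 2))  # ~ (1.06, 2.04)
```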


But the lower bound is very close to 1. This should raise the question for you: Is this term really significant? Could we go, after all, with a simpler model (MS,MW)?

#### Effects of sample size and practical significance

Of the two models in our example, which one is the "better" model? How do we choose between the simpler, more parsimonious model and the one that is more complex?

| Criterion | (MS, MW) | (MS, MW, SW) |
|---|---|---|
| Model goodness of fit | $G^2 = 5.39$, df = 2, p = .07 | $G^2 = .065$, df = 1, p = .80 |
| Largest adjusted residual | 1.69 | .25 |
| Likelihood-ratio test of $\lambda_{jk}^{SW}=0$ | - | $\Delta G^2 = 5.325$, df = 1, p = .02 |
| Complexity | simpler | more complex |

Do we really need the SW partial association?

It may be a weak effect, i.e., it may not be important in a practical sense. Although it appears statistically significant, this may be due to the large sample size ($n = 715$) relative to the small table ($N = 2^3 = 8$ cells).

So how do we choose between these two? How do we weigh the trade-off between a simple, parsimonious model (practical significance) and a more complex model that is closer to reality (statistical significance)? This decision is often driven by the context of the problem and the analysis we have done above. But let's look at some additional statistics that can aid us in choosing the "best" model.