In this example, we could also have arranged the input data like this, with one row for each combination of S and B:
S | B | \(y_i\) | \(n_i\) |
---|---|---|---|
low | scout | 11 | 54 |
low | non-scout | 42 | 211 |
medium | scout | 14 | 118 |
medium | non-scout | 20 | 152 |
high | scout | 8 | 204 |
high | non-scout | 2 | 61 |
A SAS program for fitting the same model is shown below.
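/* Six input rows: one per combination of S and B */
data new;
input S $ B $ y n;
cards;
low scout 11 54
low nonscout 42 211
medium scout 14 118
medium nonscout 20 152
high scout 8 204
high nonscout 2 61
;
/* Same model as before: S only, with the first level in the data (low) as the reference */
proc logistic data=new;
class S / order=data param=ref ref=first;
model y/n = S / scale=none;
run;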
The parameter estimates from this new program are exactly the same as before:
Analysis of Maximum Likelihood Estimates

Parameter | | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|---|
Intercept | | 1 | -1.3863 | 0.1536 | 81.4848 | <.0001 |
S | medium | 1 | -0.5512 | 0.2392 | 5.3080 | 0.0212 |
S | high | 1 | -1.8524 | 0.3571 | 26.9110 | <.0001 |
But the overall fit statistics are different! Before, we had \(X^2=0\) and \(G^2=0\) because the model was saturated (there were three parameters and \(N = 3\) lines of data). But now, the fit statistics are:
Deviance and Pearson Goodness-of-Fit Statistics

Criterion | Value | DF | Value/DF | Pr > ChiSq |
---|---|---|---|---|
Deviance | 0.1623 | 3 | 0.0541 | 0.9834 |
Pearson | 0.1602 | 3 | 0.0534 | 0.9837 |

Number of events/trials observations: 6
The model appears to fit very well, but it is no longer saturated. What happened? Recall that \(X^2\) and \(G^2\) are testing the null hypothesis that the current model is correct, versus the alternative of a saturated model. When we disaggregated the data by levels of B, using six input lines rather than three, the current model did not change but the saturated model did; the saturated model was enlarged to six parameters.
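For reference, the two statistics for grouped binomial data can be written out as below. This is a sketch under one notational assumption: \(\hat{\pi}_i\), a symbol not used above, stands for the fitted probability of the event in row \(i\) under the current model.

\[
X^2 = \sum_{i=1}^{N} \frac{(y_i - n_i\hat{\pi}_i)^2}{n_i\hat{\pi}_i(1-\hat{\pi}_i)},
\qquad
G^2 = 2\sum_{i=1}^{N}\left[\, y_i \log\frac{y_i}{n_i\hat{\pi}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i(1-\hat{\pi}_i)} \right]
\]

With \(N = 6\) rows and three parameters in the current model, the degrees of freedom are \(6 - 3 = 3\), which matches the output above. As a check on one row: the current model gives both low rows the same fitted probability, \(\hat{\pi} = (11 + 42)/(54 + 211) = 53/265 = 0.2\), so the fitted count for low scouts is \(54 \times 0.2 = 10.8\), against an observed count of 11.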
It is very important for you to understand how you entered the data and what model you are fitting. If you understand the basic concepts, then you can apply model comparisons with any statistical software application.
Another way to interpret the overall \(X^2\) and \(G^2\) goodness-of-fit tests is that they are testing the significance of all omitted covariates. If we collapse the data over B and use only three lines of data, then SAS is unaware of the existence of B. But if we disaggregate the data by levels of B and do not include it in the model, then SAS has the opportunity to test the fit of the current model—in which the probability of delinquency varies by S alone—against the saturated alternative in which the probability of delinquency varies by each combination of the levels of S and B. When the data are disaggregated, the goodness-of-fit tests are actually testing the hypothesis that D is unrelated to B once S has been taken into account—i.e., that D and B are conditionally independent given S.
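To see the collapsed version concretely, the six-row data set can be summed over B before fitting. The following is a minimal sketch, not necessarily how the three-line data were originally entered; the names `collapsed`, `y_s`, and `n_s` are made up for illustration.

```sas
/* Sketch: collapse the six-row data set "new" over B, one row per level of S. */
/* collapsed, y_s and n_s are illustrative names, not part of the original.    */
proc sql;
  create table collapsed as
  select S, sum(y) as y_s, sum(n) as n_s
  from new
  group by S;
quit;

/* GROUP BY may reorder the rows, so the reference level is named explicitly */
/* instead of relying on order=data with ref=first.                          */
proc logistic data=collapsed;
  class S (ref='low') / param=ref;
  model y_s/n_s = S / scale=none;
run;
```

The collapsed rows should be 53/265 for low, 34/270 for medium, and 10/265 for high. Because this data set has no column for B, the saturated model here has only three parameters, which is why the fit statistics are zero.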
Here’s another way to think about it. The current model has three parameters:
- an intercept, and
- two indicators for S.
But the alternative has six parameters. We can think of these six parameters as an intercept and five dummies to distinguish among the six rows of data, but then it's not easy to see how the current model becomes a special case of it. So, we think of the six parameters as
- an intercept,
- two dummies for S,
- one dummy for B, and
- two interaction terms for SB.
Now it has become clear that the current model is a special case of this model in which the coefficients for B and the SB interactions are zero. The overall \(X^2\) and \(G^2\) statistics for the disaggregated data are testing the joint significance of the B dummy and the SB interactions.
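The six-parameter alternative can also be fit directly, which makes the comparison explicit. This is a sketch that assumes the six-row data set `new` from the program above; it is one way to see the nesting, not a required step.

```sas
/* Sketch: fit the six-parameter alternative to the six-row data set "new". */
/* B contributes one dummy and S*B contributes two interaction terms.       */
proc logistic data=new;
  class S B / order=data param=ref ref=first;
  model y/n = S B S*B / scale=none;
run;
```

This model has as many parameters as there are rows, so it is saturated for the disaggregated data and its deviance should be zero. The difference in -2 Log L between the S-only fit and this fit should reproduce the deviance of 0.1623 on 3 degrees of freedom.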
So, should we aggregate, or should we not? As far as estimating the current model is concerned, it doesn’t matter; we get exactly the same estimated coefficients and standard errors either way. But disaggregating gives us the opportunity to test the significance of the omitted terms for B and SB.
Therefore, it often makes sense to disaggregate your dataset by variables that are not included in the model, because it gives you the opportunity to test the overall fit of your model. But that strategy has limits. If you disaggregate the data too much, the \(n_i\)s may become too small to reliably test the fit by \(X^2\) and \(G^2\).
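One rough way to judge whether the rows are large enough is to look at the fitted counts in each row. The sketch below assumes the six-row data set `new`. The cutoff of 5 is a common rule of thumb rather than something stated in this lesson, and the names `fitted`, `check`, `phat`, `exp_events`, `exp_nonevents`, and `small` are made up for illustration.

```sas
/* Sketch: save fitted probabilities from the S-only model */
proc logistic data=new;
  class S / order=data param=ref ref=first;
  model y/n = S / scale=none;
  output out=fitted p=phat;   /* phat = fitted probability of the event */
run;

/* Fitted counts per row; small = 1 when either fitted count falls below 5. */
/* The threshold 5 is an assumed rule of thumb, not taken from the text.    */
data check;
  set fitted;
  exp_events    = n * phat;
  exp_nonevents = n * (1 - phat);
  small = (min(exp_events, exp_nonevents) < 5);
run;

proc print data=check;
  var S B y n phat exp_events exp_nonevents small;
run;
```

In this data set, the high, non-scout row has a fitted event count of about \(61 \times 10/265 \approx 2.3\), so it would be flagged. Splitting the data by further variables would make such small rows more common.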