6.3.3 - Different Logistic Regression Models for Three-way Tables

In this part of the lesson we will consider different binary logistic regression models for three-way tables and their link to log-linear models. Let us return to the \(3\times2\times2\) table:

Socioeconomic status	Boy scout	Delinquent: Yes	Delinquent: No
Low	Yes	11	43
Low	No	42	169
Medium	Yes	14	104
Medium	No	20	132
High	Yes	8	196
High	No	2	59

As we discussed in Lesson 5, there are many different models that we could fit to this table. But if we think of D as a response and B and S as potential predictors, we can focus on a subset of models that make sense.

Let \(\pi\) be the probability of delinquency. The simplest model to consider is the null or intercept-only model,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0\) (1)

in which D is unrelated to B or S. If we were to fit this model in PROC LOGISTIC using the disaggregated data (all six lines), we would find that the \(X^2\) and \(G^2\) statistics are identical to those we obtained in Lesson 5 from testing the null hypothesis "S and B are independent of D". That is, testing the overall fit of the intercept-only model (1) is equivalent to testing the fit of the log-linear model (D, SB), because (1) says that D is unrelated to S and B but makes no assumptions about whether S and B are related.
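As a check on this equivalence, here is a minimal Python sketch (not the course's SAS program) that computes the fit statistics for model (1) directly from the six rows of the table. It assumes only that the ML estimate of \(\pi\) under the intercept-only model is the overall proportion of delinquents; the variable names are ours.

```python
import math

# (status, scout, delinquent_yes, delinquent_no), copied from the table above
rows = [
    ("Low", "Yes", 11, 43),
    ("Low", "No", 42, 169),
    ("Medium", "Yes", 14, 104),
    ("Medium", "No", 20, 132),
    ("High", "Yes", 8, 196),
    ("High", "No", 2, 59),
]

total_yes = sum(r[2] for r in rows)
total_n = sum(r[2] + r[3] for r in rows)
pi_hat = total_yes / total_n                 # ML estimate of pi under model (1)
beta0_hat = math.log(pi_hat / (1 - pi_hat))  # the fitted intercept

x2 = 0.0
g2 = 0.0
for _, _, yes, no in rows:
    n = yes + no
    # Compare observed and expected counts in both response categories.
    for observed, expected in ((yes, n * pi_hat), (no, n * (1 - pi_hat))):
        x2 += (observed - expected) ** 2 / expected
        g2 += 2 * observed * math.log(observed / expected)

df = len(rows) - 1  # six binomial rows minus one fitted parameter
print(f"pi_hat={pi_hat:.5f}  beta0={beta0_hat:.4f}")
print(f"X2={x2:.2f}  G2={g2:.2f}  df={df}")
```

The expected counts used here are the same as those of the (D, SB) log-linear model, which is why the two tests agree.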

After (1), we may want to fit the logit model that has a main effect for B,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1\) (2)

where

\(x_1=1\) if B=scout,
\(x_1=0\) otherwise.

If the data are disaggregated into six lines, the goodness-of-fit tests for model (2) will be equivalent to the test for the (DB, SB) log-linear model, which says that D and S are conditionally independent given B. This makes sense, because (2) says that S has no effect on D once B has been taken into account.
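Because model (2) has a single binary predictor, its ML estimates can be read off the B × D table obtained by collapsing over S. The sketch below illustrates this under that assumption; it is plain Python rather than PROC LOGISTIC output, and the helper `log_odds` is a name introduced here for illustration.

```python
import math

# (scout, delinquent_yes, delinquent_no) for the six rows of the table
rows = [
    ("Yes", 11, 43), ("No", 42, 169),    # Low
    ("Yes", 14, 104), ("No", 20, 132),   # Medium
    ("Yes", 8, 196), ("No", 2, 59),      # High
]

# Collapse over S: model (2) uses the data only through the B x D margin.
margin = {"Yes": [0, 0], "No": [0, 0]}
for scout, yes, no in rows:
    margin[scout][0] += yes
    margin[scout][1] += no

def log_odds(yes, no):
    """Log odds of delinquency from a pair of counts."""
    return math.log(yes / no)

beta0 = log_odds(*margin["No"])            # x1 = 0: non-scouts
beta1 = log_odds(*margin["Yes"]) - beta0   # log odds ratio, scouts vs. non-scouts

# One fitted probability per level of B, compared with all six observed rows.
pi_hat = {b: m[0] / (m[0] + m[1]) for b, m in margin.items()}
g2 = 0.0
for scout, yes, no in rows:
    n = yes + no
    p = pi_hat[scout]
    g2 += 2 * (yes * math.log(yes / (n * p)) + no * math.log(no / (n * (1 - p))))

df = len(rows) - 2  # six binomial rows minus two fitted parameters
print(f"beta0={beta0:.4f}  beta1={beta1:.4f}  G2={g2:.2f}  df={df}")
```

The deviance is computed against the six-line data, so it measures how much of the variation in D across the S × B cells is left unexplained by B alone.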

The model that has main effects for S,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_2 x_2+\beta_3 x_3\) (3)

where

\(x_2=1\) if S = medium,
\(x_2=0\) otherwise,

\(x_3=1\) if S = high,
\(x_3=0\) otherwise,

says that B has no effect on D once S has been taken into account. The goodness-of-fit tests for (3) are equivalent to testing the null hypothesis that (DS, BS) fits, i.e. that D and B are conditionally independent given S.
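As a worked illustration (our own arithmetic, not output from scout.sas): model (3) has one parameter per level of S, so its ML estimates can be computed from the S × D table obtained by collapsing over B. The collapsed counts of delinquents versus non-delinquents are 53 vs. 212 (low), 34 vs. 236 (medium), and 10 vs. 255 (high), so with low status as the baseline,

\(\hat{\beta}_0=\log\left(\dfrac{53}{212}\right)\approx -1.386\)

\(\hat{\beta}_2=\log\left(\dfrac{34}{236}\right)-\log\left(\dfrac{53}{212}\right)\approx -0.551\)

\(\hat{\beta}_3=\log\left(\dfrac{10}{255}\right)-\log\left(\dfrac{53}{212}\right)\approx -1.852\)

Both slope estimates are negative, so the estimated odds of delinquency decrease as socioeconomic status goes from low to medium to high.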

The logit model

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3\) (4)

has main effects for both B and S and corresponds to the model of homogeneous association, (DB, DS, BS), which we discussed in Lesson 5. We could not fit the model at that time, because the ML estimates have no closed-form solution, but we calculated the CMH and Breslow-Day statistics. With logistic regression software, however, fitting this model is no more difficult than fitting any other model.
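To illustrate what the software does when no closed form exists, here is a minimal sketch of Newton-Raphson (Fisher scoring) for model (4), written in plain Python. It is an illustration under our own choices (starting values of zero, a fixed iteration cap, a hand-written linear solver named `solve`), not a description of how PROC LOGISTIC is implemented.

```python
import math

# Columns: x1 (scout), x2 (medium), x3 (high), delinquent yes, delinquent no
data = [
    (1, 0, 0, 11, 43),
    (0, 0, 0, 42, 169),
    (1, 1, 0, 14, 104),
    (0, 1, 0, 20, 132),
    (1, 0, 1, 8, 196),
    (0, 0, 1, 2, 59),
]
X = [[1.0, float(r[0]), float(r[1]), float(r[2])] for r in data]  # design matrix
y = [r[3] for r in data]                                          # delinquents
n = [r[3] + r[4] for r in data]                                   # row totals

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x

def fitted(beta):
    """Fitted probabilities for the six rows at the current beta."""
    return [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]

beta = [0.0] * 4
for _ in range(25):
    p = fitted(beta)
    # Score vector X'(y - n p) and information matrix X' W X, W = diag(n p (1 - p))
    score = [sum(X[i][j] * (y[i] - n[i] * p[i]) for i in range(6)) for j in range(4)]
    info = [[sum(X[i][j] * X[i][k] * n[i] * p[i] * (1 - p[i]) for i in range(6))
             for k in range(4)] for j in range(4)]
    step = solve(info, score)
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-10:
        break

p = fitted(beta)
g2 = 2 * sum(y[i] * math.log(y[i] / (n[i] * p[i]))
             + (n[i] - y[i]) * math.log((n[i] - y[i]) / (n[i] * (1 - p[i])))
             for i in range(6))
df = 6 - 4  # six binomial rows minus four fitted parameters
print("beta =", [round(b, 4) for b in beta], f" G2={g2:.2f}  df={df}")
```

At convergence the score vector is zero, which means the fitted numbers of delinquents reproduce the observed totals for scouts and for each level of S.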

This model says that the effect of B on D, when expressed in terms of odds ratios, is identical across the levels of S. Equivalently, it says that the odds ratios describing the relationship between S and D are identical across the levels of B. If this model does not fit, we have evidence that the effect of B on D varies across the levels of S, or that the effect of S on D varies across the levels of B.
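A short derivation makes the first claim concrete. Writing \(\pi(x_1)\) as shorthand (our notation) for the probability of delinquency at a given value of \(x_1\) with \(x_2\) and \(x_3\) held fixed, model (4) gives

\(\log\left(\dfrac{\pi(1)}{1-\pi(1)}\right)-\log\left(\dfrac{\pi(0)}{1-\pi(0)}\right)=(\beta_0+\beta_1+\beta_2 x_2+\beta_3 x_3)-(\beta_0+\beta_2 x_2+\beta_3 x_3)=\beta_1\)

so the odds ratio comparing scouts with non-scouts is \(\exp(\beta_1)\) at every level of S. By the same argument, with \(x_1\) held fixed, the odds ratios comparing medium with low and high with low status are \(\exp(\beta_2)\) and \(\exp(\beta_3)\), regardless of B.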

Finally, the saturated model can be written as

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\beta_4 x_1x_2+\beta_5 x_1x_3\) (5)

which has the main effects for B and S and their interactions. This model has \(X^2=G^2=0\) with zero degrees of freedom (see scout.sas).
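The perfect fit can be verified by hand, since the saturated model has one parameter per row and its ML estimates are simple contrasts of the six observed logits. The following Python sketch is our own check, separate from scout.sas; it rebuilds the coefficients from the table and confirms that the fitted probabilities reproduce the observed proportions.

```python
import math

# Observed (yes, no) counts keyed by (x1, x2, x3) = (scout, medium, high)
counts = {
    (1, 0, 0): (11, 43), (0, 0, 0): (42, 169),
    (1, 1, 0): (14, 104), (0, 1, 0): (20, 132),
    (1, 0, 1): (8, 196), (0, 0, 1): (2, 59),
}
logit = {key: math.log(yes / no) for key, (yes, no) in counts.items()}

b0 = logit[(0, 0, 0)]                                # non-scout, low status
b1 = logit[(1, 0, 0)] - b0                           # scout effect at low status
b2 = logit[(0, 1, 0)] - b0                           # medium vs. low, non-scouts
b3 = logit[(0, 0, 1)] - b0                           # high vs. low, non-scouts
b4 = (logit[(1, 1, 0)] - logit[(0, 1, 0)]) - b1      # change in scout effect at medium
b5 = (logit[(1, 0, 1)] - logit[(0, 0, 1)]) - b1      # change in scout effect at high

g2 = 0.0
for (x1, x2, x3), (yes, no) in counts.items():
    eta = b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x1 * x2 + b5 * x1 * x3
    p = 1 / (1 + math.exp(-eta))
    n = yes + no
    g2 += 2 * (yes * math.log(yes / (n * p)) + no * math.log(no / (n * (1 - p))))

df = len(counts) - 6  # six binomial rows minus six fitted parameters
print(f"G2={g2:.6f}  df={df}")
```

Up to floating-point rounding, the deviance is zero, because each fitted probability equals the corresponding observed proportion.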