10: Log-Linear Models

Overview

Thus far in the course, we have alluded to log-linear models several times but have not yet examined them in detail. When we dealt with relationships among several categorical variables, our focus was mostly on describing their associations via

  • single summary statistics and
  • significance testing.

Log-linear models go beyond single summary statistics and specify how the cell counts depend on the levels of the categorical variables. They model the association and interaction patterns among categorical variables. Log-linear models are natural for Poisson, multinomial, and product-multinomial sampling. They are appropriate when there is no clear distinction between response and explanatory variables, or when two or more variables are treated as responses. This is a fundamental difference between logistic models and log-linear models: in the former, a response is identified, but no such special status is assigned to any variable in log-linear modeling. By default, log-linear models treat discrete variables as nominal, but they can be adjusted to handle ordinal and matched data. Log-linear models are more general than logit models, but some log-linear models correspond directly to logit models.
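For concreteness, here is a minimal sketch in R of the simplest case: a Poisson log-linear model for a 2 \(\times\) 2 table fit with glm(). The table, the variable names X and Y, and the counts are hypothetical.

    # Minimal sketch (hypothetical data): independence log-linear model for a 2 x 2 table.
    # Cell counts are treated as Poisson, and the log mean depends only on the row and
    # column levels; adding the X:Y term would give the saturated model.
    tab <- data.frame(
      X     = factor(c("x1", "x1", "x2", "x2")),
      Y     = factor(c("y1", "y2", "y1", "y2")),
      count = c(35, 15, 20, 30)            # hypothetical counts
    )

    fit_indep <- glm(count ~ X + Y, family = poisson, data = tab)
    summary(fit_indep)    # parameter estimates on the log scale
    fitted(fit_indep)     # expected cell counts under independence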

Consider the Berkeley admissions example. We may model all possible relationships among A = Admission, D = Department, and S = Sex. Alternatively, we may treat A as the response and D and S as covariates, in which case the possible logit models, fit in an R sketch after this list, are

  • logit model for A with only an intercept;
  • logit model for A with a main effect for D;
  • logit model for A with a main effect for S;
  • logit model for A with main effects for D and S; and
  • logit model for A with main effects for D and S and the D \(\times\) S interaction.
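As a concrete illustration, the sketch below fits these five logit models in R using the built-in UCBAdmissions data (A = Admit, D = Dept, S = Gender), with the counts arranged in grouped binomial form. The object names (berkeley, dat, m0, and so on) are just illustrative choices.

    # Sketch: the five logit models for A, fit to the built-in UCBAdmissions data.
    berkeley <- as.data.frame(UCBAdmissions)       # columns: Admit, Gender, Dept, Freq
    adm <- subset(berkeley, Admit == "Admitted")
    rej <- subset(berkeley, Admit == "Rejected")   # rows align with adm by (Gender, Dept)
    dat <- data.frame(D = adm$Dept, S = adm$Gender,
                      admitted = adm$Freq, rejected = rej$Freq)

    m0   <- glm(cbind(admitted, rejected) ~ 1,     family = binomial, data = dat)  # intercept only
    mD   <- glm(cbind(admitted, rejected) ~ D,     family = binomial, data = dat)  # main effect of D
    mS   <- glm(cbind(admitted, rejected) ~ S,     family = binomial, data = dat)  # main effect of S
    mDS  <- glm(cbind(admitted, rejected) ~ D + S, family = binomial, data = dat)  # D and S
    mDxS <- glm(cbind(admitted, rejected) ~ D * S, family = binomial, data = dat)  # D, S, and D x S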

Corresponding to each of the above, a log-linear model may be defined (these are fit in an R sketch following this list). The notation below follows that of Lesson 5.

  • Model of joint independence (DS, A), which indicates that neither D nor S has an effect on A, is equivalent to a logit model for response A with only an intercept;
  • Model of conditional independence (DS, DA), which indicates that S has no effect on A after the effect of D is included, is equivalent to a logit model for response A with the single predictor D; another conditional independence model (DS, SA) is equivalent to a logit model for response A with the single predictor S;
  • Model of homogeneous association (DS, DA, SA), which indicates that the effect of S on A is the same at each level of D, is equivalent to a logit model for response A with predictors D and S but no interaction; and
  • Saturated or unrestricted model (DSA), which indicates that the effect of S on A varies across the levels of D, is equivalent to a logit model for response A with predictors D and S as well as the D \(\times\) S interaction.
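Here is a sketch of these log-linear models in R, again using the built-in UCBAdmissions data and treating the cell counts of the full three-way table as Poisson; the column renaming and object names are illustrative.

    # Sketch: the log-linear models above, fit as Poisson GLMs to the three-way table.
    berkeley <- as.data.frame(UCBAdmissions)
    names(berkeley) <- c("A", "S", "D", "count")    # Admit, Gender, Dept, Freq

    ll_DS_A     <- glm(count ~ D*S + A,         family = poisson, data = berkeley)  # (DS, A)
    ll_DS_DA    <- glm(count ~ D*S + D*A,       family = poisson, data = berkeley)  # (DS, DA)
    ll_DS_SA    <- glm(count ~ D*S + S*A,       family = poisson, data = berkeley)  # (DS, SA)
    ll_DS_DA_SA <- glm(count ~ D*S + D*A + S*A, family = poisson, data = berkeley)  # (DS, DA, SA)
    ll_DSA      <- glm(count ~ D*S*A,           family = poisson, data = berkeley)  # (DSA), saturated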

"Equivalent" means that two models give equivalent goodness-of-fit statistics relative to a saturated model, and equivalent expected counts for each cell. Log-linear models are not the same as logit models, because the log-linear models describe the joint distribution of all three variables, whereas the logit models describe only the conditional distribution of A given D and S. Log-linear models have more parameters than the logit models, but the parameters corresponding to the joint distribution of D and S are not of interest.

In general, to construct a log-linear model that is equivalent to a logit model, we need to include all possible associations among the predictors. In the Berkeley example, we need to include DS in every model. This lesson will walk through examples of how this is done in both SAS and R.

In subsequent sections, we look at log-linear models in more detail. The two great advantages of log-linear models are that they are flexible and interpretable. They have all the flexibility associated with ANOVA and regression; as mentioned before, log-linear models are another form of GLM. They also have natural interpretations in terms of odds, and frequently in terms of independence, as shown above.
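As a small illustration of the odds interpretation: in the homogeneous association model for the Berkeley data, the exponentiated S \(\times\) A interaction parameter is the conditional odds ratio between sex and admission at each level of department. The sketch below assumes R's default treatment coding, so the coefficient name shown depends on the factor level names.

    # Sketch: odds-ratio interpretation of a log-linear parameter (UCBAdmissions data).
    berkeley <- as.data.frame(UCBAdmissions)
    names(berkeley) <- c("A", "S", "D", "count")
    loglin_hom <- glm(count ~ D*S + D*A + S*A, family = poisson, data = berkeley)

    # With treatment coding, the single S:A interaction coefficient is the log of the
    # conditional odds ratio between S and A, which is the same at every level of D.
    exp(coef(loglin_hom)["SFemale:ARejected"])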

Objectives
Upon completion of this lesson, you should be able to:

  Objective 10.1

Interpret the parameters of the log-linear model and explain how they can be used to describe joint and conditional associations among variables.

  Objective 10.2

Fit and use the log-linear model to explain joint and conditional associations among variables.

  Objective 10.3

Interpret interactions of multiple variables in the context of the log-linear model.