# Lesson 10: Log-Linear Models

### Introduction to Log-Linear Models

Thus far in the course we have alluded to log-linear models several times but have never covered their basics. When we dealt with interrelationships among several categorical variables, our focus was on describing independence, interactions, or associations between two, three, or more categorical variables, mostly via

- single summary statistics, and
- significance testing.

Log-linear models go beyond single summary statistics and specify how the cell counts depend on the levels of the categorical variables. They model the association and interaction patterns among categorical variables. Log-linear modeling is natural for Poisson, multinomial, and product-multinomial sampling. Log-linear models are appropriate when there is no clear distinction between response and explanatory variables, or when there are more than two responses. This is a major difference between logistic models and log-linear models: in the former a response is identified, whereas no variable is given such special status in log-linear modeling. By default, log-linear models treat discrete variables as nominal, but they can be adjusted to deal with ordinal and matched data. Log-linear models are more general than logit models, but some log-linear models correspond directly to logit models.
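To make "cell counts depend on the levels of the variables" concrete, here is a minimal sketch in Python with NumPy (the lesson itself uses SAS and R). It works with a hypothetical 2×3 table. Under independence, the log-linear model for a two-way table is `log(mu_ij) = lambda + lambda_i^X + lambda_j^Y`, and its fitted counts are `n_i+ * n_+j / n`. The counts and the helper name `independence_fit` are illustrative assumptions, not part of the course materials.

```python
import numpy as np

# Hypothetical 2x3 table of counts (rows: levels of X, columns: levels of Y).
counts = np.array([[20.0, 30.0, 50.0],
                   [10.0, 25.0, 15.0]])


def independence_fit(table):
    """Fitted counts under the independence log-linear model: n_i+ * n_+j / n."""
    n = table.sum()
    return np.outer(table.sum(axis=1), table.sum(axis=0)) / n


mu = independence_fit(counts)
log_mu = np.log(mu)

# Reference-coded parameters: lambda = log mu_11, with row and column effects
# measured relative to the first level of each variable.
lam = log_mu[0, 0]
lam_x = log_mu[:, 0] - lam
lam_y = log_mu[0, :] - lam

# The additive form lambda + lambda_i^X + lambda_j^Y reproduces every fitted log count.
additive = lam + lam_x[:, None] + lam_y[None, :]
print(np.allclose(additive, log_mu))  # True

# Equivalently, every local log odds ratio of the fitted counts is zero.
local_log_or = (log_mu[:-1, :-1] - log_mu[1:, :-1]
                - log_mu[:-1, 1:] + log_mu[1:, 1:])
print(np.allclose(local_log_or, 0.0))  # True
```

The fitted counts reproduce the observed row and column totals, which are the sufficient statistics of this model.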

Consider graduate admissions at Berkeley. We may consider all possible relationships among *A* = Admission, *D* = Department, and *S* = Gender. Alternatively, we may treat *A* as the response and *D* and *S* as covariates, in which case the possible logit models are:

- logit model for *A* with only an intercept;
- logit model for *A* with a main effect for *D*;
- logit model for *A* with a main effect for *S*;
- logit model for *A* with main effects for *D* and *S*; and
- logit model for *A* with main effects for *D* and *S* and the *D*×*S* interaction.

Corresponding to each of the above, a log-linear model may be defined. The notation below follows that of Lesson 5.

- Model of joint independence (*DS*, *A*), which indicates that neither *D* nor *S* has an effect on *A*, is equivalent to a logit model for *A* with only an intercept;
- Model of conditional independence (*DS*, *DA*), which indicates that sex has no effect on *A* after the effect of department is included, is equivalent to a logit model for *A* with a main effect for *D*;
- Another conditional independence model, (*DS*, *SA*), is equivalent to a logit model for *A* with a main effect for *S* only;
- Model of no three-factor interaction (*DS*, *DA*, *SA*), which indicates that the effect of sex on *A* is the same at each level of department, is equivalent to a logit model for *A* with main effects for *D* and *S*; and
- Model of three-factor interaction, or the saturated model (*DSA*), which indicates that the effect of sex on *A* varies across departments, is equivalent to a logit model for *A* with main effects for *D* and *S* and the *D*×*S* interaction.
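The equivalences in the list above can be checked numerically. The sketch below is in Python with NumPy rather than the SAS or R used in the lesson. It fits each log-linear model by iterative proportional fitting (IPF) and each logit model by Newton-Raphson on grouped binomial counts, then compares the fitted cell counts. The 2×2×2 counts are hypothetical, not the Berkeley data. The helpers `ipf` and `fit_logit` are illustrative names, not library functions. The saturated model is omitted because both sides simply reproduce the observed counts.

```python
import numpy as np

# Hypothetical counts indexed [d, s, a]; a = 0 is "admitted", a = 1 is "rejected".
table = np.array([[[120.0, 80.0], [60.0, 30.0]],
                  [[40.0, 110.0], [70.0, 150.0]]])


def ipf(table, margins, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a hierarchical log-linear model.

    `margins` lists the generating terms as tuples of axes, e.g. (0, 1) for DS.
    """
    fit = np.ones_like(table)
    all_axes = set(range(table.ndim))
    for _ in range(max_iter):
        previous = fit.copy()
        for margin in margins:
            other = tuple(sorted(all_axes - set(margin)))
            # Rescale so that the fitted margin matches the observed margin.
            fit = (fit * table.sum(axis=other, keepdims=True)
                   / fit.sum(axis=other, keepdims=True))
        if np.max(np.abs(fit - previous)) < tol:
            break
    return fit


def fit_logit(X, y, m, iters=50):
    """Newton-Raphson for a logit model with y successes out of m trials per row of X.

    Returns the fitted success probability for each row.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (m * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - m * p))
    return 1.0 / (1.0 + np.exp(-X @ beta))


# One binomial group per (d, s) cell, in the same order as table.reshape(4, 2).
d_idx, s_idx = (g.ravel().astype(float)
                for g in np.meshgrid([0, 1], [0, 1], indexing="ij"))
ones = np.ones(4)
y = table[:, :, 0].ravel()     # admitted in each (d, s) group
m = table.sum(axis=2).ravel()  # applicants in each (d, s) group

# Log-linear generating margins paired with logit design-matrix columns.
models = {
    "(DS, A) vs intercept only": ([(0, 1), (2,)], [ones]),
    "(DS, DA) vs D": ([(0, 1), (0, 2)], [ones, d_idx]),
    "(DS, SA) vs S": ([(0, 1), (1, 2)], [ones, s_idx]),
    "(DS, DA, SA) vs D + S": ([(0, 1), (0, 2), (1, 2)], [ones, d_idx, s_idx]),
}

matches = {}
for name, (margins, columns) in models.items():
    p = fit_logit(np.column_stack(columns), y, m)
    logit_counts = np.stack([m * p, m * (1.0 - p)], axis=1).reshape(table.shape)
    matches[name] = np.allclose(ipf(table, margins), logit_counts)
    print(name, matches[name])
```

The fitted counts agree for every pair. This is exactly the "equivalent expected counts" property that the text describes.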

“Equivalent” means that the two models give the same goodness-of-fit statistics relative to the saturated model and the same expected counts for each cell. Log-linear models are not exactly the same as logit models, because log-linear models describe the joint distribution of all three variables, whereas logit models describe only the conditional distribution of *A* given *D* and *S*. Log-linear models have more parameters than the corresponding logit models, but the parameters corresponding to the joint distribution of *D* and *S* are not of interest.
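To illustrate what "equivalent goodness-of-fit statistics" means, the sketch below (Python with NumPy, hypothetical counts) uses the conditional independence model (*DS*, *DA*) and the logit model for *A* with a main effect for *D*. It computes G² over all eight cells for the log-linear model and the binomial deviance over the four (*d*, *s*) groups for the logit model, along with each model's residual degrees of freedom. It relies on the standard closed-form MLEs. For (*DS*, *DA*), the fitted counts are `n_ds+ * n_d+a / n_d++`. For the logit model, the fitted probability is the pooled admission proportion within each department.

```python
import numpy as np

# Hypothetical counts indexed [d, s, a]; a = 0 is "admitted", a = 1 is "rejected".
table = np.array([[[120.0, 80.0], [60.0, 30.0]],
                  [[40.0, 110.0], [70.0, 150.0]]])
I, J, K = table.shape

# Log-linear model (DS, DA): closed-form fitted counts n_ds+ * n_d+a / n_d++.
mu = (table.sum(axis=2, keepdims=True) * table.sum(axis=1, keepdims=True)
      / table.sum(axis=(1, 2), keepdims=True))
g2_loglinear = 2.0 * np.sum(table * np.log(table / mu))
# Parameters: lambda; D, S, A main effects; DS and DA interactions.
n_params = 1 + (I - 1) + (J - 1) + (K - 1) + (I - 1) * (J - 1) + (I - 1) * (K - 1)
df_loglinear = table.size - n_params

# Logit model for A with a main effect for D: one admission probability per department.
y = table[:, :, 0]                   # admitted in each (d, s) group
m = table.sum(axis=2)                # applicants in each (d, s) group
p_d = y.sum(axis=1) / m.sum(axis=1)  # MLE of P(admit | department)
fitted = m * p_d[:, None]
g2_logit = 2.0 * np.sum(y * np.log(y / fitted)
                        + (m - y) * np.log((m - y) / (m - fitted)))
df_logit = I * J - I                 # (d, s) groups minus I logit parameters (binary A)

print(np.isclose(g2_loglinear, g2_logit), df_loglinear, df_logit)  # True 2 2
```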

In general, to construct a log-linear model that is equivalent to a logit model, we need to include all possible associations among the predictors. In the Berkeley example, this means including *DS* in every model. This lesson walks through examples of how this is done in both SAS and R.
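The fitted margins show why *DS* must be included. A logit model for *A* treats the number of applicants in each (*d*, *s*) group as fixed, so an equivalent log-linear model must reproduce the observed *D*×*S* margin. A model containing the *DS* term does so, while a model omitting it generally does not. The sketch below (Python with NumPy, hypothetical counts) compares the joint independence model (*DS*, *A*) with the mutual independence model (*D*, *S*, *A*), using their standard closed-form fitted counts.

```python
import numpy as np

# Hypothetical counts indexed [d, s, a]; a = 0 is "admitted", a = 1 is "rejected".
table = np.array([[[120.0, 80.0], [60.0, 30.0]],
                  [[40.0, 110.0], [70.0, 150.0]]])
n = table.sum()
n_d = table.sum(axis=(1, 2), keepdims=True)
n_s = table.sum(axis=(0, 2), keepdims=True)
n_a = table.sum(axis=(0, 1), keepdims=True)
n_ds = table.sum(axis=2, keepdims=True)

mu_joint = n_ds * n_a / n           # (DS, A): includes the DS association
mu_mutual = n_d * n_s * n_a / n**2  # (D, S, A): omits it

# Intercept-only logit for A: each (d, s) group keeps its observed size
# and gets the pooled admission proportion.
logit_counts = n_ds * (n_a / n)

observed_ds = table.sum(axis=2)
print(np.allclose(mu_joint.sum(axis=2), observed_ds))   # True
print(np.allclose(mu_mutual.sum(axis=2), observed_ds))  # False for these counts
print(np.allclose(mu_joint, logit_counts))              # True
```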

In subsequent sections we look at log-linear models in more detail. The two great advantages of log-linear models are that they are flexible and interpretable. Log-linear models have all the flexibility associated with ANOVA and regression. As mentioned before, log-linear models are also a form of GLM. They have natural interpretations in terms of odds and frequently have interpretations in terms of independence, as shown above.
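Because a log-linear model is a Poisson GLM with a log link, it can be fitted with the same iteratively reweighted least squares (IRLS) algorithm used for other GLMs. The sketch below is in Python with NumPy and uses hypothetical counts. `poisson_irls` is an illustrative helper, not a library function. The sketch fits the conditional independence model (*DS*, *DA*) with indicator (dummy) coding and checks that the fitted counts match the closed form `n_ds+ * n_d+a / n_d++`. In practice one would use a GLM routine such as R's `glm` with `family = poisson` or SAS PROC GENMOD.

```python
import numpy as np

# Hypothetical counts indexed [d, s, a]; a = 0 is "admitted", a = 1 is "rejected".
table = np.array([[[120.0, 80.0], [60.0, 30.0]],
                  [[40.0, 110.0], [70.0, 150.0]]])
y = table.ravel()

# Indicator coding for each cell, in the same (d, s, a) order as table.ravel().
d, s, a = (g.ravel().astype(float)
           for g in np.meshgrid([0, 1], [0, 1], [0, 1], indexing="ij"))
# Columns: intercept; D, S, A main effects; then the DS and DA interactions.
X = np.column_stack([np.ones_like(y), d, s, a, d * s, d * a])


def poisson_irls(X, y, iters=25):
    """IRLS for a Poisson GLM with log link; returns coefficients and fitted means."""
    eta = np.log(y)  # start from the observed counts (all positive here)
    for _ in range(iters):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        w = mu                   # working weights for the canonical log link
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        eta = X @ beta
    return beta, np.exp(eta)


beta, mu_hat = poisson_irls(X, y)

closed_form = (table.sum(axis=2, keepdims=True) * table.sum(axis=1, keepdims=True)
               / table.sum(axis=(1, 2), keepdims=True)).ravel()
print(np.allclose(mu_hat, closed_form))  # True
```

Exponentiated interaction coefficients such as the one for `d * a` are odds ratios. This is the "natural interpretation in terms of odds" mentioned above.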