10.1 - Log-Linear Models for Two-way Tables

Overview

Recall that a two-way ANOVA models the expected value of a continuous variable (e.g., plant length) as a function of the levels of two categorical variables (e.g., low/high sunlight and low/high water amount). In contrast, the log-linear model describes the cell counts (e.g., the number of plants in a cell) as a function of the levels of two categorical variables.

Let \(\mu_{ij} = E(n_{ij})\) be the expected counts in an \(I \times J\) table formed by two categorical random variables \(A\) and \(B\).

Objective

Model the cell counts: \(\mu_{ij} = n\pi_{ij}\)

Model structure

The saturated log-linear model, which is analogous to the two-way ANOVA model with interaction, is

\(\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB}\)

where \(i = 1,\ldots, I\) and \(j = 1, \ldots, J\) index the levels of the categorical random variables \(A\) and \(B\), and the superscripts indicate which variable each term belongs to. The parameters are subject to the constraints \(\sum_i \lambda_i^A = \sum_j \lambda_j^B = \sum_i \lambda_{ij}^{AB} = \sum_j \lambda_{ij}^{AB} = 0\), which deal with overparametrization. Overparametrization means that the model has more parameters than can be uniquely estimated. Without constraints, this model is overparametrized because the term \(\lambda_{ij}^{AB}\) alone already has \(I \times J\) parameters, one for each expected cell count \(\mu_{ij}\), while the constant \(\lambda\) and the "main effects" \(\lambda_i^A\) and \(\lambda_j^B\) add another \(1 + I + J\) parameters. We will see more on this in the next sections.
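To make the role of the constraints concrete, here is a minimal sketch that splits the log expected counts of a small table into a constant, main effects, and interaction terms under sum-to-zero constraints (with the interaction terms summing to zero over each index). The expected counts in `mu` are hypothetical values chosen only for illustration, and the variable names are not from the lesson.

```python
import math

# Hypothetical expected counts mu[i][j] for a 2 x 2 table (illustrative values only).
mu = [[20.0, 30.0],
      [40.0, 10.0]]
I, J = len(mu), len(mu[0])

log_mu = [[math.log(m) for m in row] for row in mu]

# Constant: overall mean of the log expected counts.
lam = sum(sum(row) for row in log_mu) / (I * J)

# Main effects: row and column means of log(mu), centered at the constant.
lam_A = [sum(log_mu[i]) / J - lam for i in range(I)]
lam_B = [sum(log_mu[i][j] for i in range(I)) / I - lam for j in range(J)]

# Interaction: whatever remains after removing the constant and main effects.
lam_AB = [[log_mu[i][j] - lam - lam_A[i] - lam_B[j] for j in range(J)]
          for i in range(I)]

# The four pieces add back up to log(mu) in every cell (saturated model).
rebuilt = [[lam + lam_A[i] + lam_B[j] + lam_AB[i][j] for j in range(J)]
           for i in range(I)]

print("lambda    =", lam)
print("lambda^A  =", lam_A)
print("lambda^B  =", lam_B)
print("lambda^AB =", lam_AB)
```

Under these constraints only \(1 + (I-1) + (J-1) + (I-1)(J-1) = IJ\) of the parameters are free, which matches the number of cells, so the decomposition is unique.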

Model Assumptions

The \(N = I \times J\) cell counts are assumed to be independent Poisson random variables, \(n_{ij}\sim \text{Poisson}(\mu_{ij})\). Log-linear modeling is natural for Poisson, multinomial, and product-multinomial sampling, as discussed in earlier lessons.

Recall the Vitamin C study, a \(2 \times 2\) example from Lesson 3. Are the type of treatment and contracting a cold independent? If they are associated, in which way are they associated?

We already know how to answer the above questions via the chi-square test of independence, but now we want to model the cell counts with the log-linear model of independence and ask if this model fits well.
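As a rough sketch of what "fitting the independence model and asking whether it fits" involves, the code below computes the fitted counts under independence, \(\hat{\mu}_{ij} = n_{i+} n_{+j} / n\), and the Pearson \(X^2\) and deviance \(G^2\) statistics. The counts in `n` are hypothetical and are not the Vitamin C data; the p-value line uses a shortcut that is valid only for 1 degree of freedom, as in a \(2 \times 2\) table.

```python
import math

# Hypothetical observed counts n[i][j] for a 2 x 2 table (NOT the Vitamin C data).
n = [[30, 110],
     [15, 125]]
I, J = len(n), len(n[0])

row_tot = [sum(n[i]) for i in range(I)]
col_tot = [sum(n[i][j] for i in range(I)) for j in range(J)]
total = sum(row_tot)

# Fitted counts under independence: mu_hat[i][j] = (row total * column total) / n.
mu_hat = [[row_tot[i] * col_tot[j] / total for j in range(J)] for i in range(I)]

# Pearson chi-square and deviance (likelihood-ratio) statistics.
X2 = sum((n[i][j] - mu_hat[i][j]) ** 2 / mu_hat[i][j]
         for i in range(I) for j in range(J))
G2 = 2 * sum(n[i][j] * math.log(n[i][j] / mu_hat[i][j])
             for i in range(I) for j in range(J) if n[i][j] > 0)

df = (I - 1) * (J - 1)

# Upper-tail chi-square probability; this erfc shortcut holds only when df == 1.
p_value = math.erfc(math.sqrt(X2 / 2)) if df == 1 else None

print("fitted counts:", mu_hat)
print("X2 =", X2, " G2 =", G2, " df =", df, " p (X2) =", p_value)
```

Large values of \(X^2\) or \(G^2\) relative to a chi-square distribution with \((I-1)(J-1)\) degrees of freedom suggest that the independence model does not fit well.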


Log-linear Models for Two-Way Tables

Given two categorical random variables, \(A\) and \(B\), there are two main types of models we will consider:

  • Independence model (A, B)
  • Saturated model (AB)

Let us start with the model of independence.