Recall that independence can be stated in terms of cell probabilities as a product of marginal probabilities,

\(\pi_{ij}=\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J\)

and in terms of expected cell frequencies,

\(\mu_{ij}=n\pi_{ij}=n\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J\)

Taking natural logarithms of both sides of the "=" sign, we obtain the log-linear model of independence:

\(\log(\mu_{ij}) = \lambda+\lambda_i^A+\lambda_j^B\)

where the superscripts \(A\) and \(B\) simply denote the two categorical variables.
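To spell out the step from the frequency form: the logarithm turns the product into a sum,

\(\log(\mu_{ij}) = \log n + \log \pi_{i+} + \log \pi_{+j}\)

so one possible matching of terms is \(\lambda = \log n\), \(\lambda_i^A = \log \pi_{i+}\), and \(\lambda_j^B = \log \pi_{+j}\). This matching is only one of many that reproduce the same \(\mu_{ij}\), which is why constraints on the \(\lambda\)s are needed (see below).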

This is an ANOVA-type representation, where:

- \(\lambda\) represents the "overall" effect, or the grand mean of the logarithms of the expected counts, and it ensures that \(\sum_i \sum_j \mu_{ij} = n\), that is, the expected cell counts under the fitted model add up to the total sample size \(n\).
- \(\lambda^A_i\) represents the "main" effect of variable A, or a deviation from the grand mean, and it ensures that \(\sum_j \mu_{ij} = n_{i+}\), that is, the fitted row totals equal the observed row totals. It represents the effect of classification in row \(i\).
- \(\lambda^B_j\) represents the "main" effect of variable B, or a deviation from the grand mean, and it ensures that \(\sum_i \mu_{ij} = n_{+j}\), that is, the fitted column totals equal the observed column totals. This is the effect of classification in column \(j\).
- Constraints such as \(\lambda^A_I=\lambda^B_J=0\), or alternatively \(\sum_i \lambda^{A}_{i} = \sum_j \lambda^{B}_{j} = 0\), are imposed to deal with over-parametrization (see below).

The maximum likelihood (ML) fitted values for the cell counts are the same as the expected (fitted) values under the test of independence in two-way tables, i.e., \(\hat{\mu}_{ij} = n_{i+}n_{+j}/n\). Thus, the \(X^2\) and \(G^2\) statistics for the test of independence are goodness-of-fit statistics for the log-linear model of independence: they test whether the independence model holds against the alternative that it does not, or, more specifically, whether the independence model is true versus the saturated model is true. This model also implies that ALL odds ratios equal 1.
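As a minimal sketch of this equivalence, the Python code below computes the ML fitted values \(n_{i+}n_{+j}/n\) for a small table and then the \(X^2\) and \(G^2\) statistics from them. The counts are made up for illustration, and the helper names `independence_fit` and `gof_statistics` are hypothetical; the point is only that the fitted margins reproduce the observed margins, as described in the bullets above.

```python
import math

def independence_fit(table):
    """ML fitted counts n_{i+} * n_{+j} / n under the independence model."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]

def gof_statistics(table, fitted):
    """Pearson X^2 and likelihood-ratio G^2 comparing observed and fitted counts."""
    x2 = 0.0
    g2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for o, e in zip(obs_row, fit_row):
            x2 += (o - e) ** 2 / e
            if o > 0:  # the term 0 * log(0) is taken to be 0
                g2 += 2 * o * math.log(o / e)
    return x2, g2

# Hypothetical 2 x 3 table of counts, made up for illustration only
table = [[20, 30, 50],
         [30, 20, 50]]
fitted = independence_fit(table)   # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
x2, g2 = gof_statistics(table, fitted)
```

Because the fitted row and column totals equal the observed ones, the only freedom left in comparing `table` with `fitted` is in the pattern of association, which is exactly what \(X^2\) and \(G^2\) measure.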

## Parameter Constraints & Uniqueness

For an \(I\times J\) table, the model is

\(\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B\)

There are \(I\) terms in the set {\(\lambda^A_i\)}, but one of them is redundant, so there are \(I - 1\) unknown parameters, e.g., {\(\lambda_{1}^{A}, \ldots , \lambda_{I-1}^{A}\)}; likewise, there are \(J\) terms in the set {\(\lambda^B_j\)}, one of which is redundant, so there are \(J - 1\) unknown parameters, e.g., {\(\lambda_1^B , \ldots, \lambda_{J-1}^B\)}. (Why is one of them redundant?) Many different parameterizations are possible, but whichever one we use, we need constraints to account for the redundant parameters. The lack of a unique set of parameters does not mean that the expected cell counts change with the parameterization. It simply means that the effect estimates depend on the constraints chosen, which leads to different interpretations of the individual \(\lambda\)s. The expected cell counts remain the same.
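A quick numerical illustration of why the parameters are not unique while the expected counts are: the Python sketch below builds expected counts \(\mu_{ij}=\exp(\lambda+\lambda_i^A+\lambda_j^B)\) from one (hypothetical, arbitrarily chosen) parameter set, then from a second set obtained by shifting all row effects by a constant, and compares the two tables. The parameter values and the helper `expected_counts` are illustrative assumptions, not estimates from any data.

```python
import math

def expected_counts(lam, lam_a, lam_b):
    """Expected counts mu_ij = exp(lam + lam_a[i] + lam_b[j]) under independence."""
    return [[math.exp(lam + a + b) for b in lam_b] for a in lam_a]

# Hypothetical parameter values for a 2 x 3 table, chosen only for illustration
lam, lam_a, lam_b = 3.0, [0.5, -0.5], [0.2, 0.0, -0.2]
original = expected_counts(lam, lam_a, lam_b)

# A second parameter set: add c to every row effect and subtract c from lam
c = 1.7
shifted = expected_counts(lam - c, [a + c for a in lam_a], lam_b)

# Largest absolute difference between the two tables of expected counts
max_diff = max(abs(o - s)
               for o_row, s_row in zip(original, shifted)
               for o, s in zip(o_row, s_row))
```

`max_diff` is zero up to floating-point rounding, even though the two parameter sets differ; any constraint that pins down one value (or the sum) in each set removes this ambiguity.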

DUMMY CODING: To avoid over-parametrization, one member of each set of \(\lambda\)s is fixed at a constant value, typically 0. This corresponds to using dummy coding for the categorical variables (e.g., A = 1, 0). By default, SAS PROC GENMOD sets the last level to 0, so for a \(2\times 2\) table we have

\(\log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B\)

\(\log(\mu_{22})=\lambda+0+0=\lambda\)

By default, R's glm() sets the first level of each categorical variable to 0, so we have

\(\log(\mu_{11})=\lambda+0+0=\lambda\)

\(\log(\mu_{22})=\lambda+\lambda_2^A+\lambda_2^B\)
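To make the two dummy-coding conventions concrete, here is a minimal Python sketch that solves for the corner-constrained parameters by hand from a hypothetical \(2\times 2\) table of fitted counts (chosen so that independence holds exactly, since \(30\cdot 10 = 20\cdot 15\)). It does not call glm() or PROC GENMOD; the function `corner_parameters` and the counts are illustrative assumptions.

```python
import math

def corner_parameters(log_mu, ref_row, ref_col):
    """Solve log(mu_ij) = lam + lam_a[i] + lam_b[j] with lam_a[ref_row] = lam_b[ref_col] = 0.

    Assumes log_mu is exactly additive, e.g. fitted values under independence.
    """
    lam = log_mu[ref_row][ref_col]
    lam_a = [row[ref_col] - lam for row in log_mu]
    lam_b = [v - lam for v in log_mu[ref_row]]
    return lam, lam_a, lam_b

# Hypothetical fitted counts for a 2 x 2 table that satisfy independence exactly
mu = [[30.0, 20.0],
      [15.0, 10.0]]
log_mu = [[math.log(v) for v in row] for row in mu]

# First level as reference (the convention the text attributes to R's glm())
lam_first, a_first, b_first = corner_parameters(log_mu, 0, 0)
# Last level as reference (the convention the text attributes to SAS PROC GENMOD)
lam_last, a_last, b_last = corner_parameters(log_mu, 1, 1)
```

With the first level as reference, `lam_first` equals \(\log(\mu_{11})\); with the last level as reference, `lam_last` equals \(\log(\mu_{22})\), matching the two displays above.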

ANOVA-type CODING: Another way to avoid over-parametrization is to constrain the sum of the terms to a constant, typically 0. This is the ANOVA-type constraint, and it corresponds to using the so-called "effect" coding for categorical variables (e.g., A = 1, 0, -1). By default, SAS PROC CATMOD and R loglin() use the zero-sum constraint; for example, for a \(2\times 2\) table the log expected counts in the first and last cells are

\(\log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B\)

\(\log(\mu_{22})=\lambda-\lambda_1^A-\lambda_1^B\)
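Under the zero-sum constraint, \(\lambda\) is the grand mean of the log expected counts and each main effect is a row (or column) mean minus that grand mean. The Python sketch below computes these by hand for the same kind of hypothetical \(2\times 2\) fitted counts; it does not call loglin() or PROC CATMOD, and the helper `zero_sum_parameters` is an illustrative assumption.

```python
import math

def zero_sum_parameters(log_mu):
    """Solve log(mu_ij) = lam + lam_a[i] + lam_b[j] with sum(lam_a) = sum(lam_b) = 0.

    Assumes log_mu is exactly additive, e.g. fitted values under independence.
    """
    I, J = len(log_mu), len(log_mu[0])
    lam = sum(sum(row) for row in log_mu) / (I * J)                            # grand mean
    lam_a = [sum(row) / J - lam for row in log_mu]                             # row means minus grand mean
    lam_b = [sum(log_mu[i][j] for i in range(I)) / I - lam for j in range(J)]  # column means minus grand mean
    return lam, lam_a, lam_b

# Hypothetical fitted counts for a 2 x 2 table that satisfy independence exactly
mu = [[30.0, 20.0],
      [15.0, 10.0]]
log_mu = [[math.log(v) for v in row] for row in mu]
lam, lam_a, lam_b = zero_sum_parameters(log_mu)
```

For a \(2\times 2\) table the zero-sum constraint forces \(\lambda_2^A=-\lambda_1^A\) and \(\lambda_2^B=-\lambda_1^B\), so `lam - lam_a[0] - lam_b[0]` reproduces \(\log(\mu_{22})\), as in the display above.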

We will see more on these with a specific example in the next section.

## Link to odds and odds ratio

We can obtain different parameter estimates (i.e., different values of the \(\lambda\)s) depending on the type of constraints we set. So, what is unique about these parameters that leads to the same inference regardless of parametrization? **The differences, that is, the log odds, are unique**:

\(\lambda_i^A-\lambda_{i'}^A\)

\(\lambda_j^B-\lambda_{j'}^B\)

where the subscript \(i\) denotes one level of categorical variable \(A\) and \(i'\) denotes another level of the same variable; similarly for \(B\).

Thus the odds are also unique!

\begin{align} \log(odds) &= \log\left(\dfrac{\mu_{i1}}{\mu_{i2}}\right)=\log(\mu_{i1})-\log(\mu_{i2})\\ &= (\lambda+\lambda_i^A+\lambda_1^B)-(\lambda+\lambda_i^A+\lambda_2^B)=\lambda_1^B-\lambda_2^B\\ \end{align}

\(\Rightarrow \mbox{odds} = \exp(\lambda_{1}^{B} - \lambda_{2}^{B})\)
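A small numerical check of this uniqueness: the Python sketch below computes the column effects for hypothetical \(2\times 2\) fitted counts under both the first-level-zero and the zero-sum constraints, and compares \(\lambda_1^B-\lambda_2^B\) from each with the log odds \(\log(\mu_{i1}/\mu_{i2})\) in every row. The counts are made up for illustration and satisfy independence exactly.

```python
import math

# Hypothetical fitted counts for a 2 x 2 table under independence
mu = [[30.0, 20.0],
      [15.0, 10.0]]
log_mu = [[math.log(v) for v in row] for row in mu]

# Column effects under the first-level-zero (corner) constraint
b_corner = [log_mu[0][j] - log_mu[0][0] for j in range(2)]
# Column effects under the zero-sum constraint: column mean minus grand mean
grand = sum(sum(row) for row in log_mu) / 4
b_zero_sum = [(log_mu[0][j] + log_mu[1][j]) / 2 - grand for j in range(2)]

diff_corner = b_corner[0] - b_corner[1]
diff_zero_sum = b_zero_sum[0] - b_zero_sum[1]
# Log odds of column 1 versus column 2 within each row i
log_odds = [math.log(mu[i][0] / mu[i][1]) for i in range(2)]
```

All four quantities equal \(\log(1.5)\) for these counts: the individual \(\lambda_j^B\) differ between the two codings, but their difference does not, and it does not depend on the row \(i\).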

If we have the model of independence, we expect that the log(odds ratio) is 0, that is, that the odds ratio is 1. Can you finish the calculation below and show that you get zero?

\begin{align}\log(oddsratio) &= \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\&= \log(\mu_{11})+\log(\mu_{22})-\log(\mu_{12})-\log(\mu_{21})\\&= \cdots\\\end{align}

\(\begin{align*}\log(oddsratio)& = \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\ &= \log(\mu_{11})+\log(\mu_{22})-\log(\mu_{12})-\log(\mu_{21})\\ &= \lambda+\lambda_1^A+\lambda_1^B+\lambda+\lambda_2^A+\lambda_2^B-\lambda-\lambda_1^A-\lambda_2^B-\lambda-\lambda_2^A-\lambda_1^B\\ &= 0\\ \end{align*}\)
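The same result can be checked numerically. The Python sketch below fits the independence model to a made-up \(2\times 2\) table (fitted counts \(n_{i+}n_{+j}/n\)) and computes the log odds ratio of the fitted and of the observed counts; the table and helper names are hypothetical.

```python
import math

def independence_fit(table):
    """ML fitted counts n_{i+} * n_{+j} / n under the independence model."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]

def log_odds_ratio(t):
    """log(t11 * t22 / (t12 * t21)) for a 2 x 2 table."""
    return (math.log(t[0][0]) + math.log(t[1][1])
            - math.log(t[0][1]) - math.log(t[1][0]))

# Hypothetical observed 2 x 2 counts, made up for illustration only
observed = [[40, 20],
            [10, 30]]
fitted = independence_fit(observed)         # [[30.0, 30.0], [20.0, 20.0]]
log_or_fitted = log_odds_ratio(fitted)      # zero, up to rounding
log_or_observed = log_odds_ratio(observed)  # log(6) for these counts
```

The fitted log odds ratio is 0 whatever the data, while the observed one is not; how far the observed association departs from 0 is what the goodness-of-fit statistics for the independence model assess.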

The odds ratio measures the strength of the association and depends only on the interaction terms {\(\lambda_{ij}^{AB}\)}, which do not appear in this model; we will discuss them when we see the saturated log-linear model.

Do you recall how many odds ratios we need to completely characterize the associations in an \(I\times J\) table?