10  Log-Linear Models

Log-Linear Models
Categorical Data
Model Hierarchy
Interaction Effects
Maximum Likelihood Estimation

Overview

Thus far in the course, we have alluded to log-linear models several times but have never got down to the specifics of them. When we dealt with inter-relationships among several categorical variables, our focus was mostly on describing their associations via

  • single summary statistics and
  • significance testing.

Log-linear models go beyond single summary statistics and specify how the cell counts depend on the levels of categorical variables. They model the association and interaction patterns among categorical variables. The log-linear model is natural for Poisson, Multinomial and Product-Multinomial sampling. They are appropriate when there is no clear distinction between response and explanatory variables or when there are more than two responses. This is a fundamental difference between logistic models and log-linear models. In the former, a response is identified, but no such special status is assigned to any variable in log-linear modeling. By default, log-linear models assume discrete variables to be nominal, but these models can be adjusted to deal with ordinal and matched data. Log-linear models are more general than logit models, but some log-linear models have direct correspondence to logit models.

Consider the Berkeley admission example. We may consider all possible relationships among A = Admission, D = Department and S = Sex. Alternatively, we may consider A as response and D and S as covariates, in which case the possible logit models are

  • logit model for A with only an intercept;
  • logit model for A with a main effect for D;
  • logit model for A with a main effect for S;
  • logit model for A with a main effects for D and S; and
  • logit model for A with main effects for D and S and the D \(\times\) S interaction.

Corresponding to each of the above, a log-linear model may be defined. The notations below follow those of Lesson 5.

  • Model of joint independence (DS, A), which indicates neither D nor S has an effect on A is equivalent to a logit model for response A with only an intercept;
  • Model of conditional independence (DS, DA), which indicates that S has no effect on A after the effect of D is included, is equivalent to a logit model for response A with the single predictor D; another conditional independence model (DS, SA) is equivalent to a logit model for response A with the single predictor S;
  • Model of homogeneous association (DS, DA, SA) indicates that the effect of S on A is the same at each level of department, which is equivalent to a logit model for response A with predictors D and S but no interaction; and
  • Saturated or unrestricted model (DSA) indicates that the effect of S on A varies across D and is equivalent to a logit model for response A with predictors D and S as well as the D by S interaction.

“Equivalent” means that two models give equivalent goodness-of-fit statistics relative to a saturated model, and equivalent expected counts for each cell. Log-linear models are not the same as logit models, because the log-linear models describe the joint distribution of all three variables, whereas the logit models describe only the conditional distribution of A given D and S. Log-linear models have more parameters than the logit models, but the parameters corresponding to the joint distribution of D and S are not of interest.

In general, to construct a log-linear model that is equivalent to a logit model, we need to include all possible associations among the predictors. In the Berkeley example, we need to include DS in every model. This lesson will walk through examples of how this is done in both SAS and R.

In subsequent sections, we look at the log-linear models in more detail. The two great advantages of log-linear models are that they are flexible and they are interpretable. Log-linear models have all the flexibility associated with ANOVA and regression. We have mentioned before that log-linear models are also another form of GLM. They also have natural interpretations in terms of odds and frequently have interpretations in terms of independence, as we have shown above.

Objectives

Upon completion of this lesson, you should be able to:

  1. Interpret the parameters of the log-linear model and how they can be used to explain joint and conditional associations among variables,
  2. Fit and use the log-linear model to explain joint and conditional associations among variables, and
  3. Interpret interactions of multiple variables in the context of the log-linear model.

Lesson 10 Code Files

10.1 Log-Linear Models for Two-way Tables

Overview

Recall, that a two-way ANOVA models the expected value of a continuous variable (e.g., plant length) depending on the levels of two categorical variables (e.g., low/high sunlight and low/high water amount). In contrast, the log-linear model expresses the cell counts (e.g., the number of plants in a cell) depending on the levels of two categorical variables.

Let \(\mu_{ij}\) be the expected counts, \(E(n_{ij})\), in an \(I \times J\) table, created by two random variables \(A\) and \(B\).

Objective

Model the cell counts: \(\mu_{ij} = n\pi_{ij}\)

Model Structure

An analogous saturated log-linear model to two-way ANOVA with interaction is

\[ \log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB} \]

where \(i = 1,\ldots, I, j = 1, \ldots, J\), are levels of categorical random variables \(A\) and \(B\), with constraints: \(\sum_i \lambda_i = \sum_j \lambda_j = \sum_i \sum_j \lambda_{ij} = 0\), to deal with overparametrization. Overparametrization means that the number of parameters is more than what can be uniquely estimated. This model is over-parametrized because term \(\lambda_{ij}\) already has \(I \times J\) parameters corresponding to the cell means \(\mu_{ij}\). The constant, \(\lambda\), and the “main effects”, \(\lambda_i\) and \(\lambda_j\) give us additional \(1 + I + J\) parameters. Superscripts denote variables \(A\) and \(B\). We will see more on this in the next sections.

Model Assumptions

The \(N = I \times J\) counts in the cells are assumed to be independent observations from a Poisson random variable, \(n_{ij}\sim \text{Poisson}(\mu_{ij})\). The log-linear modeling is natural for Poisson, Multinomial and Product-Multinomial sampling like we have discussed in earlier lessons.

Recall the Vitamin C study, a \(2 \times 2\) example from Lesson 3. Are the type of treatment and contracting cold independent? If there are associated, in which way are they associated?

We already know how to answer the above questions via the chi-square test of independence, but now we want to model the cell counts with the log-linear model of independence and ask if this model fits well.

Log-linear Models for Two-Way Tables

Given two categorical random variables, \(A\) and \(B\), there are two main types of models we will consider:

  • Independence model (A, B)
  • Saturated model (AB)

Let us start with the model of independence.

10.1.1 Model of Independence

Recall, that the independence can be stated in terms of cell probabilities as a product of marginal probabilities,

\[ \pi_{ij}=\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J \]

and in terms of cell frequencies,

\[ \mu_{ij}=n\pi_{ij}=n\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J \]

By taking natural logarithms on both sides of “=” sign, we obtain the loglinear model of independence:

\[ \log(\mu_{ij}) = \lambda+\lambda_i^A+\lambda_j^B \]

where superscripts A and B are just used to denote the two categorical variables.

This is an ANOVA type-representation where:

  • \(\lambda\) represents the “overall” effect, or the grand mean of the logarithms of the expected counts, and it ensures that \(\sum_i \sum_j \mu_{ij} = n\), that is, the expected cell counts under the fitted model add up to the total sample size \(n\).
  • \(\lambda^A_I\) represents the “main” effect of variable A, or a deviation from the grand mean, and it ensures that \(\sum_j \mu_{ij} = n_{i+}\), that is, the marginal totals under the fitted model add up to the observed marginal counts. It represents the effect of classification in row \(i\).
  • \(\lambda^B_J\) represents the “main” effect of variable B, or a deviation from the grand mean, and it ensures that \(\sum_i \mu_{ij} = n+j\). This is the effect of classification in column j.
  • \(\lambda^A_I=\lambda^B_J=0\), or alternatively, \(\sum_i \lambda^{A}_{i} = \sum_j \lambda^{B}_{j} = 0\), to deal with over-parametrization (see below).

The maximum likelihood (ML) fitted values for the cell counts are the same as the expected (fitted) values under the test of independence in two-way tables, i.e., \(E(\mu_{ij}) = n_{i+}n_{+j}/n\). Thus, the \(X^2\) and \(G^2\) for the test of independence are goodness-of-fit statistics for the log-linear model of independence testing that the independence model holds versus that it does not, or more specifically testing that the independence model is true vs. saturated model is true. This model also implies that ALL odds ratios should be equal to 1.

Parameter Constraints & Uniqueness

For an \(I\times J\) table, and the model is

\[ \log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B \]

There are \(I\) terms in the set {\(\lambda^A_I\)}, but one of them is redundant, so there are \(I − 1\) unknown parameters, e.g., {\(\lambda_{1}^{A}, \ldots , \lambda_{I-1}^{A}\)}, and there are \(J\) terms in the set {\(\lambda^B_J\)}, and one is redundant, so there are \(J − 1\) unknown parameters, e.g., {\(\lambda_1^B , \ldots, \lambda_{J-1}^B\)}. (Why is one of them redundant?) There can be many different parameterizations, but regardless of which set we use, we need to set the constraints to account for redundant parameters. Nonexistence of a unique set of parameters does not mean that the expected cell counts will change depending on which set of parameters is being used. It simply means that the estimates of the effects may be obtained under different sets of constraints, which will lead to different interpretations. But expected cell counts will remain the same.

DUMMY CODING: To avoid over-parametrization, one member in the set of \(\lambda\)s is fixed to have a constant value, typically 0. This corresponds to using dummy coding for the categorical variables (e.g. A = 1, 0). By default, in SAS PROC GENMOD, the last level is set to 0. So, we have

\[ \log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B \]

\[ \log(\mu_{22})=\lambda+0+0=\lambda \]

By default, in R glm() the first level of the categorical variable is set to 0. So, we have

\[ \log(\mu_{11})=\lambda+0+0=\lambda \]

\[ \log(\mu_{22})=\lambda+\lambda_2^A+\lambda_2^B \]

ANOVA-type CODING: Another way to avoid over-parametrization is to fix the sum of the terms equal to a constant, typically 0. That is the ANOVA-type constraint. This corresponds to using the so-called “effect” coding for categorical variables (e.g. A = 1, 0, −1). By default, SAS PROC CATMOD and R loglin(), use the zero-sum constraint, e.g., the expected cell count in the first cell and the last cell,

\[ \log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B \]

\[ \log(\mu_{22})=\lambda-\lambda_1^A-\lambda_1^B \]

We will see more on these with a specific example in the next section.

10.1.2 Saturated Model

With the saturated model, the \(N = IJ\) counts in the cells are still assumed to be independent observations of a Poisson random variable, but no independence is assumed between the variables \(A\) and \(B\). The model is expressed as

\[ \log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB} \]

Note the additional \(\lambda_{ij}^{AB}\) interaction term, compared with the independence model. We still have the same ANOVA-like constraints to avoid over-parameterization, and again, this changes with the software used. Ultimately, the inference, in the end, will not depend on the choice of constraints.

Parameter estimates and interpretation

The odds ratio is directly related to the interaction terms. For example, for a \(2 \times 2\) table,

\[ \begin{align}\log(\theta) &= \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\ &= \lambda_{11}^{AB}+\lambda_{22}^{AB}-\lambda_{12}^{AB}-\lambda_{21}^{AB}\end{align} \]

Stop and Think!

Can you fill in the missing line for the equation above?

\[\begin{align*}\log(\theta)& = \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\ &= \lambda+\lambda_1^A+\lambda_1^B+\lambda_{11}^{AB}+\lambda+\lambda_2^A+\lambda_2^B+\lambda_{22}^{AB}-\lambda-\lambda_1^A-\lambda_2^B-\lambda_{12}^{AB}-\lambda-\lambda_2^A-\lambda_1^B-\lambda_{21}^{AB}\\ &= \lambda_{11}^{AB}+\lambda_{22}^{AB}-\lambda_{12}^{AB}-\lambda_{21}^{AB}\\ \end{align*}\]

How many odds ratios are there? There should be \((I-1) (J-1)\) which should be equal to the unique number of \(\lambda_{ij}\)s in the model.

How many unique parameters are there in this model?

Term # of terms #of constraints # of unique parameters
\(\lambda\) \(1\) \(0\) \(1\)
\(\lambda_i^A\) \(I\) \(1\) \(I - 1\)
\(\lambda_i^B\) \(J\) \(1\) \(J - 1\)
\(\lambda_{ij}^{AB}\) \(I J\) \(I + J - 1\) \((I - 1) (J - 1)\)

With \(I J = N\) terms, the model is a perfect fit! Moreover, the saturated model includes the independence model as a special case, where the interaction terms are 0.

Example 10.2 (Vitamin C) Let’s evaluate the CATMOD and GENMOD output for VitaminCLoglin.sas and R code for VitaminCLoglin.R.

To fit the saturated models, we need to specify or add an interaction term in our code.

In CATMOD:

proc catmod data=ski order=data;
weight count;
model treatment*response=_response_;
loglin treatment|response;
run;

The LOGLIN statement (line) now specifies the saturated model. treatment|response specification adds an interaction term to the main effects model.

In GENMOD:

proc genmod data=ski order=data;
class treatment response;
model count = response*treatment /link=log dist=poisson lrci type3 obstats;
run;

The * sign in the response*treatment, indicates that we want the interaction term and ALL lower order terms; that is, the saturated model.

Notice, that we can also specify the saturated model by explicitly entering the main effects in the model statement:

model count = response treatment response\*treatment/link=log dist=poisson lrci type3 obstats;

In these two model specifications under GENMOD, we are fitting different numbers of parameters, and the output sections on LR Statistics For Type 3 Analysis will be different, but the estimates of odds and odds-ratios will be the same. For the output, we have the same parts as we did for the independence model, except now we have more parameter estimates. We will consider the latter GENMOD parametrization and compare it to that for CATMOD.

So, what are the odds of getting cold given that a person took a placebo pill? Let’s first start with log-odds:

\[ \log(\mu_{11}/\mu_{12})=\log(\mu_{11})-\log(\mu_{12})=[\lambda+\lambda_1^A+\lambda_1^B+\lambda_{11}^{AB}]-[\lambda+\lambda_1^A+\lambda_2^B+\lambda_{12}^{AB}] \]

  • Based on GENMOD parameterization: \(\lambda_1^B + \lambda_{11}^{AB} - \lambda_2^B -\lambda_{12}^{AB} = -1.9708 + 0.7134 - 0 - 0\), and the odds are \(\exp(-1.2574) = 0.2844\).

  • Based on CATMOD parameterization: \([\lambda+\lambda_1^A + \lambda_1^B+\lambda_{11}^{AB} ]- [\lambda+\lambda_1^A- \lambda_1^B - \lambda_{11}^{AB}] = 2\lambda_1^B+2\lambda_{11}^{AB} \= 2(-0.807 + 0.1784)\), and the odds are \(\exp(\log \mbox{odds}) = 0.2844\).

In R, the syntax for loglin() would be

loglin(ski, list(c(1, 2)), fit=TRUE, param=TRUE)

And the syntax for glm() would be

glm(Freq~Treatment*Cold, family=poisson(), data=ski.data)

Q1: What are the odds ratio based on CATMOD or loglin()? What about GENMOD or glm()?

For example, from the second way of fitting the model with GENMOD, the odds ratio is \(\exp(0.7134)=2.041\), with 95% CI \((\exp(0.079), \exp(1.3770)\). This corresponds to \(1/2.041=0.49\), which we saw before.

The SAS System

Obs Parameter Level1 Level2 DF Estimate StdErr LowerLRCL UpperLRCL ChiSq ProbChiSq
1 treatment*response placebo cold 1 0.7134 0.3293 0.0790 1.3770 4.69 0.0303

From R, we get the same:

summary(ski.data)
                             Estimate Std. Error  z value   Pr(>|z|)
TreatmentVitaminC:ColdNoCold 0.713447  0.3293215 2.166415 0.03027947

Q2: Is the interaction of treatment and response significant (hint: look at LR Statistics for Type 3 Analysis). Where does df = 3 come from?

Q3: Model fits perfectly. Let’s do some model selection. Compare the independence and saturated model, by calculating the LR statistics (e.g. difference in deviances).

\[ 2\times (973.0657-970.629)\approx 4.8717 \]

Q4: How many df does this statistic have?

Hierarchical Models

These models include all lower-order terms that comprise higher-order terms in the model.

(A, B) is a simpler model than (AB)

Interpretation does not depend on how the variables are coded.

Stop and Think!

Is this a hierarchical model? Why or why not?

\[ \log(\mu_{ij})=\lambda+\lambda_i^A+ \lambda_{ij}^{AB} \]

No, it is not a hierarchical model. It is missing \(\lambda^{B}_{j}\).

We want hierarchical models so that interaction terms represent associations. If there is a significant interaction term, we do NOT look at the lower-order terms, but only interpret the higher-order terms because the value of lower-order terms is coding dependent and can be misleading.

Next, we’ll see how these models extend to three-way tables.

10.2 Log-linear Models for Three-way Tables

In this section we will extend the concepts we learned about log-linear models for two-way tables to three-way tables. We will learn how to fit various models of independence discussed in Lesson 5 (e.g., conditional independence, joint independence, etc.) and will see additional statistics, besides the usual \(X^2\) and \(G^2\), to assess the model fit and to choose the “best” model.

The notation for log-linear models extends to three-way tables as follows. If \(\mu_{ijk}\) represents the mean for the \(i\)th, \(j\)th, and \(k\)th levels of variables \(A\), \(B\), and \(C\), respectively, then we write

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C+\lambda_{ij}^{AB}+\lambda_{ik}^{AC}+\lambda_{jk}^{BC}+\lambda_{ijk}^{ABC} \]

The main questions of interest are:

  • What do the \(\lambda\) terms mean in this model?
  • What hypotheses about them correspond to the models of independence we already know?
  • What are some efficient ways to specify and interpret these models and tables?
  • What are some efficient ways to fit and select among many possible models in three and higher dimensions?

As before for three-way tables, there are multiple models we can test and now fit. The log-linear models we will fit and evaluate are

  1. Saturated
  2. Complete Independence
  3. Joint Independence
  4. Conditional Independence
  5. Homogeneous Association

Example 10.3 (Graduate Admissions) Let us go back to our familiar dataset on graduate admissions at Berkeley:

Dept. Males admitted Males rejected Females admitted Females rejected
A 512 313 89 19
B 353 207 17 8
C 120 205 202 391
D 139 279 131 244
E 53 138 94 299
F 22 351 24 317

Let D = department, S = sex, and A = admission status (rejected or accepted). We analyzed this as a three-way table before, and specifically, we looked at partial and marginal tables. Now we will look at it from a log-linear model point of view. Let \(Y\) be the observed frequency or count in a particular cell of the three-way table.

10.2.1 Saturated Log-Linear Model

This model is the default model in a way that serves for testing of goodness-of-fit of the other models. Recall that the saturated model has the maximum number of parameters and fitting a saturated model is the same as estimating ML parameters of distributions appropriate for each cell of the contingency table.

Main assumption

The \(N = IJK\) counts in the cells are assumed to be independent observations of a Poisson random variable.

Model structure

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C+\lambda_{ij}^{AB}+\lambda_{ik}^{AC}+\lambda_{jk}^{BC}+\lambda_{ijk}^{ABC} \]

Parameter constraints can be different, but the (typical) set-to-zero constraints imply that any \(\lambda\) for which a subscript corresponds to the last category (i.e., includes \(I\), \(J\), or \(K\)) is set to 0. For example,

\[ \lambda_I^A=\lambda_J^B=\lambda_K^C=\lambda_{Ij}^{AB}=\cdots=\lambda_{ijK}^{ABC}=0 \]

Parameter estimation and interpretation

  • \(\lambda\) represents an “overall” effect or a grand mean (on the log scale) of the expected counts, and it ensures that \(\sum_i\sum_j\sum_k\mu_{ijk}=n\).
  • \(\lambda_i^A\), \(\lambda_j^B\), and \(\lambda_k^C\) represent the “main” effects of variables \(A\), \(B\), and \(C\), or deviations from the grand mean.
  • \(\lambda_{ij}^{AB}\), \(\lambda_{ik}^{AC}\), \(\lambda_{jk}^{BC}\) represent the interaction/association between two variables while controlling the third (e.g., conditional odds ratios, test for partial associations) and reflect the departure from independence
  • \(\lambda_{ijk}^{ABC}\) represents the interaction/association between three variables and reflect the departure from independence

If there is a significant interaction term, we typically do not look at the lower-order terms but only interpret the higher-order terms because the values of lower-order terms are coding dependent and can be misleading.

Model Fit

The saturated model has a perfect fit with \(G^2=0\) and df = 0 since the number of cells is equal to the number of unique parameters in the model.

Model Selection

Relevant when comparing to simpler models. The saturated model is the most complex model possible.

Fitting in SAS and R

Using PROC GENMOD, let us fit the saturated log-linear model.

proc genmod data=berkeley order=data;
class D S A;
model count= D S A D*S D*A S*A D*S*A/dist=poisson link=log;
run;

When we use the order=data option, GENMOD orders the levels of class variables in the same order as they appear in the dataset. For each class variable, GENMOD creates a set of dummy using the last category as a reference group (recall the CATMOD and GENMOD coding from the previous lesson). Therefore, we can interpret a two-way association as the log-odds ratio for the two variables in question, with the other variable held constant at its last category (i.e., conditional odds-ratios).

Here’s a portion of the SAS output that includes ML estimates.

The SAS System

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept       1 3.1781 0.2041 2.7780 3.5781 242.40 <.0001
D DeptA     1 1.3106 0.2300 0.8598 1.7614 32.47 <.0001
D DeptB     1 -0.3448 0.3170 -0.9662 0.2765 1.18 0.2767
D DeptC     1 2.1302 0.2159 1.7070 2.5534 97.34 <.0001
D DeptD     1 1.6971 0.2220 1.2620 2.1323 58.42 <.0001
D DeptE     1 1.3652 0.2287 0.9170 1.8135 35.63 <.0001
D DeptF     0 0.0000 0.0000 0.0000 0.0000 . .
S Male     1 -0.0870 0.2952 -0.6655 0.4915 0.09 0.7682
S Female     0 0.0000 0.0000 0.0000 0.0000 . .
A Reject     1 2.5808 0.2117 2.1659 2.9958 148.61 <.0001
A Accept     0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptA Male   1 1.8367 0.3167 1.2159 2.4575 33.63 <.0001
D*S DeptA Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptB Male   1 3.1203 0.3857 2.3643 3.8763 65.44 <.0001
D*S DeptB Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptC Male   1 -0.4338 0.3169 -1.0548 0.1873 1.87 0.1710
D*S DeptC Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptD Male   1 0.1391 0.3194 -0.4869 0.7650 0.19 0.6632
D*S DeptD Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptE Male   1 -0.4860 0.3415 -1.1553 0.1834 2.03 0.1547
D*S DeptE Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptF Male   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptF Female   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptA Reject   1 -4.1250 0.3297 -4.7712 -3.4789 156.56 <.0001
D*A DeptA Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptB Reject   1 -3.3346 0.4782 -4.2718 -2.3974 48.63 <.0001
D*A DeptB Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptC Reject   1 -1.9204 0.2288 -2.3688 -1.4721 70.48 <.0001
D*A DeptC Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptD Reject   1 -1.9589 0.2378 -2.4250 -1.4928 67.85 <.0001
D*A DeptD Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptE Reject   1 -1.4237 0.2425 -1.8990 -0.9484 34.47 <.0001
D*A DeptE Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptF Reject   0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptF Accept   0 0.0000 0.0000 0.0000 0.0000 . .
S*A Male Reject   1 0.1889 0.3052 -0.4092 0.7870 0.38 0.5359
S*A Male Accept   0 0.0000 0.0000 0.0000 0.0000 . .
S*A Female Reject   0 0.0000 0.0000 0.0000 0.0000 . .
S*A Female Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptA Male Reject 1 0.8632 0.4027 0.0740 1.6524 4.60 0.0321
D*S*A DeptA Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptA Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptA Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptB Male Reject 1 0.0311 0.5335 -1.0145 1.0767 0.00 0.9535
D*S*A DeptB Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptB Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptB Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptC Male Reject 1 -0.3138 0.3374 -0.9751 0.3475 0.87 0.3523
D*S*A DeptC Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptC Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptC Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptD Male Reject 1 -0.1069 0.3401 -0.7735 0.5597 0.10 0.7533
D*S*A DeptD Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptD Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptD Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptE Male Reject 1 -0.3891 0.3650 -1.1045 0.3263 1.14 0.2864
D*S*A DeptE Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptE Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptE Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptF Male Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptF Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptF Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*S*A DeptF Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
Scale       0 1.0000 0.0000 1.0000 1.0000    

Note:The scale parameter was held fixed.

By default, R will set the first category (first alphabetically) as the reference level with the set-to-zero constraint, which we can change manually if we wish. The results will be equivalent, but the interpretations of the estimates reported will be different. With this choice of reference, we can interpret a two-way association as the log-odds ratio for the two variables in question, with the other variable held constant at its last category (i.e., conditional odds-ratios). Also, note that the built-in data set UCBAdmissions uses “Gender” instead of “Sex”.

UCBAdmissions 
berk.data = as.data.frame(UCBAdmissions)
berk.data$Gender = relevel(berk.data$Gender, ref='Female')
berk.data$Dept = relevel(berk.data$Dept, ref='F')
berk.sat = glm(Freq~Admit*Gender*Dept, family=poisson(), data=berk.data)

Here’s a portion of the summary output that includes ML estimates.

summary(berk.sat)

Call:
glm(formula = Freq ~ Admit * Gender * Dept, family = poisson(), 
    data = berk.data)

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     3.17805    0.20412  15.569  < 2e-16 ***
AdmitRejected                   2.58085    0.21171  12.190  < 2e-16 ***
GenderMale                     -0.08701    0.29516  -0.295   0.7682    
DeptA                           1.31058    0.23001   5.698 1.21e-08 ***
DeptB                          -0.34484    0.31700  -1.088   0.2767    
DeptC                           2.13021    0.21591   9.866  < 2e-16 ***
DeptD                           1.69714    0.22204   7.644 2.11e-14 ***
DeptE                           1.36524    0.22870   5.969 2.38e-09 ***
AdmitRejected:GenderMale        0.18890    0.30516   0.619   0.5359    
AdmitRejected:DeptA            -4.12505    0.32968 -12.512  < 2e-16 ***
AdmitRejected:DeptB            -3.33462    0.47817  -6.974 3.09e-12 ***
AdmitRejected:DeptC            -1.92041    0.22876  -8.395  < 2e-16 ***
AdmitRejected:DeptD            -1.95888    0.23781  -8.237  < 2e-16 ***
AdmitRejected:DeptE            -1.42370    0.24250  -5.871 4.33e-09 ***
GenderMale:DeptA                1.83670    0.31672   5.799 6.66e-09 ***
GenderMale:DeptB                3.12027    0.38572   8.090 5.99e-16 ***
GenderMale:DeptC               -0.43376    0.31687  -1.369   0.1710    
GenderMale:DeptD                0.13907    0.31938   0.435   0.6632    
GenderMale:DeptE               -0.48599    0.34151  -1.423   0.1547    
AdmitRejected:GenderMale:DeptA  0.86318    0.40267   2.144   0.0321 *  
AdmitRejected:GenderMale:DeptB  0.03113    0.53349   0.058   0.9535    
AdmitRejected:GenderMale:DeptC -0.31382    0.33741  -0.930   0.3523    
AdmitRejected:GenderMale:DeptD -0.10691    0.34013  -0.314   0.7533    
AdmitRejected:GenderMale:DeptE -0.38908    0.36500  -1.066   0.2864    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2.6501e+03  on 23  degrees of freedom
Residual deviance: 3.0931e-13  on  0  degrees of freedom
AIC: 207.06

Number of Fisher Scoring iterations: 3

The intercept is a normalizing constant and should be ignored. The main effects for D, S, and A are all difficult to interpret and not very meaningful since we have significant two-way and three-way associations; the two- and three-way associations are highly meaningful. For example, the estimated coefficient for the SA association is 0.1889.

Exponentiating this coefficient gives

\[ \exp(0.1889)=1.208, \]

which is the estimated SA odds ratio for Department F since that is the reference department in this analysis. The reference group for S is “female,” and the reference group for A is “accept.” If we write the \(2 \times 2\) table for \(S \times A\) in Department F, i.e., the “partial” table, with the reference groups in the last row and column, we get

Dept F Reject Accept
Men 315 22
Women 317 24

for which the estimated odds ratio is:

\[ 351 \times 24/317 \times 22=1.208. \]

The Wald z-statistic for this coefficient,

\[ z=0.1889/0.3052, \]

which corresponds to Chi-square statistic \(0.62^2 = 0.38\) with p-value 0.5359 and indicates that the SA odds ratio for Department F is not significantly different from 1.00 or that the log odds ratio is not significantly different from 0.

The 95% confidence interval for the parameter estimate, that is, for the log odds ratio, is \((−0.4092, 0.7870)\). Thus the 95% confidence interval for the odds ratio is

\[ (\exp(-0.4092),\exp(0.7870))=(0.67,2.20). \]

To get the SA odds ratio for any other department, we have to combine the SA coefficient with one of the DSA coefficients. For example, the SA odds ratio for Department A is

\[ \exp(0.1889+0.8632)=2.864. \]

Based on:

\[ \begin{align} \theta_{SA(i="A")} &= \dfrac{\mu_{ijk} \mu_{ij'k'}}{\mu_{ij'k} \mu_{ijk'}}\\ &= (\lambda_{jk}+\lambda_{j'k'}-\lambda_{j'k}-\lambda_{jk'})+(\lambda_{ijk}+\lambda_{ij'k'}-\lambda_{ij'k}-\lambda_{ijk'})\\ &= (0.1889+0-0-0)+(0.8632+0-0-0) \end{align} \]

The Wald z-statistic for the first DSA coefficient,

\[ z=0.8632/0.4027, \]

indicates that the SA odds ratio for Department A is significantly different from the SA odds ratio in Department F. To see if the SA odds ratio in Department A is significantly different from 1.00, we would have to compute the standard error for the sum of the two coefficients using the estimated covariance matrix, or refit the model by fixing the level of interest equal to 0.

In many situations, however, we take recourse to the saturated model only as a last resort. As the number of variables grow, saturated models become more and more difficult to interpret. In the following sections, we look at simpler models which are useful in explaining the associations among the discrete variables of interest.

10.2.2 Complete Independence

This is the most restrictive model in that all variables are assumed to be jointly independent, regardless of any conditioning. Equivalently, this requires that each two and three-way distribution factors into the product of the marginal distributions involved.

Main assumptions

  • The \(N = IJK\) counts in the cells are assumed to be independent observations of a Poisson random variable, and
  • there are no partial interactions: \(\lambda_{ij}^{AB} =\lambda_{ik}^{AC} =\lambda_{jk}^{BC}=0\), for all \(i, j, k\), and
  • there is no three-way interaction: \(\lambda_{ijk}^{ABC}=0\) for all \(i, j, k\).

Note the constraints above are in addition to the usual set-to-zero or sum-to-zero constraints (present even in the saturated model) imposed to avoid overparameterization.

Model Structure

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C \]

In SAS, the model of complete independence (D, S, A) can be fitted with the following commands:

proc genmod data=berkeley order=data;
class D S A;
model count = D S A / dist=poisson link=log;
run;

What are the estimated odds of male vs. female in this example? From the output, the ML estimate of the parameter S-Male, thus, the odds of being male are higher than being female applicant:

\[ \exp(0.382) = 1.467 = 2691/1835 \]

with p-value < .0001 indicating that the odds are significantly different. Note these are odds, not odds ratios! (When we are dealing with main effects we do not look at odds ratios.)

What about the odds of being rejected? What can we conclude from the part of the output below?

The SAS System

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept   1 4.7207 0.0455 4.6315 4.8100 10748.0 <.0001
D DeptA 1 0.2675 0.0497 0.1701 0.3650 28.95 <.0001
D DeptB 1 -0.1993 0.0558 -0.3086 -0.0900 12.77 0.0004
D DeptC 1 0.2513 0.0499 0.1535 0.3491 25.37 <.0001
D DeptD 1 0.1037 0.0516 0.0025 0.2048 4.04 0.0445
D DeptE 1 -0.2010 0.0558 -0.3103 -0.0916 12.98 0.0003
D DeptF 0 0.0000 0.0000 0.0000 0.0000 . .
S Male 1 0.3829 0.0303 0.3235 0.4422 159.93 <.0001
S Female 0 0.0000 0.0000 0.0000 0.0000 . .
A Reject 1 0.4567 0.0305 0.3969 0.5165 224.15 <.0001
A Accept 0 0.0000 0.0000 0.0000 0.0000 . .
Scale   0 1.0000 0.0000 1.0000 1.0000    

Note:The scale parameter was held fixed.

But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.

Model Fit

The goodness-of-fit statistics indicate that the model does not fit.

The SAS System

The GENMOD Procedure

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 16 2097.6712 131.1045
Scaled Deviance 16 2097.6712 131.1045
Pearson Chi-Square 16 2000.3281 125.0205
Scaled Pearson X2 16 2000.3281 125.0205
Log Likelihood   19464.3700  
Full Log Likelihood   -1128.3655  
AIC (smaller is better)   2272.7309  
AICC (smaller is better)   2282.3309  
BIC (smaller is better)   2282.1553  

If the model fits well, the “Value/DF” would be close to 1. Recall how we get the degrees of freedom:

  • df = number of cells - number of fitted parameters in the model.
  • df = number of fitted parameters in the saturated model - number of fitted parameters in our model.

Recall that these goodness-of-fit statistics compare the fitted model to the saturated model. Thus, the model of complete independence does not fit well in comparison to the saturated model.

In R, the model of complete independence can be fit with the following commands:

berk.ind = glm(Freq~Admit+Gender+Dept, family=poisson(), data=berk.data)

What are the estimated odds of male vs. female in this example? From the output, the ML estimate of the parameter GenderMale, thus, the odds of being male are higher than being female applicant:

\[ \exp(0.382) = 1.467 = 2691/1835 \]

with p-value < 2e-16, indicating that the odds are significantly different. Note these are odds, not odds ratios! (When we are dealing with main effects we do not look at odds ratios.)

summary(berk.ind)

Call:
glm(formula = Freq ~ Admit + Gender + Dept, family = poisson(), 
    data = berk.data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    4.72072    0.04553 103.673  < 2e-16 ***
AdmitRejected  0.45674    0.03051  14.972  < 2e-16 ***
GenderMale     0.38287    0.03027  12.647  < 2e-16 ***
DeptA          0.26752    0.04972   5.380 7.44e-08 ***
DeptB         -0.19927    0.05577  -3.573 0.000352 ***
DeptC          0.25131    0.04990   5.036 4.74e-07 ***
DeptD          0.10368    0.05161   2.009 0.044533 *  
DeptE         -0.20098    0.05579  -3.602 0.000315 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2650.1  on 23  degrees of freedom
Residual deviance: 2097.7  on 16  degrees of freedom
AIC: 2272.7

Number of Fisher Scoring iterations: 5

What about the odds of being rejected? What can we conclude from the part of the output above?

But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.

Model Fit

The reported “Residual deviance” of 2097.7 on 16 degrees of freedom indicates that the model does not fit. If the model fits well, the “Value/DF” would be close to 1. Recall how we get the degrees of freedom:

  • df = number of cells - number of fitted parameters in the model.
  • df = number of fitted parameters in the saturated model - number of fitted parameters in our model.

Recall that this goodness-of-fit statistic compares the fitted model to the saturated model. Thus, the model of complete independence does not fit well in comparison to the saturated model.

10.2.3 Joint Independence

With three variables, there are three ways to satisfy joint independence. The assumption is that one variable is independent of the other two, but the latter two can have an arbitrary association. For example, if \(A\) is jointly independent of \(B\) and \(C,\) which we denote by \((A, BC),\) then \(A\) is independent of each of the others, and we can factor the three-way joint distribution into the product of the marginal distribution for \(A\) and the two-way joint distribution of \((B,C).\)

[Main assumptions] {.unnumbered .unlisted}

(these are stated for the case \((A,BC)\) but hold in a similar way for \((B, AC)\) and \((C, AB)\) as well)

  • The \(N = IJK\) counts in the cells are assumed to be independent observations of a Poisson random variable, and
  • there are no partial interactions involving \(A\): \(\lambda_{ij}^{AB} =\lambda_{ik}^{AC} =0\), for all \(i, j, k\), and
  • there is no three-way interaction: \(\lambda_{ijk}^{ABC}=0\) for all \(i, j, k\).

Note the constraints above are in addition to the usual set-to-zero or sum-to-zero constraints (present even in the saturated model) imposed to avoid overparameterization.

Model Structure

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C +\lambda_{jk}^{BC} \]

In the example below, we consider the model where admission status is jointly independent of department and sex, which we denote by \((A, DS).\)

In SAS, the model can be fitted like this:

proc genmod data=berkeley order=data;
class D S A;
model count = D S A D*S / dist=poisson link=log lrci type3 obstats;
run;

This model implies that the association between D and S does NOT depend on the level of the variable A. That is, the association between department and sex is independent of the rejection/acceptance decision.

\[ \theta_{DS(A=reject)}=\theta_{DS(A=accept)}=\dfrac{\text{exp}(\lambda_{ij}^{DS})\text{exp}(\lambda_{i'j'}^{DS})}{\text{exp}(\lambda_{i'j}^{DS})\text{exp}(\lambda_{ij'}^{DS})} \]

Since we are assuming that the (DS) distribution is independent of A, then we are assuming that the conditional distribution of DS, given A, is the same as the unconditional distribution of DS. And equivalently, the conditional distribution of A, given DS, is the same as the unconditional distribution of A. In other words, if this model fits well, neither department nor sex tells us anything significant about admission status.

The first estimated coefficient 1.9436 for the DS associations…

The SAS System

Obs Parameter Level1 Level2 DF Estimate StdErr LowerWaldCL UpperWaldCL ChiSq ProbChiSq
1 D*S DeptA Male 1 1.9436 0.1268 1.6950 2.1921 234.84 <.0001

implies that the estimated odds ratio between sex and department (specifically, A versus F) is \(\exp(1.9436) = 6.98\) with 95% CI

\[ (\exp(1.695), \exp(2.192))= (5.45, 8.96) \]

But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.

Model Fit

The goodness-of-fit statistics indicate that the model does not fit since the “Value/df” is much larger than 1.

The SAS System

The GENMOD Procedure

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 11 877.0564 79.7324
Scaled Deviance 11 877.0564 79.7324
Pearson Chi-Square 11 797.7041 72.5186
Scaled Pearson X2 11 797.7041 72.5186
Log Likelihood   20074.6774  
Full Log Likelihood   -518.0581  
AIC (smaller is better)   1062.1161  
AICC (smaller is better)   1098.5161  
BIC (smaller is better)   1077.4308  

How did we get these degrees of freedom? As usual, we’re comparing this model to the saturated model, and the df is the difference in the numbers of parameters involved:

\[ DF = (IJK-1) - [(I-1)+(J-1)+(K-1)+(J-1)(K-1)] = (I-1)(JK-1) \]

With \(I=2\), \(J=2\), and \(K=6\) corresponding to the levels of A, S, and D, respectively, this works out to be 11.

As before, we can look at the residuals for more information about why this model fits poorly. Recall that adjusted residuals have approximately a N(0, 1) distribution (i.e., “Std Pearson Residual”). In general, we have a lack of fit if (1) we have a large number of cells and adjusted residuals are greater than 3, or (2) we have a small number of cells and adjusted residuals are greater than 2. Here is only part of the output. Notice that the absolute value of the standardized residual for the first six cells are all large (e.g., in cell 1, the value is \(-15.1793\)). Many other such residuals are rather large as well, indicating that this model fits poorly.

Evaluate the residuals

The SAS System

The GENMOD Procedure

Observation Statistics
Observation count D S A Predicted Value Linear Predictor Standard Error of the Linear Predictor HessWgt Lower Upper Raw Residual Pearson Residual Deviance Residual Std Deviance Residual Std Pearson Residual Likelihood Residual Leverage CookD DFBETA_Intercept DFBETA_DDeptA DFBETA_DDeptB DFBETA_DDeptC DFBETA_DDeptD DFBETA_DDeptE DFBETA_SMale DFBETA_AReject DFBETA_DDeptASMale DFBETA_DDeptBSMale DFBETA_DDeptCSMale DFBETA_DDeptDSMale DFBETA_DDeptESMale DFBETAS_Intercept DFBETAS_DDeptA DFBETAS_DDeptB DFBETAS_DDeptC DFBETAS_DDeptD DFBETAS_DDeptE DFBETAS_SMale DFBETAS_AReject DFBETAS_DDeptASMale DFBETAS_DDeptBSMale DFBETAS_DDeptCSMale DFBETAS_DDeptDSMale DFBETAS_DDeptESMale
1 313 DeptA Male Reject 505.09831 6.2247531 0.0367703 505.09831 469.97739 542.84378 -192.0983 -8.547431 -9.199152 -16.3367 -15.17931 -15.55562 0.6829213 38.173694 0.1338572 0 -2.63E-16 -2.63E-16 0 -2.63E-16 0 -0.218635 -0.734349 -5.25E-16 -2.63E-16 0 -5.25E-16 2.3367482 0 -1.27E-15 -3.87E-15 0 -3.55E-15 0 -7.166703 -5.790199 -2.41E-15 -2.58E-15 0 -4.54E-15
2 512 DeptA Male Accept 319.90169 5.7680137 0.0395092 319.90169 296.06444 345.65816 192.09831 10.740272 9.8692317 13.948262 15.179309 14.575998 0.4993589 17.678562 0.1338572 0 3.328E-16 1.664E-16 0 1.664E-16 0 -0.218635 0.4650965 3.328E-16 1.664E-16 -1.66E-16 3.328E-16 2.3367481 0 1.606E-15 2.448E-15 0 2.248E-15 0 -7.166703 3.6671961 1.529E-15 1.634E-15 -1.61E-15 2.876E-15
3 19 DeptA Female Reject 66.122038 4.1915021 0.0969494 66.122038 54.679277 79.959431 -47.12204 -5.794967 -6.845121 -11.12613 -9.419201 -10.09928 0.6214932 11.205917 0.0275065 -1.152726 -3.78E-16 -1.08E-16 0 -1.08E-16 -1.62E-16 -0.044928 1.1527259 4.859E-16 2.16E-16 0 1.62E-16 0.4801819 -10.4398 -1.82E-15 -1.59E-15 0 -1.46E-15 -2.16E-15 -1.472697 9.0890214 2.232E-15 2.12E-15 0 1.4E-15
4 89 DeptA Female Accept 41.878088 3.7347627 0.0980209 41.878088 34.558213 50.748407 47.121912 7.2816446 6.3202597 8.1755761 9.4191761 8.6973682 0.402369 4.5948769 0.0275065 0.7300717 2.736E-16 6.839E-17 0 6.839E-17 1.026E-16 -0.044928 -0.730072 -3.08E-16 -1.37E-16 -3.42E-17 -1.03E-16 0.4801807 6.6119817 1.32E-15 1.006E-15 0 9.241E-16 1.369E-15 -1.472693 -5.756474 -1.41E-15 -1.34E-15 -3.31E-16 -8.86E-16
5 207 DeptB Male Reject 342.85461 5.8373065 0.0438822 342.85461 314.59903 373.64795 -135.8546 -7.337015 -7.925271 -13.59608 -12.58691 -12.93864 0.6602177 23.679966 0.0883403 -1.73E-16 -2.77E-15 -6.94E-16 -5.2E-16 -3.47E-16 -6.94E-16 -0.14429 6.936E-16 -0.713979 1.04E-15 1.04E-15 1.04E-15 1.5421589 -1.57E-15 -1.34E-14 -1.02E-14 -6.95E-15 -4.69E-15 -9.26E-15 -4.729733 5.469E-15 -3.279442 1.021E-14 1.007E-14 8.99E-15
6 353 DeptB Male Accept 217.14539 5.3805671 0.0462014 217.14539 198.34621 237.72635 135.85461 9.2193238 8.4461135 11.531262 12.586906 12.032087 0.4635118 10.529199 0.0883403 2.196E-16 1.757E-15 4.393E-16 3.295E-16 2.196E-16 4.393E-16 -0.14429 -4.39E-16 0.4521955 -6.59E-16 -6.59E-16 -6.59E-16 1.5421589 1.989E-15 8.48E-15 6.464E-15 4.403E-15 2.968E-15 5.863E-15 -4.729733 -3.46E-15 2.0770196 -6.47E-15 -6.38E-15 -5.69E-15
7 8 DeptB Female Reject 15.30601 2.7282455 0.2003495 15.30601 10.335325 22.667301 -7.30601 -1.867451 -2.056977 -3.312462 -3.007258 -3.128479 0.6143822 1.108357 0.0041861 -2.25E-16 -0.75785 -1.62E-16 -1.86E-16 -1.72E-16 -2.19E-16 -0.006837 2.646E-16 0.7578499 2.197E-16 2.407E-16 2.294E-16 0.0730767 -2.03E-15 -3.657546 -2.38E-15 -2.48E-15 -2.33E-15 -2.92E-15 -0.224123 2.086E-15 3.4809481 2.157E-15 2.329E-15 1.982E-15
8 17 DeptB Female Accept 9.6939907 2.2715062 0.2008702 9.6939907 6.5391536 14.37089 7.3060093 2.3465452 2.1180239 2.7143924 3.0072581 2.8325522 0.3911414 0.4469052 0.0041861 1.457E-16 0.4799807 1.041E-16 1.197E-16 1.093E-16 1.405E-16 -0.006837 -1.72E-16 -0.479981 -1.41E-16 -1.56E-16 -1.46E-16 0.0730767 1.32E-15 2.3164901 1.531E-15 1.6E-15 1.477E-15 1.875E-15 -0.224123 -1.35E-15 -2.204642 -1.38E-15 -1.51E-15 -1.26E-15
9 205 DeptC Male Reject 198.97812 5.2931949 0.0567174 198.97812 178.04404 222.3736 6.0218769 0.426903 0.4247764 0.7080436 0.7115884 0.7103146 0.6400844 0.0692709 -0.003697 -7.26E-18 -1.45E-17 0 0 0 -7.26E-18 0.006038 1.451E-17 2.902E-17 0.0514811 7.256E-18 0 -0.064534 -6.57E-17 -7E-17 0 0 0 -9.68E-17 0.197922 1.144E-16 1.333E-16 0.5053783 7.022E-17 0
10 120 DeptC Male Accept 126.02188 4.8364555 0.0585301 126.02188 112.36343 141.34059 -6.021879 -0.536425 -0.540785 -0.717372 -0.711589 -0.714881 0.431723 0.029591 -0.003697 0 4.596E-18 0 0 0 4.596E-18 0.006038 -4.6E-18 -1.84E-17 -0.032605 0 0 -0.064534 0 2.218E-17 0 0 0 6.134E-17 0.1979221 -3.62E-17 -8.44E-17 -0.320079 0 0
11 391 DeptC Female Reject 363.05854 5.8945641 0.0427349 363.05854 333.88785 394.77779 27.941455 1.4664278 1.4481968 2.4948336 2.5262405 2.5157016 0.6630449 0.9659997 -0.018322 0 -7.19E-17 0.1398371 0 0 3.596E-17 0.0299254 0 7.192E-17 -0.139837 3.596E-17 0 -0.31984 0 -3.47E-16 2.0575648 0 0 4.8E-16 0.9809346 0 3.304E-16 -1.372749 3.48E-16 0
12 202 DeptC Female Accept 229.94146 5.4378247 0.0451131 229.94146 210.48294 251.19886 -27.94146 -1.84264 -1.881985 -2.580183 -2.526241 -2.555081 0.4679758 0.4318154 -0.018322 -2.28E-17 2.278E-17 -0.088565 0 0 -2.28E-17 0.0299254 2.278E-17 -4.56E-17 0.0885652 0 0 -0.31984 -2.06E-16 1.099E-16 -1.303149 0 0 -3.04E-16 0.9809347 1.796E-16 -2.09E-16 0.8694243 0 0
13 279 DeptD Male Reject 255.30424 5.5424559 0.0503787 255.30424 231.29997 281.79967 23.695762 1.4830018 1.4609054 2.4622378 2.4994795 2.4864328 0.6479663 0.8845534 -0.014872 8.757E-17 2.919E-17 1.168E-16 1.168E-16 1.168E-16 1.168E-16 0.0242913 -1.17E-16 -1.17E-16 -1.75E-16 0.1614174 -1.75E-16 -0.259622 7.931E-16 1.409E-16 1.718E-15 1.56E-15 1.578E-15 1.558E-15 0.7962501 -9.21E-16 -5.36E-16 -1.72E-15 1.5620689 -1.51E-15
14 138 DeptD Male Accept 161.69576 5.0857166 0.0524112 161.69576 145.91036 179.18892 -23.69576 -1.863466 -1.912007 -2.564589 -2.49948 -2.535876 0.444168 0.3840251 -0.014872 -7.4E-17 -3.7E-17 -7.4E-17 -7.4E-17 -7.4E-17 -7.4E-17 0.0242913 7.395E-17 7.395E-17 1.109E-16 -0.102233 1.109E-16 -0.259622 -6.7E-16 -1.78E-16 -1.09E-15 -9.88E-16 -9.99E-16 -9.87E-16 0.7962502 5.831E-16 3.397E-16 1.089E-15 -0.989329 9.585E-16
15 244 DeptD Female Reject 229.59014 5.4362957 0.0529774 229.59014 206.94685 254.71097 14.409858 0.9510056 0.941309 1.5784537 1.5947136 1.5889501 0.644368 0.3544502 -0.008952 -9.94E-17 -4.77E-17 -7.51E-17 0.1080507 -7.08E-17 -1.12E-16 0.0146225 1.37E-16 9.626E-17 1.1E-16 -0.108051 1.069E-16 -0.156284 -9E-16 -2.3E-16 -1.1E-15 1.4439895 -9.57E-16 -1.5E-15 0.479316 1.08E-15 4.422E-16 1.08E-15 -1.045628 9.236E-16
16 131 DeptD Female Accept 145.40986 4.9795564 0.0549138 145.40986 130.57234 161.93344 -14.40986 -1.194986 -1.215586 -1.622204 -1.594714 -1.610208 0.4384866 0.152763 -0.008953 5.565E-17 2.226E-17 4.452E-17 -0.068433 4.452E-17 6.678E-17 0.0146225 -7.79E-17 -5.56E-17 -6.68E-17 0.0684334 -6.68E-17 -0.156284 5.04E-16 1.074E-16 6.55E-16 -0.914544 6.015E-16 8.913E-16 0.4793161 -6.14E-16 -2.56E-16 -6.56E-16 0.6622441 -5.77E-16
17 138 DeptE Male Reject 116.93791 4.7616431 0.0733181 116.93791 101.28541 135.00933 21.062088 1.9477076 1.8932348 3.106604 3.1959883 3.1630862 0.6286041 1.3298634 -0.01253 -4.92E-17 -9.84E-17 -2.46E-17 -4.92E-17 -9.84E-17 -4.92E-17 0.0204658 4.919E-17 9.838E-17 4.919E-17 4.919E-17 0.2969142 -0.218736 -4.45E-16 -4.75E-16 -3.62E-16 -6.57E-16 -1.33E-15 -6.57E-16 0.6708529 3.878E-16 4.519E-16 4.829E-16 4.76E-16 2.5655562
18 53 DeptE Male Accept 74.062089 4.3049038 0.0747292 74.062089 63.971469 85.744365 -21.06209 -2.447392 -2.579791 -3.368885 -3.195988 -3.298475 0.4135965 0.5541756 -0.01253 1.558E-17 4.673E-17 1.558E-17 3.115E-17 6.231E-17 3.115E-17 0.0204658 -3.12E-17 -6.23E-17 -3.12E-17 -3.12E-17 -0.188049 -0.218736 1.411E-16 2.255E-16 2.292E-16 4.163E-16 8.419E-16 4.158E-16 0.6708529 -2.46E-16 -2.86E-16 -3.06E-16 -3.01E-16 -1.624883
19 299 DeptE Female Reject 240.61047 5.4831793 0.0518118 240.61047 217.37632 266.32799 58.389531 3.7642437 3.6255985 6.0928851 6.3258808 6.2443737 0.6459102 5.6150979 -0.036434 1.43E-16 7.151E-17 1.43E-16 1.43E-16 0.4195937 2.145E-16 0.0595093 -2.15E-16 -1.43E-16 -2.15E-16 -7.15E-17 -0.419594 -0.636029 1.295E-15 3.451E-16 2.105E-15 1.911E-15 5.669627 2.863E-15 1.9506733 -1.69E-15 -6.57E-16 -2.11E-15 -6.92E-16 -3.625598
20 94 DeptE Female Accept 152.38953 5.02644 0.0537902 152.38953 137.14149 169.33293 -58.38953 -4.72996 -5.093896 -6.812612 -6.325881 -6.602426 0.4409215 2.4276557 -0.036434 -1.36E-16 -9.06E-17 -9.06E-17 -9.06E-17 -0.265748 -1.36E-16 0.0595093 1.812E-16 9.059E-17 1.359E-16 9.059E-17 0.2657478 -0.636029 -1.23E-15 -4.37E-16 -1.33E-15 -1.21E-15 -3.590832 -1.81E-15 1.9506734 1.429E-15 4.161E-16 1.334E-15 8.766E-16 2.2962557
21 351 DeptF Male Reject 228.36589 5.4309491 0.0531121 228.36589 205.78898 253.41969 122.63411 8.1151336 7.5151464 12.598896 13.604754 13.255617 0.6441967 25.777847 -0.076153 -3.97E-16 1.921E-16 -4.89E-16 -9.59E-16 -4.53E-16 0.9240428 0.1243841 -0.924043 -0.924043 -0.924043 -0.924043 -0.924043 -1.329404 -3.6E-15 9.27E-16 -7.2E-15 -1.28E-14 -6.12E-15 12.333171 4.077222 -7.285899 -4.244304 -9.071119 -8.942148 -7.984408
22 22 DeptF Male Accept 144.63448 4.9742098 0.0550438 144.63448 129.84299 161.111 -122.6345 -10.1971 -12.744 -17.00283 -13.6048 -15.6051 0.4382161 11.106051 -0.076153 1.893E-16 -1.89E-16 2.84E-16 5.68E-16 2.84E-16 -0.58524 0.1243845 0.58524 0.58524 0.58524 0.58524 0.58524 -1.329408 1.715E-15 -9.14E-16 4.179E-15 7.591E-15 3.838E-15 -7.811181 4.0772344 4.6145047 2.6881185 5.745169 5.6634853 5.0569034
23 317 DeptF Female Reject 208.77408 5.3412527 0.05543 208.77408 187.28133 232.73339 108.22592 7.4901924 6.9525291 11.611039 12.508961 12.194621 0.6414552 21.533862 0.8184913 -0.885183 -0.885183 -0.885183 -0.885183 -0.885183 -0.885183 0.1089309 0.8851832 0.8851832 0.8851832 0.8851832 0.8851832 14.288419 -8.016767 -4.272084 -13.0246 -11.82959 -11.96076 -11.81451 3.5706788 6.9794986 4.0658143 8.6896432 8.5660954 7.6486325
24 24 DeptF Female Accept 132.22611 4.8845134 0.0572835 132.22611 118.18364 147.93708 -108.2261 -9.411816 -11.59923 -15.41621 -12.50898 -14.22795 0.4338873 9.225178 -0.62732 0.5606277 0.5606277 0.5606277 0.5606277 0.5606277 0.5606277 0.1089311 -0.560628 -0.560628 -0.560628 -0.560628 -0.560628 -10.95113 5.0773916 2.7057097 8.2490839 7.4922269 7.5753036 7.4826807 3.5706851 -4.420441 -2.575069 -5.503555 -5.425307 -4.844235

In R, one way to fit the (A, DS) model is with the following command:

berk.jnt = glm(Freq~Admit+Gender+Dept+Gender*Dept, family=poisson(), data=berk.data)

This model implies that association between D and S does NOT depend on level of the variable A. That is, the association between department and sex is independent of the rejection/acceptance decision.

\[ \theta_{DS(A=reject)}=\theta_{DS(A=accept)}=\dfrac{\text{exp}(\lambda_{ij}^{DS})\text{exp}(\lambda_{i'j'}^{DS})}{\text{exp}(\lambda_{i'j}^{DS})\text{exp}(\lambda_{ij'}^{DS})} \]

Since we are assuming that the (DS) distribution is independent of A, then we are assuming that the conditional distribution of DS, given A, is the same as the unconditional distribution of DS. And equivalently, the conditional distribution of A, given DS, is the same as the unconditional distribution of A. In other words, if this model fits well, neither department nor sex tells us anything significant about admission status.

The first estimated coefficient 1.9436 for the DS associations implies that the estimated odds ratio between sex and department (specifically, A versus F) is \(\exp(1.9436) = 6.98\) with 95% CI

\[ (\exp(1.695), \exp(2.192))= (5.45, 8.96) \]

summary(berk.jnt)

Call:
glm(formula = Freq ~ Admit + Gender + Dept + Gender * Dept, family = poisson(), 
    data = berk.data)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       4.88451    0.05728  85.269  < 2e-16 ***
AdmitRejected     0.45674    0.03051  14.972  < 2e-16 ***
GenderMale        0.08970    0.07492   1.197   0.2312    
DeptA            -1.14975    0.11042 -10.413  < 2e-16 ***
DeptB            -2.61301    0.20720 -12.611  < 2e-16 ***
DeptC             0.55331    0.06796   8.141 3.91e-16 ***
DeptD             0.09504    0.07483   1.270   0.2040    
DeptE             0.14193    0.07401   1.918   0.0551 .  
GenderMale:DeptA  1.94356    0.12683  15.325  < 2e-16 ***
GenderMale:DeptB  3.01937    0.21771  13.869  < 2e-16 ***
GenderMale:DeptC -0.69107    0.10187  -6.784 1.17e-11 ***
GenderMale:DeptD  0.01646    0.10334   0.159   0.8734    
GenderMale:DeptE -0.81123    0.11573  -7.010 2.39e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2650.10  on 23  degrees of freedom
Residual deviance:  877.06  on 11  degrees of freedom
AIC: 1062.1

Number of Fisher Scoring iterations: 5

But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.

Model Fit

The goodness-of-fit statistic (in the output as “Residual deviance”) 877.06 indicates that the model does not fit since the “Value/df” is much larger than 1.

How did we get these degrees of freedom? As usual, we’re comparing this model to the saturated model, and the df is the difference in the numbers of parameters involved:

\[ DF = (IJK-1) - [(I-1)+(J-1)+(K-1)+(J-1)(K-1)] = (I-1)(JK-1) \]

With \(I=2\), \(J=2\), and \(K=6\) corresponding to the levels of A, S, and D, respectively, this works out to be 11.

As before, we can look at the residuals for more information about why this model fits poorly. Recall that adjusted residuals have approximately a N(0, 1) distribution. In general, we have a lack of fit if (1) we have a large number of cells and adjusted residuals are greater than 3, or (2) we have a small number of cells and adjusted residuals are greater than 2. Here is only part of the output. Notice that the absolute value of the standardized residual for the first six cells are all large (e.g., in cell 1, the value is \(-15.18\)). Many other such residuals are rather large as well, indicating that this model fits poorly.

Evaluate the residuals

fits = fitted(berk.jnt)
resids = residuals(berk.jnt,type="pearson")
h = lm.influence(berk.jnt)$hat
adjresids = resids/sqrt(1-h)
round(cbind(berk.data$Freq,fits,adjresids),2)
        fits adjresids
1 512 319.90     15.18
2 313 505.10    -15.18
3  89  41.88      9.42
4  19  66.12     -9.42
5 353 217.15     12.59
6 207 342.85    -12.59
...

Next, let us look at what is often the most interesting model of conditional independence.

10.2.4 Conditional Independence

With three variables, there are three ways to satisfy conditional independence. The assumption is that two variables are independent, given the third. For example, if \(A\) and \(B\) are conditionally independent, given \(C\), which we denote by \((AC, BC)\), then the conditional distribution of \((AB)\), given \(C\), can be factored into the product of the two conditional marginals, given \(C\).

Main assumptions

(these are stated for the case \((AC,BC)\) but hold in a similar way for \((AB, BC)\) and \((AB, AC)\) as well):

  • The \(N = IJK\) counts in the cells are assumed to be independent observations of a Poisson random variable, and
  • there is no partial interaction between \(A\) and \(B\): \(\lambda_{ij}^{AB} =0\), for all \(i, j\), and
  • there is no three-way interaction: \(\lambda_{ijk}^{ABC}=0\) for all \(i, j, k\).

Note the constraints above are in addition to the usual set-to-zero or sum-to-zero constraints (present even in the saturated model) imposed to avoid overparameterization.

Model Structure

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C +\lambda_{ik}^{AC}+\lambda_{jk}^{BC} \]

In the example below, we consider the model where admission status and department are conditionally independent, given sex, which we denote by \((AS, DS).\)

In SAS, this model can be fitted as:

proc genmod data=berkeley order=data;
class D S A;
model count = D S A A*S D*S / dist=poisson link=log lrci type3 obstats;
run;

Model Fit:

The goodness-of-fit statistics indicate that the model does not fit.

The SAS System

The GENMOD Procedure

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 10 783.6070 78.3607
Scaled Deviance 10 783.6070 78.3607
Pearson Chi-Square 10 715.2958 71.5296
Scaled Pearson X2 10 715.2958 71.5296
Log Likelihood   20121.4021  
Full Log Likelihood   -471.3333  
AIC (smaller is better)   970.6667  
AICC (smaller is better)   1017.3334  
BIC (smaller is better)   987.1595  

How did we get these DF?

\[ DF = (IJK-1) - [(I-1)+(J-1)+(K-1)+(I-1)(J-1)+(J-1)(K-1)]=J(I-1)(K-1) \]

With \(I=2\), \(J=2\), and \(K=6\) corresponding to the levels of A, S, and D, respectively, this works out to be 10.

So where is the lack of fit? As before, we look at residuals. For example, the adjusted residual for the first cell is \(-12.1747\), a great deviation from zero.

We can also evaluate overall the individual parameters and their significance:

The SAS System

The GENMOD Procedure

LR Statistics For Type 3 Analysis
Source DF Chi-Square Pr > ChiSq
D 5 313.04 <.0001
S 1 333.24 <.0001
A 1 283.32 <.0001
S*A 1 93.45 <.0001
D*S 5 1220.61 <.0001

This is like an ANOVA table in ANOVA and regression models. All parameters are significantly different from zero. That is they contribute significantly in describing the relationships between our variables, but the overall lack of fit of the model suggests that they are not sufficient.

In R, the (AS, DS) model can be fitted with

berk.cnd = glm(Freq~Admit+Gender+Dept+Admit*Gender+Dept*Gender, family=poisson(), data=berk.data)

Model Fit:

The goodness-of-fit statistics indicate that the model does not fit, e.g., Residual deviance: 783.6 on 10 degrees of freedom

How did we get these DF?

\[ DF = (IJK-1) - [(I-1)+(J-1)+(K-1)+(I-1)(J-1)+(J-1)(K-1)]=J(I-1)(K-1) \]

With \(I=2\), \(J=2\), and \(K=6\) corresponding to the levels of A, S, and D, respectively, this works out to be 10.

So where is the lack of fit? As before, we look at residuals. For example, the adjusted residual for the first cell is \(-12.17\), a great deviation from zero.

fits = fitted(berk.cnd)
resids = residuals(berk.cnd,type="pearson")
h = lm.influence(berk.cnd)$hat
adjresids = resids/sqrt(1-h)
round(cbind(berk.data$Freq,fits,adjresids),2)
        fits adjresids
1 512 367.28     12.17
2 313 457.72    -12.17
3  89  32.78     12.13
4  19  75.22    -12.13
5 353 249.31      9.91
6 207 310.69     -9.91
...

We can also evaluate overall the individual parameters and their significance. However, the ANOVA table R reports for glm-fitted models uses the sequential sum of squares, which depend on the order terms are added to the model.

anova(berk.cnd, test="LR")
Analysis of Deviance Table

Model: poisson, link: log

Response: Freq

Terms added sequentially (first to last)

             Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                            23    2650.10              
Admit         1   230.03        22    2420.07 < 2.2e-16 ***
Gender        1   162.87        21    2257.19 < 2.2e-16 ***
Dept          5   159.52        16    2097.67 < 2.2e-16 ***
Admit:Gender  1    93.45        15    2004.22 < 2.2e-16 ***
Gender:Dept   5  1220.61        10     783.61 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is like an ANOVA table in ANOVA and regression models. All parameters are significantly different from zero. That is they contribute significantly in describing the relationships between our variables, but the overall lack of fit of the model suggests that they are not sufficient.

10.2.5 Homogeneous Association

The homogeneous associations model is also known as a model of no-3-way interactions. Denoted by (AB, AC, BC), the only restriction this model imposes over the saturated model is that each pairwise conditional association doesn’t depend on the value the third variable is fixed at. For example, the conditional odds ratio between \(A\) and \(B\), given \(C\) is fixed at its first level, must be the same as the conditional odds ratio between \(A\) and \(B\), given \(C\) is fixed at its second level, and so on.

Main assumptions

  • The N = IJK counts in the cells are assumed to be independent observations of a Poisson random variable, and
  • there is no three-way interaction among the variables: \(\lambda_{ijk}^{ABC}=0\) for all \(i, j, k\).

Model Structure

\[ \log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C+\lambda_{ij}^{AB}+\lambda_{jk}^{BC}+\lambda_{ik}^{AC} \]

In terms of the Berkeley example, this model implies that the conditional association between department and sex does not depend on the fixed value of admission status, that the conditional association between sex and admission status does not depend on the fixed value of department, and the conditional association between department and admission status does not depend on the fixed value of sex.

Does this model fit? Even this model doesn’t fit well, but it seems to fit better than the previous models. The deviance statistic is \(G^2= 20.2251\) with df= 5, and the Value/df is 4.0450.

Stop and Think!

Can you figure out DF?

Since the only terms that separate this model from the saturated one are those for the three-way interactions, the degrees of freedom must be \((I-1)(J-1)(K-1)\), which is \((2-1)(2-1)(6-1)=5\) in this example.

In SAS, this model can be fitted as:

proc genmod data=berkeley order=data;
class D S A;
model count = D S A D*S D*A S*A / dist=poisson link=log lrci type3 obstats;
run;

Are all terms in the model significant (e.g. look at “Type 3 Analysis output”); recall we need to use option type3. For example, here is the ANOVA-like table that shows that SA association does not seem to be significant,

The SAS System

The GENMOD Procedure

LR Statistics For Type 3 Analysis
Source DF Chi-Square Pr > ChiSq
D 5 360.23 <.0001
S 1 252.58 <.0001
A 1 303.43 <.0001
D*S 5 1128.70 <.0001
D*A 5 763.40 <.0001
S*A 1 1.53 0.2159

Here is part of the output from the “Analysis of Parameter Estimates” given the values for all the parameters,

The SAS System

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF Estimate Standard
Error
Likelihood Ratio 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept     1 3.1374 0.1567 2.8166 3.4322 400.74 <.0001
D DeptA   1 1.1356 0.1820 0.7858 1.5007 38.94 <.0001
D DeptB   1 -0.3425 0.2533 -0.8525 0.1450 1.83 0.1763
D DeptC   1 2.2228 0.1649 1.9102 2.5579 181.77 <.0001
D DeptD   1 1.7439 0.1682 1.4241 2.0848 107.52 <.0001
D DeptE   1 1.4809 0.1762 1.1439 1.8361 70.64 <.0001
D DeptF   0 0.0000 0.0000 0.0000 0.0000 . .
S Male   1 -0.0037 0.1065 -0.2126 0.2048 0.00 0.9720
S Female   0 0.0000 0.0000 0.0000 0.0000 . .
A Reject   1 2.6246 0.1577 2.3270 2.9467 276.88 <.0001
A Accept   0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptA Male 1 2.0023 0.1357 1.7394 2.2717 217.68 <.0001
D*S DeptA Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptB Male 1 3.0771 0.2229 2.6595 3.5364 190.63 <.0001
D*S DeptB Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptC Male 1 -0.6628 0.1044 -0.8679 -0.4587 40.34 <.0001
D*S DeptC Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptD Male 1 0.0440 0.1057 -0.1633 0.2513 0.17 0.6774
D*S DeptD Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptE Male 1 -0.7929 0.1167 -1.0226 -0.5652 46.19 <.0001
D*S DeptE Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptF Male 0 0.0000 0.0000 0.0000 0.0000 . .
D*S DeptF Female 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptA Reject 1 -3.3065 0.1700 -3.6510 -2.9834 378.38 <.0001
D*A DeptA Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptB Reject 1 -3.2631 0.1788 -3.6240 -2.9220 333.12 <.0001
D*A DeptB Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptC Reject 1 -2.0439 0.1679 -2.3842 -1.7247 148.24 <.0001
D*A DeptC Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptD Reject 1 -2.0119 0.1699 -2.3559 -1.6884 140.18 <.0001
D*A DeptD Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptE Reject 1 -1.5672 0.1804 -1.9300 -1.2213 75.44 <.0001
D*A DeptE Accept 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptF Reject 0 0.0000 0.0000 0.0000 0.0000 . .
D*A DeptF Accept 0 0.0000 0.0000 0.0000 0.0000 . .
S*A Male Reject 1 0.0999 0.0808 -0.0582 0.2588 1.53 0.2167
S*A Male Accept 0 0.0000 0.0000 0.0000 0.0000 . .
S*A Female Reject 0 0.0000 0.0000 0.0000 0.0000 . .
S*A Female Accept 0 0.0000 0.0000 0.0000 0.0000 . .
Scale     0 1.0000 0.0000 1.0000 1.0000    

Note:The scale parameter was held fixed.

Recall, we are interested in the highest-order terms, thus two-way associations here, and they correspond to log-odds ratios. For example, the first coefficient 0.0999, reported in the row beginning with “S*A, male, reject”, is the conditional log-odds ratio between sex and admission status. The interpretation is as follows: for a fixed department, the odds a male is rejected is \(\exp(0.0999)=1.10506\) times the odds that a female is rejected.

Although the department is fixed for this interpretation (so the comparison is among individuals applying to the same department), it doesn’t matter which department we’re focusing on; they all lead to the same result under this model. However, this model does not fit well, so we can’t really rely on the inferences based on this model.

In R, here is one way of fitting this model (note that this syntax will also include all first-order terms automatically):

berk.ha = glm(Freq~(Admit+Gender+Dept)^2, family=poisson(), data=berk.data)

Here is part of the summary output that gives values for all the parameter estimates:

summary(berk.ha)

Call:
glm(formula = Freq ~ (Admit + Gender + Dept)^2, family = poisson(), 
    data = berk.data)

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)               3.137358   0.156723  20.019  < 2e-16 ***
AdmitRejected             2.624559   0.157728  16.640  < 2e-16 ***
GenderMale               -0.003731   0.106460  -0.035    0.972    
DeptA                     1.135552   0.181963   6.241 4.36e-10 ***
DeptB                    -0.342489   0.253251  -1.352    0.176    
DeptC                     2.222782   0.164869  13.482  < 2e-16 ***
DeptD                     1.743872   0.168181  10.369  < 2e-16 ***
DeptE                     1.480918   0.176194   8.405  < 2e-16 ***
AdmitRejected:GenderMale  0.099870   0.080846   1.235    0.217    
AdmitRejected:DeptA      -3.306480   0.169982 -19.452  < 2e-16 ***
AdmitRejected:DeptB      -3.263082   0.178784 -18.252  < 2e-16 ***
AdmitRejected:DeptC      -2.043882   0.167868 -12.176  < 2e-16 ***
AdmitRejected:DeptD      -2.011874   0.169925 -11.840  < 2e-16 ***
AdmitRejected:DeptE      -1.567174   0.180436  -8.685  < 2e-16 ***
GenderMale:DeptA          2.002319   0.135713  14.754  < 2e-16 ***
GenderMale:DeptB          3.077140   0.222869  13.807  < 2e-16 ***
GenderMale:DeptC         -0.662814   0.104357  -6.351 2.13e-10 ***
GenderMale:DeptD          0.043995   0.105736   0.416    0.677    
GenderMale:DeptE         -0.792867   0.116664  -6.796 1.07e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2650.095  on 23  degrees of freedom
Residual deviance:   20.204  on  5  degrees of freedom
AIC: 217.26

Number of Fisher Scoring iterations: 4

Recall, we are interested in the highest-order terms, thus two-way associations here, and they correspond to log-odds ratios. For example, the first coefficient 0.0999, reported in the row beginning with “AdmitRejected:GenderMale”, is the conditional log-odds ratio between sex and admission status. The interpretation is as follows: for a fixed department, the odds a male is rejected is \(\exp(0.0999)=1.10506\) times the odds that a female is rejected.

Although the department is fixed for this interpretation (so the comparison is among individuals applying to the same department), it doesn’t matter which department we’re focusing on; they all lead to the same result under this model. However, this model does not fit well (deviance statistic of 20.204 with 5 df), so we can’t really rely on the inferences based on this model.

10.2.6 Model Selection

According to our usual approach for hierarchical models, where one is a special case of the other, we can use a likelihood ratio test to measure the reduction in fit of the smaller model (null hypothesis), relative to the larger model (alternative hypothesis). The degrees of freedom for these tests are the difference between the numbers of parameters involved between the two models.

Below is a summary of all possible models for the Berkeley admissions example data, with complete independence (most restrictive) at the top the saturated model at the bottom. The df, \(G^2\), and p-value columns correspond to the deviance (goodness-of-fit) test for each model compared with the saturated model.

Model df \(G^2\) p-value
(D, S, A) 16 2097.671 < .001
(DS, A) 11 877.056 < .001
(D, SA) 15 2004.222 < .001
(DA, S) 11 1242.350 < .001
(DS, SA) 10 783.607 < .001
(DS, DA) 6 21.736 < .001
(DA, SA) 10 1148.901 < .001
(DS, DA, SA) 5 20.204 < .001
(DSA) 0 0.00

Based on these results, the saturated model would be preferred because any reduced model has a significantly worse fit (all p-values are significant). However, if a reduced model would have been acceptable, relative to the saturated one, we could continue to test further reductions with likelihood ratio tests.

Likelihood Ratio Tests

Suppose the model of homogeneous association (HA) had been insignificantly different from the saturated model. We would then have preferred the HA model because it has fewer parameters and is easier to work with overall. And to test additional reductions, we could use the likelihood ratio test with the HA model as the alternative hypothesis (instead of the saturated one).

For example, to test the (DS, DA) model, which assumes sex and admission status are conditionally independent, given department, the hypotheses would be \(H_0\): (DS, DA) versus \(H_a\): (DS, DA, SA). The test statistic would be twice the difference between their log-likelihood values but could be computed directly from the deviance statistics above:

\[ G^2=783.607-20.204 =763.4 \]

Relative to a chi-square distribution with \(10-5=5\) degrees of freedom, this is would highly significant (p-value less than .001). If, however, this conditional independence model hadn’t been rejected, we could place into the role of the alternative hypothesis to test the further reduced joint independence model (DS, A), and so on. As we’ve seen with our earlier models, this approach works for any full versus reduced model comparison.

10.3 Lesson Summary

In this lesson, we introduced the log-linear model and showed how it can be used to explain the associations among a set of two or more categorical variables. Although many of the same results as those from a logistic model can be obtained, the main advantage of the log-linear model is that by not treating anyone variable as the response, it can estimate relationships among any of the variables, whereas the logistic model focuses only on relationships involving the response. The advantage of the logistic model, however, is that it can accommodate quantitative variables.

In the next lesson, we continue the discussion on log-linear models to include ordinal data and “dependent” data, which is effectively the categorical version of matched pairs.