10.1 - Log-Linear Models for Two-way Tables
Overview
Recall that a two-way ANOVA models the expected value of a continuous variable (e.g., plant length) as a function of the levels of two categorical variables (e.g., low/high sunlight and low/high water amount). In contrast, the log-linear model expresses the cell counts (e.g., the number of plants in a cell) as a function of the levels of two categorical variables.
Let \(\mu_{ij}\) be the expected counts, \(E(n_{ij})\), in an \(I \times J\) table, created by two random variables \(A\) and \(B\).
Objective
Model the cell counts: \(\mu_{ij} = n\pi_{ij}\)
Model structure
The saturated log-linear model, analogous to a two-way ANOVA with interaction, is
\(\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB}\)
where \(i = 1,\ldots, I\), \(j = 1, \ldots, J\) are levels of the categorical random variables \(A\) and \(B\), with constraints \(\sum_i \lambda_i^A = \sum_j \lambda_j^B = \sum_i \lambda_{ij}^{AB} = \sum_j \lambda_{ij}^{AB} = 0\) (the interaction sums taken for each \(j\) and each \(i\), respectively) to deal with overparametrization. Overparametrization means that the number of parameters is more than what can be uniquely estimated. This model is over-parametrized because the term \(\lambda_{ij}^{AB}\) alone already has \(I \times J\) parameters, one for each cell mean \(\mu_{ij}\), while the constant \(\lambda\) and the "main effects" \(\lambda_i^A\) and \(\lambda_j^B\) contribute an additional \(1 + I + J\) parameters. Superscripts denote the variables \(A\) and \(B\). We will see more on this in the next sections.
Model Assumptions
The \(N = I \times J\) counts in the cells are assumed to be independent observations from Poisson random variables, \(n_{ij}\sim \text{Poisson}(\mu_{ij})\). Log-linear modeling is natural for the Poisson, multinomial, and product-multinomial sampling schemes we discussed in earlier lessons.
Recall the Vitamin C study, a \(2 \times 2\) example from Lesson 3. Are the type of treatment and contracting a cold independent? If they are associated, in what way are they associated?
We already know how to answer the above questions via the chi-square test of independence, but now we want to model the cell counts with the log-linear model of independence and ask if this model fits well.
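As a quick refresher, here is a minimal R sketch of that earlier analysis, assuming the observed counts (31, 109, 17, 122) shown in the output tables later in this lesson:

# Vitamin C (ski) study counts; layout assumed from the output tables below
ski <- matrix(c(31, 109, 17, 122), nrow = 2, byrow = TRUE,
              dimnames = list(Treatment = c("Placebo", "VitaminC"),
                              Cold = c("Cold", "NoCold")))
chisq.test(ski, correct = FALSE)  # Pearson X^2 = 4.81, df = 1
(31 * 122) / (109 * 17)           # sample odds ratio ~ 2.04 (or 1/2.04 = 0.49)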
Log-linear Models for Two-Way Tables
Given two categorical random variables, \(A\) and \(B\), there are two main types of models we will consider:
- Independence model (A, B)
- Saturated model (AB)
Let us start with the model of independence.
10.1.1 - Model of Independence
Recall that independence can be stated in terms of cell probabilities as a product of marginal probabilities,
\(\pi_{ij}=\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J\)
and in terms of cell frequencies,
\(\mu_{ij}=n\pi_{ij}=n\pi_{i+}\pi_{+j}\quad i = 1, \ldots, I, j = 1, \ldots, J\)
Taking natural logarithms on both sides, we obtain the log-linear model of independence:
\(\log(\mu_{ij}) = \lambda+\lambda_i^A+\lambda_j^B\)
where superscripts A and B are just used to denote the two categorical variables.
This is an ANOVA type-representation where:
- \(\lambda\) represents the "overall" effect, or the grand mean of the logarithms of the expected counts, and it ensures that \(\sum_i \sum_j \mu_{ij} = n\), that is, the expected cell counts under the fitted model add up to the total sample size \(n\).
- \(\lambda^A_i\) represents the "main" effect of variable A, or a deviation from the grand mean, and it ensures that \(\sum_j \mu_{ij} = n_{i+}\), that is, the marginal totals under the fitted model add up to the observed marginal counts. It represents the effect of classification in row \(i\).
- \(\lambda^B_j\) represents the "main" effect of variable B, or a deviation from the grand mean, and it ensures that \(\sum_i \mu_{ij} = n_{+j}\). This is the effect of classification in column \(j\).
- \(\lambda^A_I=\lambda^B_J=0\) (fixing the last levels), or alternatively, \(\sum_i \lambda^{A}_{i} = \sum_j \lambda^{B}_{j} = 0\), to deal with over-parametrization (see below).
The maximum likelihood (ML) fitted values for the cell counts are the same as the expected (fitted) values under the test of independence in two-way tables, i.e., \(\hat{\mu}_{ij} = n_{i+}n_{+j}/n\). Thus, the \(X^2\) and \(G^2\) statistics for the test of independence are goodness-of-fit statistics for the log-linear model of independence, testing that the independence model holds versus that it does not, or more specifically, testing that the independence model is true versus the saturated model being true. This model also implies that ALL odds ratios should be equal to 1.
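In R, for instance, the fitted values and both goodness-of-fit statistics can be computed directly from the margins (a sketch, assuming the ski count matrix from the earlier snippet):

# Fitted counts under independence: mu-hat_ij = n_{i+} * n_{+j} / n
ski <- matrix(c(31, 109, 17, 122), nrow = 2, byrow = TRUE)
n   <- sum(ski)
fit <- outer(rowSums(ski), colSums(ski)) / n
fit                                  # 24.09 115.91 / 23.91 115.09
X2  <- sum((ski - fit)^2 / fit)      # Pearson statistic, ~4.81
G2  <- 2 * sum(ski * log(ski / fit)) # deviance statistic, ~4.87
c(X2 = X2, G2 = G2)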
Parameter Constraints & Uniqueness
For an \(I\times J\) table, the model is
\(\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B\)
There are \(I\) terms in the set {\(\lambda^A_i\)}, but one of them is redundant, so there are \(I − 1\) unknown parameters, e.g., {\(\lambda_{1}^{A}, \ldots , \lambda_{I-1}^{A}\)}; similarly, there are \(J\) terms in the set {\(\lambda^B_j\)}, one of which is redundant, leaving \(J − 1\) unknown parameters, e.g., {\(\lambda_1^B , \ldots, \lambda_{J-1}^B\)}. (Why is one of them redundant?) There can be many different parametrizations, but regardless of which set we use, we need constraints to account for the redundant parameters. The nonexistence of a unique set of parameters does not mean that the expected cell counts change depending on which set of parameters is used. It simply means that the effect estimates may be obtained under different sets of constraints, which lead to different interpretations; the expected cell counts remain the same.
DUMMY CODING: To avoid over-parametrization, one member of each set of \(\lambda\)s is fixed at a constant value, typically 0. This corresponds to using dummy (indicator) coding for the categorical variables (e.g., A = 1, 0). By default, SAS PROC GENMOD sets the last level to 0. So, we have
\(\log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B\)
\(\log(\mu_{22})=\lambda+0+0=\lambda\)
By default, R's glm() sets the first level of each categorical variable to 0. So, we have
\(\log(\mu_{11})=\lambda+0+0=\lambda\)
\(\log(\mu_{22})=\lambda+\lambda_2^A+\lambda_2^B\)
ANOVA-type CODING: Another way to avoid over-parametrization is to fix the sum of the terms at a constant, typically 0. This is the ANOVA-type constraint, and it corresponds to using so-called "effect" coding for the categorical variables (e.g., A = 1, 0, −1). By default, SAS PROC CATMOD and R's loglin() use this zero-sum constraint; e.g., for the first and last cells,
\(\log(\mu_{11})=\lambda+\lambda_1^A+\lambda_1^B\)
\(\log(\mu_{22})=\lambda-\lambda_1^A-\lambda_1^B\)
We will see more on these with a specific example in the next section.
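In the meantime, here is a quick R sketch of this invariance (the data-frame layout is an assumption; the counts are from the study): the two coding schemes give different \(\lambda\) estimates but identical fitted counts.

# Independence model under dummy (treatment) vs. zero-sum (effect) coding
ski.data <- expand.grid(Treatment = c("Placebo", "VitaminC"),
                        Cold      = c("Cold", "NoCold"))
ski.data$Freq <- c(31, 17, 109, 122)
dummy  <- glm(Freq ~ Treatment + Cold, family = poisson(), data = ski.data)
effect <- glm(Freq ~ Treatment + Cold, family = poisson(), data = ski.data,
              contrasts = list(Treatment = contr.sum, Cold = contr.sum))
coef(dummy); coef(effect)              # different parameter estimates...
cbind(fitted(dummy), fitted(effect))   # ...but identical fitted counts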
Link to odds and odds ratio
We can get different parameter estimates (i.e., different values of the \(\lambda\)s) depending on the type of constraints we set. So, what is unique about these parameters that leads to the same inference, regardless of parametrization? The differences, which correspond to log odds, are unique:
\(\lambda_i^A-\lambda_{i'}^A\)
\(\lambda_j^B-\lambda_{j'}^B\)
where the subscript \(i\) denotes one level of categorical variable \(A\) and \(i'\) denotes another level of the same variable; similarly for \(B\).
Thus the odds are also unique!
\begin{align} \log(odds) &= \log\left(\dfrac{\mu_{i1}}{\mu_{i2}}\right)=\log(\mu_{i1})-\log(\mu_{i2})\\ &= (\lambda+\lambda_i^A+\lambda_1^B)-(\lambda+\lambda_i^A+\lambda_2^B)=\lambda_1^B-\lambda_2^B\\ \end{align}
\(\Rightarrow \mbox{odds} = \exp(\lambda_{1}^{B} − \lambda_{2}^{B})\)
If we have the model of independence, we expect that the log(odds ratio) is 0, that is, that the odds ratio is 1. Can you finish the calculation below and show that you get zero?
\begin{align}\log(oddsratio) &= \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\&= \log(\mu_{11})+\log(\mu_{22})-\log(\mu_{12})-\log(\mu_{21})\\&= \cdots\\\end{align}
\(\begin{align*}\log(oddsratio)& = \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\ &= \log(\mu_{11})+\log(\mu_{22})-\log(\mu_{12})-\log(\mu_{21})\\ &= \lambda+\lambda_1^A+\lambda_1^B+\lambda+\lambda_2^A+\lambda_2^B-\lambda-\lambda_1^A-\lambda_2^B-\lambda-\lambda_2^A-\lambda_1^B\\ &= 0\\ \end{align*}\)
The odds ratio measures the strength of the association and depends only on the interaction terms {\(\lambda_{ij}^{AB}\)}, which do not appear in this model; we will return to this when we discuss the saturated log-linear model.
Do you recall how many odds ratios we need to completely characterize associations in \(I\times J\) tables?
10.1.2 - Example: Therapeutic Value of Vitamin C
Example: Vitamin C Study - Revisited
Recall the Vitamin C study, a \(2 \times 2\) example from Lesson 3. From the earlier analysis, we learned that the treatment and response for this data are not independent:
\(G^2 = 4.87\), df = 1, p-value = 0.0273,
And from the estimated odds ratio of 0.49, we noted that the odds of getting a cold were about twice as high for those who took the placebo, compared with those who took vitamin C. Next, we will consider the same question by fitting the log-linear model of independence.
There are (at least) two ways of fitting these models in SAS and R. First, we will see the specification for two procedures in SAS (PROC CATMOD and PROC GENMOD); see VitaminCLoglin.sas and VitaminCLoglin.R.
The PROC CATMOD procedure
INPUT:
proc catmod data=ski order=data;
weight count;
model treatment*response=_response_;
loglin treatment response;
run;
The MODEL statement (line) specifies the two variables, treatment and response, on its left-hand side. The right-hand side of the MODEL statement only takes independent variables; when fitting a log-linear model, we use the keyword _response_ there instead.
The LOGLIN statement specifies that it's a model of independence because we are stating that the model has only main effects, one for the "treatment" variable and the other for the "response" variable, in this case.
OUTPUT:
Population Profiles and Response Profiles
The responses we're modeling are four cell counts:
Population Profiles

Sample | Sample Size
---|---
1 | 279

Response Profiles

Response | treatment | response
---|---|---
1 | placebo | cold
2 | placebo | nocold
3 | absorbic | cold
4 | absorbic | nocold
Maximum Likelihood Analysis of Variance
As with ANOVA, we are breaking down the variability across factors. More specifically, the main effect TREATMENT tests if the subjects are evenly distributed over the two levels of this variable. The null hypothesis assumes that they are, and the alternative assumes that they are not. Based on the output below, are the subjects evenly distributed across the placebo and the vitamin C conditions?
Maximum Likelihood Analysis of Variance

Source | DF | Chi-Square | Pr > ChiSq
---|---|---|---
treatment | 1 | 0.00 | 0.9523
response | 1 | 98.12 | <.0001
Likelihood Ratio | 1 | 4.87 | 0.0273
The main effect RESPONSE tests if the subjects are evenly distributed over the two levels of this variable. The Wald statistic \(X^2=98.12\) with df = 1 indicates highly uneven distribution.
The LIKELIHOOD RATIO (or the deviance statistic) tells us about the overall fit of the model. Does this value look familiar? Does the model of independence fit well?
Analysis of Maximum Likelihood Estimates
Gives the estimates of the parameters. We focus on the estimation of odds.
Analysis of Maximum Likelihood Estimates

Parameter | Level | Estimate | Standard Error | Chi-Square | Pr > ChiSq
---|---|---|---|---|---
treatment | placebo | 0.00358 | 0.0599 | 0.00 | 0.9523
response | cold | -0.7856 | 0.0793 | 98.12 | <.0001
For example,
\(\lambda_1^A=\lambda_{placebo}^{TREATMENT}=0.00358=-\lambda_{absorbic}^{TREATMENT}=-\lambda_2^A\)
What are the odds of getting cold? Recall that PROC CATMOD, by default, uses zero-sum constraints. Based on this output the log odds are
\begin{align}
\text{log}(\mu_{i1}/\mu_{i2}) &=\text{log}(\mu_{i1})-\text{log}(\mu_{i2})\\
&=[\lambda+\lambda_{placebo}^{TREATMENT}+\lambda_{cold}^{RESPONSE}]-[\lambda+\lambda_{placebo}^{TREATMENT}+\lambda_{nocold}^{RESPONSE}]\\
&=\lambda_{cold}^{RESPONSE}-\lambda_{nocold}^{RESPONSE}\\
&=\lambda_{cold}^{RESPONSE}+\lambda_{cold}^{RESPONSE}\quad(\text{since } \lambda_{nocold}^{RESPONSE}=-\lambda_{cold}^{RESPONSE})\\
&=-0.7856-0.7856=-1.5712\\
\end{align}

So the odds are \(\exp(-1.5712) = 0.208\). Notice that the odds are the same for all rows.
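As a quick numerical check, under independence the fitted odds of a cold in each row are just the ratio of the column totals:

\(\dfrac{n_{+1}}{n_{+2}}=\dfrac{31+17}{109+122}=\dfrac{48}{231}\approx 0.208\)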
Think about it!
What are the odds of being in the placebo group? Note that this question is not really relevant here, but it's a good exercise to figure out the model parameterization and how we come up with estimates of the odds. Hint: log(odds)=0.00716.
The PROC GENMOD procedure
Fits a log-linear model as a Generalized Linear Model (GLM) and by default uses indicator variable coding. We will mostly use PROC GENMOD in this course.
INPUT:
proc genmod data=ski order=data;
class treatment response;
model count = treatment response /link=log dist=poisson lrci type3 obstats;
run;
The CLASS statement (line) specifies that we are dealing with two categorical variables: treatment and response.
The MODEL statement (line) specifies that our responses are the cell counts, and link=log specifies that we are modeling the log of counts. The random distribution is Poisson, and the remaining specifications ask for different parts of the output to be printed.
OUTPUT:
Model Information
Gives assumptions about the model. In our case, it tells us that the LINK FUNCTION is the log, the DEPENDENT VARIABLE is count, and the assumed distribution of the counts is Poisson.
Model Information | 
---|---
Data Set | WORK.SKI
Distribution | Poisson
Link Function | Log
Dependent Variable | count
Number of Observations Read | 4
Number of Observations Used | 4
Class Level Information
Information on the categorical variables in our model
Class Level Information

Class | Levels | Values
---|---|---
treatment | 2 | placebo absorbic
response | 2 | cold nocold
Criteria For Assessing Goodness of Fit
Gives information on the convergence of the maximum likelihood estimation algorithm and on how well the model fits. What is the value of the deviance statistic \(G^2\)? What about the Pearson \(X^2\)? Note that they are the same as what we got from the chi-square test of independence; we will discuss the "scaled" versions later with the concept of overdispersion. Note also the value of the log-likelihood for the fitted model, which in this case is the independence model.
Criteria For Assessing Goodness Of Fit

Criterion | DF | Value | Value/DF
---|---|---|---
Deviance | 1 | 4.8717 | 4.8717
Scaled Deviance | 1 | 4.8717 | 4.8717
Pearson Chi-Square | 1 | 4.8114 | 4.8114
Scaled Pearson X2 | 1 | 4.8114 | 4.8114
Log Likelihood | | 970.6299 | 
Analysis of Parameter Estimates
Gives the individual parameter estimates. Recall that PROC GENMOD does not use zero-sum constraints but typically fixes the last level of each variable at zero:
\(\hat{\lambda}=4.75,\ \hat{\lambda}^A_1=0.0072,\ \hat{\lambda}^A_2=0,\ \hat{\lambda}^B_1=-1.5712,\ \hat{\lambda}^B_2=0\)
Analysis Of Maximum Likelihood Parameter Estimates

Parameter | Level | DF | Estimate | Standard Error | LR 95% CL (Lower) | LR 95% CL (Upper) | Wald Chi-Square | Pr > ChiSq
---|---|---|---|---|---|---|---|---
Intercept | | 1 | 4.7457 | 0.0891 | 4.5663 | 4.9158 | 2836.81 | <.0001
treatment | placebo | 1 | 0.0072 | 0.1197 | -0.2277 | 0.2422 | 0.00 | 0.9523
treatment | absorbic | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | .
response | cold | 1 | -1.5712 | 0.1586 | -1.8934 | -1.2702 | 98.11 | <.0001
response | nocold | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | .
Scale | | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | | 

The scale parameter was held fixed.
So, we can compute the estimated cell count for having a cold given vitamin C based on this model:

\(\hat{\mu}_{21}=\exp(\hat{\lambda}+\hat{\lambda}_2^A+\hat{\lambda}_1^B)=\exp(4.7457+0-1.5712)=23.91 \)
What about the other cells: \(\mu_{11}\), \(\mu_{12}\), and \(\mu_{22}\)?
What are the estimated odds of getting a cold?
\(\exp(\lambda_1^B-\lambda_2^B)=\exp(-1.571-0)=0.208\)
What about the odds ratio?
LR Statistics for Type 3 Analysis
Testing the main effects here is similar to the Maximum Likelihood Analysis of Variance in CATMOD, except that these are likelihood-ratio (LR) statistics rather than Wald statistics. When the sample size is moderate or small, the likelihood-ratio statistic is the better choice. From the CATMOD output, the Wald chi-square statistic for response was 98.12 with df = 1, and we rejected the null hypothesis. Here, we reach the same conclusion, except that the LR statistic is 130.59 with df = 1.
LR Statistics For Type 3 Analysis

Source | DF | Chi-Square | Pr > ChiSq
---|---|---|---
treatment | 1 | 0.00 | 0.9523
response | 1 | 130.59 | <.0001
Observation Statistics
Gives information on the expected cell counts and several different residuals for further assessment of the model fit (e.g., "Resraw" = observed count − predicted (expected) count, "Reschi" = Pearson residual, "Resdev" = deviance residual, "StReschi" = standardized Pearson residual, etc.).
Observation Statistics

Observation | count | treatment | response | Predicted Value | Linear Predictor | Standard Error of the Linear Predictor | HessWgt | Lower | Upper | Raw Residual | Pearson Residual | Deviance Residual | Std Deviance Residual | Std Pearson Residual | Likelihood Residual | Leverage | CookD | DFBETA_Intercept | DFBETA_treatmentplacebo | DFBETA_responsecold | DFBETAS_Intercept | DFBETAS_treatmentplacebo | DFBETAS_responsecold
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 31 | placebo | cold | 24.086031 | 3.1816321 | 0.1561792 | 24.086031 | 17.734757 | 32.711861 | 6.9139686 | 1.4087852 | 1.3483626 | 2.0994112 | 2.1934897 | 2.1551805 | 0.5875053 | 2.2842489 | -0.060077 | 0.1197239 | 0.3491947 | -0.674251 | 0.9998856 | 2.2013658 |
2 | 109 | placebo | nocold | 115.91398 | 4.7528483 | 0.0888123 | 115.91398 | 97.395437 | 137.95359 | -6.913978 | -0.642185 | -0.648733 | -2.215859 | -2.193493 | -2.195419 | 0.9142868 | 17.107471 | -0.060077 | -0.576172 | 0.3491952 | -0.674252 | -4.811954 | 2.2013689 |
3 | 17 | absorbic | cold | 23.913989 | 3.1744636 | 0.1563437 | 23.913989 | 17.602407 | 32.488674 | -6.913989 | -1.413848 | -1.491801 | -2.314435 | -2.193496 | -2.244533 | 0.5845377 | 2.2564902 | -0.060077 | 0.1197243 | -0.346701 | -0.674253 | 0.9998885 | -2.185648 |
4 | 122 | absorbic | nocold | 115.08602 | 4.7456799 | 0.0891012 | 115.08602 | 96.645029 | 137.04577 | 6.913978 | 0.6444908 | 0.638194 | 2.172062 | 2.1934927 | 2.1916509 | 0.9136701 | 16.973819 | 0.6358195 | -0.576172 | -0.346701 | 7.1359271 | -4.811954 | -2.185645 |
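Several of these columns can be reproduced in R; here is a sketch, assuming the independence fit ski.ind and the data frame ski.data used later on this page:

# Reproducing some of the GENMOD Observation Statistics in R
mu       <- fitted(ski.ind)                        # Predicted Value
raw      <- ski.data$Freq - mu                     # Raw Residual ("Resraw")
pearson  <- residuals(ski.ind, type = "pearson")   # Pearson Residual ("Reschi")
devres   <- residuals(ski.ind, type = "deviance")  # Deviance Residual ("Resdev")
h        <- hatvalues(ski.ind)                     # Leverage
std.pear <- pearson / sqrt(1 - h)                  # Std Pearson Residual ("StReschi")
round(cbind(mu, raw, pearson, devres, std.pear), 4)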
The loglin() and glm() functions in R
To do the same analysis in R, the loglin() function corresponds roughly to PROC CATMOD, and the glm() function corresponds roughly to PROC GENMOD. See VitaminCLoglin.R for the commands and output. Here is a brief comparison of loglin() and glm() output with respect to the model estimates.
The following command fits the log-linear model of independence, and from the part of the output that gives parameter estimates, we see that they correspond to the output from CATMOD. For example, the odds of getting a cold are estimated to be
\(\exp(\lambda_{cold}^{response} - \lambda_{nocold}^{response})=\exp(-0.7856-0.7856)=\exp(-1.5712)=0.208\)
loglin(ski, list(1, 2), fit=TRUE, param=TRUE)
$param$"(Intercept)"
[1] 3.963656
$param$Treatment
Placebo VitaminC
0.003584245 -0.003584245
$param$Cold
Cold NoCold
-0.7856083 0.7856083
The following command fits the log-linear model of independence using glm(), and from the part of the output that gives parameter estimates, we see that they correspond to the output from GENMOD but with the signs reversed (because R sets the first level of each factor to zero while GENMOD sets the last), i.e., the odds of no cold \(=\exp(\lambda_{nocold}^{response})=\exp(1.5712)\). Thus, the odds of cold are \(\exp(-1.5712)\).
glm(ski.data$Freq~ski.data$Treatment+ski.data$Cold, family=poisson())
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.181632 0.156179 20.372 <2e-16 ***
ski.data$TreatmentVitaminC -0.007168 0.119738 -0.060 0.952
ski.data$ColdNoCold 1.571217 0.158626 9.905 <2e-16 ***
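If you prefer estimates whose signs line up with GENMOD, one option (a sketch, assuming the ski.data factors shown above) is to re-level the factors so that the last levels become the baselines:

# Mimic GENMOD's convention (last level of each CLASS variable set to zero)
ski.data2 <- ski.data
ski.data2$Treatment <- relevel(ski.data2$Treatment, ref = "VitaminC")
ski.data2$Cold      <- relevel(ski.data2$Cold, ref = "NoCold")
glm(Freq ~ Treatment + Cold, family = poisson(), data = ski.data2)
# Coefficients now match GENMOD: TreatmentPlacebo ~ 0.0072, ColdCold ~ -1.5712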
Assessing Goodness of Fit
The fitted model object gives information on how well the model fits. The value of the deviance statistic is \(G^2=4.872\) with 1 degree of freedom, which is significant evidence to reject this model. Note that this is the same as what we got from the chi-square test of independence; we will discuss options for dealing with overdispersion later as well.
> ski.ind
Call: glm(formula = ski.data$Freq ~ ski.data$Treatment + ski.data$Cold, family = poisson())
Coefficients:
(Intercept) ski.data$TreatmentVitaminC ski.data$ColdNoCold
3.181632 -0.007168 1.571217
Degrees of Freedom: 3 Total (i.e. Null); 1 Residual
Null Deviance: 135.5
Residual Deviance: 4.872 AIC: 34
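To attach a p-value to this deviance, we can compare it to a chi-square distribution with the residual degrees of freedom (a sketch, assuming the ski.ind fit above):

# Goodness-of-fit test of the independence model against the saturated model
pchisq(deviance(ski.ind), df.residual(ski.ind), lower.tail = FALSE)
# ~0.027, so the independence model is rejected at the 0.05 level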
10.1.3 - Saturated Model
With the saturated model, the \(N = IJ\) counts in the cells are still assumed to be independent observations of Poisson random variables, but no independence is assumed between the variables \(A\) and \(B\). The model is expressed as
\(\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB}\)
Note the additional \(\lambda_{ij}^{AB}\) interaction term compared with the independence model. We still impose the same ANOVA-like constraints to avoid over-parametrization, and again, the particular constraints change with the software used. Ultimately, the inference will not depend on the choice of constraints.
Parameter estimates and interpretation
The odds ratio is directly related to the interaction terms. For example, for a \(2 \times 2\) table,
\begin{align}
\log(\theta) &= \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\
&= \cdots\\
&= \lambda_{11}^{AB}+\lambda_{22}^{AB}-\lambda_{12}^{AB}-\lambda_{21}^{AB}
\end{align}
\(\begin{align*}\log(\theta)& = \log\left(\dfrac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right)\\ &= \lambda+\lambda_1^A+\lambda_1^B+\lambda_{11}^{AB}+\lambda+\lambda_2^A+\lambda_2^B+\lambda_{22}^{AB}-\lambda-\lambda_1^A-\lambda_2^B-\lambda_{12}^{AB}-\lambda-\lambda_2^A-\lambda_1^B-\lambda_{21}^{AB}\\ &= \lambda_{11}^{AB}+\lambda_{22}^{AB}-\lambda_{12}^{AB}-\lambda_{21}^{AB}\\ \end{align*}\)
How many odds ratios are there? There should be \((I-1)(J-1)\), which equals the number of unique \(\lambda_{ij}^{AB}\) parameters in the model.
How many unique parameters are there in this model?
Term | # of terms | # of constraints | # of unique parameters
---|---|---|---
\(\lambda\) | \(1\) | \(0\) | \(1\)
\(\lambda_i^A\) | \(I\) | \(1\) | \(I - 1\)
\(\lambda_j^B\) | \(J\) | \(1\) | \(J - 1\)
\(\lambda_{ij}^{AB}\) | \(I J\) | \(I + J - 1\) | \((I - 1) (J - 1)\)
In total, there are \(1 + (I-1) + (J-1) + (I-1)(J-1) = IJ = N\) unique parameters, as many as there are cells, so the model is a perfect fit! Moreover, the saturated model includes the independence model as a special case, where the interaction terms are all 0.
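A minimal check of this perfect fit in R, anticipating the glm() syntax shown below and assuming the ski.data frame from earlier:

# The saturated model reproduces the observed counts exactly
ski.sat <- glm(Freq ~ Treatment * Cold, family = poisson(), data = ski.data)
cbind(observed = ski.data$Freq, fitted = fitted(ski.sat))
deviance(ski.sat)   # 0 (up to rounding), with 0 residual df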
Example: Vitamin C
Let's evaluate the CATMOD and GENMOD output for VitaminCLoglin.sas and R code for VitaminCLoglin.R.
To fit the saturated models, we need to specify or add an interaction term in our code.
In CATMOD:
proc catmod data=ski order=data;
weight count;
model treatment*response=_response_;
loglin treatment|response;
run;
The LOGLIN statement (line) now specifies the saturated model: the treatment|response specification adds an interaction term to the main-effects model.
In GENMOD:
proc genmod data=ski order=data;
class treatment response;
model count = response*treatment /link=log dist=poisson lrci type3 obstats;
run;
The "*" sign in the response*treatment, indicates that we want the interaction term and ALL lower order terms; that is, the saturated model.
Notice that we can also specify the saturated model by explicitly entering the main effects in the MODEL statement:
model count = response treatment response*treatment/link=log dist=poisson lrci type3 obstats;
These two GENMOD specifications arrange the model terms differently, so the LR Statistics For Type 3 Analysis sections of the output will differ, but the estimates of the odds and odds ratios will be the same. The output contains the same parts as for the independence model, except now with more parameter estimates. We will consider the latter GENMOD parametrization and compare it to that of CATMOD.
So, what are the odds of getting a cold, given that a person took a placebo pill? Let's first start with the log odds:
\(\log(\mu_{11}/\mu_{12})=\log(\mu_{11})-\log(\mu_{12})=[\lambda+\lambda_1^A+\lambda_1^B+\lambda_{11}^{AB}]-[\lambda+\lambda_1^A+\lambda_2^B+\lambda_{12}^{AB}]\)
Based on GENMOD parameterization: \(\lambda_1^B + \lambda_{11}^{AB} - \lambda_2^B -\lambda_{12}^{AB} = -1.9708 + 0.7134 - 0 - 0\), and the odds are \(\exp(-1.2574) = 0.2844\).
Based on CATMOD parameterization: \([\lambda+\lambda_1^A + \lambda_1^B+\lambda_{11}^{AB} ]- [\lambda+\lambda_1^A- \lambda_1^B - \lambda_{11}^{AB}] = 2\lambda_1^B+2\lambda_{11}^{AB} = 2(-0.807 + 0.1784)\), and the odds are \(\exp(\log \mbox{odds}) = 0.2844\).
In R, the syntax for loglin() would be
loglin(ski, list(c(1, 2)), fit=TRUE, param=TRUE)
And the syntax for glm() would be
glm(Freq~Treatment*Cold, family=poisson(), data=ski.data)
Q1: What is the estimated odds ratio based on the CATMOD or loglin() output? What about GENMOD or glm()?
For example, from the second way of fitting the model with GENMOD, the odds ratio is \(\exp(0.7134)=2.041\), with 95% CI \((\exp(0.0790), \exp(1.3770))\). This corresponds to \(1/2.041=0.49\), which we saw before.
treatment*response placebo cold 1 0.7134 0.3293 0.0790 1.3770
From R, we get the same:
> summary(ski.sat)
...
TreatmentVitaminC:ColdNoCold 0.7134 0.3293 2.166 0.0303 *
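The odds ratio and an approximate Wald interval can also be pulled directly out of the R fit (a sketch, assuming ski.sat as above; GENMOD's interval is likelihood-ratio based, so it differs slightly):

# Odds ratio and Wald 95% CI from the interaction coefficient
b   <- coef(summary(ski.sat))["TreatmentVitaminC:ColdNoCold", ]
est <- b["Estimate"]; se <- b["Std. Error"]
exp(est)                         # ~2.04
exp(est + c(-1, 1) * 1.96 * se)  # ~(1.07, 3.89)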
Q2: Is the interaction of treatment and response significant (hint: look at the LR Statistics for Type 3 Analysis)? Where does df = 3 come from?
Q3: The saturated model fits perfectly. Let's do some model selection and compare the independence and saturated models by calculating the LR statistic (i.e., the difference in deviances):

\(2\times (973.0657-970.6299)\approx 4.8717\)
Q4: How many df does this statistic have?
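In R, this comparison (and the answer to Q4) can be read off directly, assuming the ski.ind and ski.sat fits from above:

# LR test comparing independence (reduced) against saturated (full)
anova(ski.ind, ski.sat, test = "LRT")
# Deviance difference = 4.8717 on 1 df, p ~ 0.027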
Hierarchical Models
These models include all lower-order terms that make up the higher-order terms in the model. For example, (A, B) is a simpler, hierarchical model than (AB). For hierarchical models, interpretation does not depend on how the variables are coded.

Is the following model hierarchical?

\(\log(\mu_{ij})=\lambda+\lambda_i^A+ \lambda_{ij}^{AB}\)

No, it is not a hierarchical model: it includes the interaction term \(\lambda_{ij}^{AB}\) but is missing the lower-order term \(\lambda^{B}_{j}\).
We want hierarchical models so that interaction terms represent associations. If there is a significant interaction term, we do NOT interpret the lower-order terms, only the higher-order terms, because the values of the lower-order terms are coding-dependent and can be misleading.
Next, we'll see how these models extend to three-way tables.