11.3 - Inference for Log-linear Models - Dependent Samples
After the previous discussion on matched and dependent data for square tables, we now consider similar questions with the log-linear model. For the movie ratings example, we concluded that the model of independence is not a good fit (as expected) and that there is only moderate agreement between Siskel and Ebert (estimate \(\hat{\kappa} = 0.39\)).
| Siskel \ Ebert | con | mixed | pro | total |
|---|---|---|---|---|
| con | 24 | 8 | 13 | 45 |
| mixed | 8 | 13 | 11 | 32 |
| pro | 10 | 9 | 64 | 83 |
| total | 42 | 30 | 88 | 160 |
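To see where \(\hat{\kappa}=0.39\) comes from, here is a minimal sketch computing Cohen's kappa directly from the table of counts (shown in Python for illustration, rather than the SAS/R used in this lesson):

```python
# Cohen's kappa for the Siskel-Ebert ratings table
# rows = Siskel, columns = Ebert (order: con, mixed, pro)
counts = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

n = sum(sum(row) for row in counts)                   # 160 movies total
p_observed = sum(counts[i][i] for i in range(3)) / n  # observed agreement (diagonal)

# expected agreement under independence, from the marginal totals
row_totals = [sum(row) for row in counts]
col_totals = [sum(counts[i][j] for i in range(3)) for j in range(3)]
p_expected = sum(row_totals[i] * col_totals[i] for i in range(3)) / n**2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 2))  # 0.39
```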
How can we use this approach to test for various degrees of agreement between two variables?
Log-linear Models for Square Tables
We can always fit the log-linear models we are already familiar with (e.g., the independence model), but given the nature of matched pairs or repeated measures data, we expect dependence. Thus, there are additional models, specific to these kinds of data, that we'd like to consider. Some of these are
- Symmetry
- Quasi-symmetry
- Marginal homogeneity
Fitting these models typically requires creating additional variables, either indicators or other types of numerical variables, to be included in the models we are already familiar with, such as the independence model.
11.3.1 - Symmetry Model
A rather restrictive form of agreement between two variables is symmetry, where the expected counts satisfy \(\mu_{ij}=\mu_{ji}\) for all \(i\) and \(j\). As the name suggests, this means the table of expected counts is a symmetric matrix.
Using the notation from the Siskel and Ebert example with general log-linear model
\(\log\mu_{ij}=\lambda+\lambda^S_i+\lambda^E_j+\lambda^{SE}_{ij}\)
the symmetry assumption would mean that
- \(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
- \(\lambda^S_i=\lambda^E_i\)
for all \(i\) and \(j\). Note that this differs from the independence assumption that all \(\lambda_{ij}^{SE}=0\). Both are restrictive models, but neither is a special case of the other.
For an \(I\times I\) table, there are \(I\) counts along the main diagonal and \(I(I-1)\) counts off the main diagonal, giving \(I+I(I-1)/2\) unique parameters to be estimated for the symmetry model: one for each diagonal cell and one shared by each of the \(I(I-1)/2\) off-diagonal pairs.
The challenge in actually fitting this model, however, is that the symmetry restrictions do not simply remove certain terms (setting them equal to 0). Rather, the restriction is that certain expected counts, namely the off-diagonal pairs, are required to be equal. So, we use a regression approach with some carefully defined indicator variables. Specifically, for each observation in the sample, we take \(I_{ij}=1\) if the observation falls in row \(i\) and column \(j\) OR if the observation falls in row \(j\) and column \(i\), for all \(1\le i\le j\le I\); otherwise \(I_{ij}=0\).
The idea is that the symmetry model views the \(n_{ij}\) count and the \(n_{ji}\) count as coming from a common population, and so only a single parameter is attributed to its mean. For the Siskel and Ebert example, this reduces to
\(\log\mu_{ij}=\beta_1I_{11}+\beta_2I_{22}+\beta_3I_{33}+ \beta_4I_{12}+\beta_5I_{13}+\beta_6I_{23} \)
Thus, the \(i^{th}\) main diagonal mean is \(\mu_{ii}=e^{\beta_i}\), and the \((i,j)^{th}\) off-diagonal mean is \(\mu_{ij}=\mu_{ji}=e^{\beta}\), where \(\beta\) is the coefficient for \(I_{ij}\) (for \(i<j\)).
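Under symmetry, the fitted values have a closed form: diagonal cells are fitted exactly, and each off-diagonal pair is fitted by its average. A quick Python sketch (an illustration, not part of the course code) reproduces the fitted table without running a GLM:

```python
# Fitted values under the symmetry model:
# mu_ii = n_ii on the diagonal, mu_ij = mu_ji = (n_ij + n_ji)/2 off it
counts = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

fitted = [[(counts[i][j] + counts[j][i]) / 2 for j in range(3)]
          for i in range(3)]
for row in fitted:
    print(row)
# diagonal entries are unchanged; cells (1,3) and (3,1) both become 11.5
```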
Let's see how we can do this in SAS and R.
Take a look at the SAS code in MovieCritics.sas. The data used previously is updated with the indicators we'll use for the symmetry model. These replace not only the original variables S and E but also the intercept, giving a total of \(I+I(I-1)/2=6\) parameters in this case (with \(I=3\)).
data critic;
set critic;
/* indicator variables for each diagonal cell*/
I11=0; if siskel='con' AND ebert='con' THEN I11=1;
I22=0; if siskel='mixed' AND ebert='mixed' THEN I22=1;
I33=0; if siskel='pro' AND ebert='pro' THEN I33=1;
/* indicator variables for each off-diagonal cell*/
I12=0; if siskel='con' AND ebert='mixed' THEN I12=1;
if siskel='mixed' AND ebert='con' THEN I12=1;
I13=0; if siskel='con' AND ebert='pro' THEN I13=1;
if siskel='pro' AND ebert='con' THEN I13=1;
I23=0; if siskel='mixed' AND ebert='pro' THEN I23=1;
if siskel='pro' AND ebert='mixed' THEN I23=1;
run;
/* symmetry */
proc genmod data=critic order=data;
model count=I11 I22 I33 I12 I13 I23 /
link=log dist=poisson noint predicted;
title "Symmetry Model";
run;
From the output, note that the deviance statistic \(G^2=0.5928\) is comparing the symmetry model to the saturated model, which has 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value is fairly large, indicating that the symmetry model is a reasonable fit for this data.
Criteria For Assessing Goodness Of Fit

| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 3 | 0.5928 | 0.1976 |
| Scaled Deviance | 3 | 0.5928 | 0.1976 |
| Pearson Chi-Square | 3 | 0.5913 | 0.1971 |
| Scaled Pearson X2 | 3 | 0.5913 | 0.1971 |
| Log Likelihood | | 351.2829 | |
| Full Log Likelihood | | -20.3921 | |
| AIC (smaller is better) | | 52.7842 | |
| AICC (smaller is better) | | 94.7842 | |
| BIC (smaller is better) | | 53.9676 | |
The estimates of the model parameters are given in the table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which can also be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)
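The deviance statistic can also be verified by hand as \(G^2 = 2\sum_{i,j} n_{ij}\log(n_{ij}/\hat{\mu}_{ij})\), with the symmetry fits taken as the pairwise averages. A small Python check (an illustration, not part of the course code):

```python
from math import log

counts = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]
# symmetry fits: pairwise averages (diagonal cells are fitted exactly)
fitted = [[(counts[i][j] + counts[j][i]) / 2 for j in range(3)]
          for i in range(3)]

# deviance versus the saturated model: G^2 = 2 * sum n_ij * log(n_ij / mu_ij)
G2 = 2 * sum(counts[i][j] * log(counts[i][j] / fitted[i][j])
             for i in range(3) for j in range(3))
print(round(G2, 4))  # 0.5928, matching the SAS/R output
```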
Analysis Of Maximum Likelihood Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limits | | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|
| Intercept | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| I11 | 1 | 3.1781 | 0.2041 | 2.7780 | 3.5781 | 242.40 | <.0001 |
| I22 | 1 | 2.5649 | 0.2774 | 2.0214 | 3.1085 | 85.53 | <.0001 |
| I33 | 1 | 4.1589 | 0.1250 | 3.9139 | 4.4039 | 1106.96 | <.0001 |
| I12 | 1 | 2.0794 | 0.2500 | 1.5895 | 2.5694 | 69.19 | <.0001 |
| I13 | 1 | 2.4423 | 0.2085 | 2.0337 | 2.8510 | 137.20 | <.0001 |
| I23 | 1 | 2.3026 | 0.2236 | 1.8643 | 2.7408 | 106.04 | <.0001 |
| Scale | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | | |
The scale parameter was held fixed.
Take a look at the R code in MovieCritics.R. The data is converted to the "stacked" or long format to fit the log-linear model, and the indicators we need for the symmetry model replace the earlier factors S and E. Also, note that the intercept is dropped with the "-1" specification. This is to avoid an overparameterization, given the \(I+I(I-1)/2=6\) indicator variables are already present (with \(I=3\)).
S = factor(rep(c("con","mixed","pro"),each=3)) # siskel
E = factor(rep(c("con","mixed","pro"),times=3)) # ebert
count = c(24,8,13,8,13,11,10,9,64)
# indicator variables for diagonals
I11 = (S=="con")*(E=="con")
I22 = (S=="mixed")*(E=="mixed")
I33 = (S=="pro")*(E=="pro")
# indicator variables for off-diagonals
I12 = (S=="con")*(E=="mixed") + (E=="con")*(S=="mixed")
I13 = (S=="con")*(E=="pro") + (E=="con")*(S=="pro")
I23 = (S=="mixed")*(E=="pro") + (E=="mixed")*(S=="pro")
# symmetry model
fit.s = glm(count~I11+I12+I13+I22+I23+I33-1,family=poisson(link='log'))
At the end of the output, note that the deviance statistic \(G^2=0.59276\) is comparing the symmetry model to the saturated model, which has 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value is fairly large, indicating that the symmetry model is a reasonable fit to this data.
The estimates of the model parameters are given in the "Coefficients" table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which can also be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)
> summary(fit.s)
Call:
glm(formula = count ~ I11 + I12 + I13 + I22 + I23 + I33 - 1,
family = poisson(link = "log"))
Deviance Residuals:
1 2 3 4 5 6 7 8
0.0000 0.0000 0.4332 0.0000 0.0000 0.3112 -0.4525 -0.3217
9
0.0000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
I11 3.1781 0.2041 15.569 <2e-16 ***
I12 2.0794 0.2500 8.318 <2e-16 ***
I13 2.4423 0.2085 11.713 <2e-16 ***
I22 2.5649 0.2774 9.248 <2e-16 ***
I23 2.3026 0.2236 10.297 <2e-16 ***
I33 4.1589 0.1250 33.271 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 721.15855 on 9 degrees of freedom
Residual deviance: 0.59276 on 3 degrees of freedom
AIC: 52.784
Number of Fisher Scoring iterations: 4
> mu.s = fitted.values(fit.s)
> matrix(mu.s,nrow=3)
[,1] [,2] [,3]
[1,] 24.0 8 11.5
[2,] 8.0 13 10.0
[3,] 11.5 10 64.0
Although the symmetry model fits well in this example, it may be too restrictive for other types of data. So, something less restrictive (requiring less agreement) may be more suitable. This leads us to consider "quasi-symmetry".
11.3.2 - Quasi-symmetry Model
Less restrictive than the symmetry model but still assuming some agreement compared with the fully saturated (unspecified) model, the quasi-symmetry model assumes that
\(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
Note that, compared with the symmetry model, the quasi-symmetry model does not require marginal homogeneity, which is that \(\lambda^S_i=\lambda^E_i\). In terms of the notation we have so far, the model now becomes
\(\log(\mu_{ij}) = \lambda+\lambda_i^S+\lambda_j^E+ \beta_4I_{12}+\beta_5I_{13}+\beta_6I_{23} \)
Thus, we use the separate S and E factors to allow for more general marginal distributions but apply a restriction to their joint distribution.
Generally for an \(I\times I\) table, the number of parameters breaks down as follows:
\(1+(I-1)+(I-1)+\dfrac{I(I-1)}{2}\)
The first three quantities allow one parameter for \(\lambda\) and \(I-1\) parameters for each marginal distribution separately for S and E. The last quantity \(I(I-1)/2\) corresponds to the number of off-diagonal counts and is identical to the symmetry model. But the symmetry model requires only \(I\) parameters for a common marginal distribution, whereas the quasi-symmetry model here allows for additional parameters and is hence less restrictive.
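The parameter-count bookkeeping above is easy to verify directly. A short Python sketch (an illustration, with \(I=3\) for our example) tallies the parameters of each model and the resulting degrees of freedom for the deviance tests:

```python
# Parameter counts for an I x I square table (here I = 3)
I = 3
saturated = I * I                                    # one mean per cell: 9
symmetry = I + I * (I - 1) // 2                      # diagonal + off-diagonal pairs: 6
quasi_symmetry = 1 + 2 * (I - 1) + I * (I - 1) // 2  # intercept + two margins + pairs: 8
independence = 1 + 2 * (I - 1)                       # intercept + two margins: 5

# degrees of freedom for each deviance test versus the saturated model
print(saturated - symmetry)        # 3
print(saturated - quasi_symmetry)  # 1
print(saturated - independence)    # 4
```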
While it may seem at a glance that using the same indicators for the off-diagonal table counts would require symmetry, note that the additional parameters allowed in the marginal distributions for S and E influence the off-diagonal counts as well. For example, the expected count for the \((1,2)\) cell would be
\(\mu_{12}=\exp[\lambda+\lambda_1^S+\lambda_2^E+\beta_4]\)
while the expected count for the \((2,1)\) cell would be
\(\mu_{21}=\exp[\lambda+\lambda_2^S+\lambda_1^E+\beta_4]\)
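To see this asymmetry numerically, we can plug the (rounded) estimates from the software output below into these two expressions. A hypothetical Python check, assuming the "con" levels serve as the baseline so their \(\lambda\) terms are 0:

```python
from math import exp

# rounded quasi-symmetry estimates taken from the R/SAS output
lam = 3.1781         # intercept (con/con baseline)
lam_S_mixed = -0.3188  # Siskel "mixed" effect
lam_E_mixed = -0.2943  # Ebert "mixed" effect
beta_12 = -0.7921      # single coefficient shared by the (1,2)/(2,1) pair

mu_12 = exp(lam + lam_E_mixed + beta_12)  # Siskel con, Ebert mixed
mu_21 = exp(lam + lam_S_mixed + beta_12)  # Siskel mixed, Ebert con
print(round(mu_12, 2), round(mu_21, 2))   # about 8.10 and 7.90
```

Although the pair shares the single \(\beta_{12}\) coefficient, the differing marginal terms produce unequal fitted values, matching the non-symmetric fitted table from the software.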
For the \(3\times3\) table we're currently working with, the quasi-symmetry model has 8 parameters, so the deviance test versus the saturated model has only a single degree of freedom. For larger tables, however, the reduction in parameters is more appreciable. As we see below when fitting this model with software, the test statistic is \(G^2=0.0061\), meaning the fit is very close to that of the saturated model.
In SAS we have...
/* quasi-symmetry */
proc genmod data=critic order=data;
class siskel ebert;
model count=siskel ebert I12 I13 I23 /
link=log dist=poisson predicted;
title "Quasi-Symmetry Model";
run;
And from the output, we can verify the deviance statistic above as well as the non-symmetrical form of the fitted values.
Criteria For Assessing Goodness Of Fit

| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 1 | 0.0061 | 0.0061 |
| Scaled Deviance | 1 | 0.0061 | 0.0061 |
| Pearson Chi-Square | 1 | 0.0061 | 0.0061 |
| Scaled Pearson X2 | 1 | 0.0061 | 0.0061 |
| Log Likelihood | | 351.5763 | |
| Full Log Likelihood | | -20.0988 | |
| AIC (smaller is better) | | 56.1975 | |
| AICC (smaller is better) | | . | |
| BIC (smaller is better) | | 57.7753 | |
Observation Statistics

| Observation | count | Predicted Value | Linear Predictor | Standard Error of the Linear Predictor | HessWgt |
|---|---|---|---|---|---|
| 1 | 24 | 24 | 3.1780538 | 0.2041241 | 24 |
| 2 | 8 | 8.0980871 | 2.0916279 | 0.3150296 | 8.0980871 |
| 3 | 13 | 12.901913 | 2.5573756 | 0.2606862 | 12.901913 |
| 4 | 8 | 7.9019129 | 2.0671049 | 0.3179476 | 7.9019129 |
| 5 | 13 | 13 | 2.5649494 | 0.2773501 | 13 |
| 6 | 11 | 11.098087 | 2.4067728 | 0.2778455 | 11.098087 |
| 7 | 10 | 10.098087 | 2.312346 | 0.2888566 | 10.098087 |
| 8 | 9 | 8.9019129 | 2.1862662 | 0.3037655 | 8.9019129 |
| 9 | 64 | 64 | 4.1588831 | 0.125 | 64 |
In R we have...
# quasi-symmetry model
fit.q = glm(count~S+E+I12+I13+I23,family=poisson(link='log'))
And from the output, we can verify the deviance statistic above as well as the non-symmetrical form of the fitted values.
> summary(fit.q)
Call:
glm(formula = count ~ S + E + I12 + I13 + I23, family = poisson(link = "log"))
Deviance Residuals:
1 2 3 4 5 6 7 8
0.00000 -0.03454 0.02727 0.03482 0.00000 -0.02949 -0.03092 0.03281
9
0.00000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.1781 0.2041 15.569 < 2e-16 ***
Smixed -0.3188 0.2594 -1.229 0.21913
Spro 0.3679 0.2146 1.714 0.08652 .
Emixed -0.2943 0.2594 -1.134 0.25666
Epro 0.6129 0.2146 2.856 0.00430 **
I12 -0.7921 0.3036 -2.609 0.00907 **
I13 -1.2336 0.2414 -5.110 3.22e-07 ***
I23 -1.0654 0.2712 -3.928 8.55e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1.0221e+02 on 8 degrees of freedom
Residual deviance: 6.0515e-03 on 1 degrees of freedom
AIC: 56.198
> mu.q = fitted.values(fit.q)
> matrix(mu.q,nrow=3)
[,1] [,2] [,3]
[1,] 24.000000 7.901913 10.098087
[2,] 8.098087 13.000000 8.901913
[3,] 12.901913 11.098087 64.000000
11.3.3 - Marginal Homogeneity
We have seen this when we discussed the two-way table for the siblings' puzzle times. Marginal homogeneity means the row and column distributions (of a square table) are the same. That is,
\(\mu_{i+}=\mu_{+i}\)
for all \(i\). There is no log-linear model that corresponds directly to marginal homogeneity, but we can use the log-linear models of symmetry and quasi-symmetry to indirectly test for marginal homogeneity.
Recall that symmetry has two restrictions:
- \(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
- \(\lambda^S_i=\lambda^E_i\)
And this suggests that symmetry consists of restrictions from both quasi-symmetry and marginal homogeneity. This correspondence is actually one-to-one, meaning that both quasi-symmetry and marginal homogeneity together are necessary and sufficient for symmetry. And thus if a symmetry model lacks fit when compared with a quasi-symmetry model, it must be due to the marginal homogeneity assumption. This gives us an avenue for testing marginal homogeneity:
\(G^2(\text{marginal homogeneity versus saturated}) = G^2(\text{symmetry versus quasi-symmetry})\)
For our example, we calculated \(G^2 = 0.5928\) for the test of symmetry versus saturated (with 3 df) and \(G^2=0.0061\) for the test of quasi-symmetry versus saturated (with 1 df), giving a likelihood ratio test statistic of \(G^2=0.5928-0.0061=0.5867\) for the test of symmetry versus quasi-symmetry (with 2 df). Indirectly, this serves as our test for marginal homogeneity, which in this case is a reasonable assumption for the data (p-value 0.746).
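The arithmetic for this indirect test, including the p-value (for 2 df, the chi-square survival function has the closed form \(P(\chi^2_2 > x) = e^{-x/2}\)), can be sketched in Python:

```python
from math import exp

# deviances versus the saturated model, from the earlier fits
G2_symmetry = 0.5928        # symmetry, 3 df
G2_quasi_symmetry = 0.0061  # quasi-symmetry, 1 df

# indirect test of marginal homogeneity: symmetry vs. quasi-symmetry
G2_mh = G2_symmetry - G2_quasi_symmetry  # on 3 - 1 = 2 df

# for 2 df, the chi-square upper-tail probability is exp(-x/2)
p_value = exp(-G2_mh / 2)
print(round(G2_mh, 4), round(p_value, 3))  # 0.5867 0.746
```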
A diagram of the relationships among the models discussed in this lesson is depicted below.
The most general (complex) model is the saturated model. Each of quasi-symmetry and marginal homogeneity is a special case of the saturated model, but neither is a special case of the other. And symmetry is both quasi-symmetry and marginal homogeneity.
In terms of deviance statistics versus saturated, the table below summarizes the test results. For this example, the symmetry model would be preferred because it's the most restrictive model that fits the data reasonably well.
| Model | df | \(G^2\) | p-value |
|---|---|---|---|
| Independence | 4 | 43.2325 | <0.0001 |
| Symmetry | 3 | 0.5928 | 0.900 |
| Marginal homogeneity | 2 | 0.5867 | 0.746 |
| Quasi-symmetry | 1 | 0.0061 | 0.938 |
Thus, Siskel and Ebert tend to agree on their ratings of movies.