A rather restrictive form of agreement between two variables is symmetry, where the expected counts satisfy \(\mu_{ij}=\mu_{ji}\) for all \(i\) and \(j\). As the name suggests, this means the table of expected counts is a symmetric matrix.
Using the notation from the Siskel and Ebert example, with the general log-linear model
\(\log\mu_{ij}=\lambda+\lambda^S_i+\lambda^E_j+\lambda^{SE}_{ij}\)
the symmetry assumption would mean that
- \(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
- \(\lambda^S_i=\lambda^E_i\)
for all \(i\) and \(j\). Note that this differs from the independence assumption that all \(\lambda_{ij}^{SE}=0\). Both are restrictive models, but neither is a special case of the other.
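To make the "neither is a special case of the other" claim concrete, here is a small illustration (in Python, not part of the original SAS/R code) with two hypothetical \(2\times 2\) tables of expected counts: one symmetric but not independent, and one independent but not symmetric.

```python
# Two hypothetical 2x2 tables of expected counts showing that neither
# symmetry nor independence implies the other.

symmetric = [[1, 2],
             [2, 1]]   # mu_ij = mu_ji holds, but rows/columns are dependent

independent = [[3, 1],
               [6, 2]]  # outer product structure (rows 4 and 8, columns 9 and 3),
                        # so independence holds, but the table is not symmetric

def is_symmetric(t):
    """Check mu_ij = mu_ji for all cells."""
    return all(t[i][j] == t[j][i] for i in range(2) for j in range(2))

def is_independent(t):
    """Check mu_ij = (row total)(column total)/(grand total) for all cells."""
    total = sum(map(sum, t))
    return all(
        abs(t[i][j] - sum(t[i]) * (t[0][j] + t[1][j]) / total) < 1e-9
        for i in range(2) for j in range(2)
    )

assert is_symmetric(symmetric) and not is_independent(symmetric)
assert is_independent(independent) and not is_symmetric(independent)
```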
For an \(I\times I\) table, there are \(I\) cells on the main diagonal and \(I(I-1)\) cells off the main diagonal. Since the symmetry restriction pairs each off-diagonal cell with its transpose, fitting the symmetry model requires \(I+I(I-1)/2\) unique parameters to be estimated.
The challenge in actually fitting this model, however, is that the symmetry restrictions do not simply remove certain terms (set them equal to 0). Rather, the restriction is that certain expected counts---the off-diagonal ones---are required to be equal in pairs. So, we use a regression approach with some carefully defined indicator variables. Specifically, for each observation in the sample, we take \(I_{ij}=1\) if the observation falls in row \(i\) and column \(j\) OR if the observation falls in row \(j\) and column \(i\), for all \(1\le i\le j\le I\); otherwise \(I_{ij}=0\).
The idea is that the symmetry model views the \(n_{ij}\) count and the \(n_{ji}\) count as coming from a common population, and so attributes only a single parameter to their common mean. For the Siskel and Ebert example, the model reduces to
\(\log\mu_{ij}=\beta_1I_{11}+\beta_2I_{22}+\beta_3I_{33}+ \beta_4I_{12}+\beta_5I_{13}+\beta_6I_{23} \)
Thus, the \(i^{th}\) main diagonal mean is \(\mu_{ii}=e^{\beta_i}\), and the \((i,j)^{th}\) off-diagonal mean is \(\mu_{ij}=\mu_{ji}=e^{\beta}\), where \(\beta\) is the coefficient for \(I_{ij}\) (for \(i<j\)).
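Before turning to the SAS and R code, the indicator coding can be sketched in a language-neutral way (Python here, purely for illustration): each cell \((i,j)\) of a \(3\times 3\) table turns on exactly one of the six indicators, and transposed cells \((i,j)\) and \((j,i)\) share the same indicator.

```python
# Sketch of the symmetry indicator coding for a 3x3 table (illustration only).
# Cells (i, j) and (j, i) share the indicator for the unordered pair {i, j}.

def symmetry_indicators(i, j):
    """Return the six indicators (I11, I22, I33, I12, I13, I23) for cell (i, j)."""
    lo, hi = min(i, j), max(i, j)  # reduce (i, j) to the unordered pair
    pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]
    return [1 if (lo, hi) == p else 0 for p in pairs]

# Every cell activates exactly one of the six indicators...
for i in range(3):
    for j in range(3):
        assert sum(symmetry_indicators(i, j)) == 1
# ...and transposed cells share the same indicator:
assert symmetry_indicators(0, 2) == symmetry_indicators(2, 0)
```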
Let's see how we can do this in SAS and R.
Take a look at the SAS code in MovieCritics.sas. The data used previously is updated with the indicators we'll use for the symmetry model. These replace not only the original variables S and E but also the intercept, giving a total of \(I+I(I-1)/2=6\) parameters in this case (with \(I=3\)).
```
data critic;
set critic;
/* indicator variables for each diagonal cell */
I11=0; if siskel='con'   AND ebert='con'   THEN I11=1;
I22=0; if siskel='mixed' AND ebert='mixed' THEN I22=1;
I33=0; if siskel='pro'   AND ebert='pro'   THEN I33=1;
/* indicator variables for each off-diagonal cell */
I12=0; if siskel='con'   AND ebert='mixed' THEN I12=1;
       if siskel='mixed' AND ebert='con'   THEN I12=1;
I13=0; if siskel='con'   AND ebert='pro'   THEN I13=1;
       if siskel='pro'   AND ebert='con'   THEN I13=1;
I23=0; if siskel='mixed' AND ebert='pro'   THEN I23=1;
       if siskel='pro'   AND ebert='mixed' THEN I23=1;
run;
```
```
/* symmetry */
proc genmod data=critic order=data;
  model count = I11 I22 I33 I12 I13 I23 /
    link=log dist=poisson noint predicted;
  title "Symmetry Model";
run;
```
From the output, note that the deviance statistic \(G^2=0.5928\) compares the symmetry model to the saturated model, with 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value, \(P(\chi^2_3>0.5928)\approx 0.898\), is large, indicating that the symmetry model is a reasonable fit for these data.
**Criteria For Assessing Goodness Of Fit**

Criterion | DF | Value | Value/DF
---|---|---|---
Deviance | 3 | 0.5928 | 0.1976
Scaled Deviance | 3 | 0.5928 | 0.1976
Pearson Chi-Square | 3 | 0.5913 | 0.1971
Scaled Pearson X2 | 3 | 0.5913 | 0.1971
Log Likelihood | | 351.2829 |
Full Log Likelihood | | -20.3921 |
AIC (smaller is better) | | 52.7842 |
AICC (smaller is better) | | 94.7842 |
BIC (smaller is better) | | 53.9676 |
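The deviance comparison can be checked numerically. As a quick stdlib-only sketch (in Python, not part of the original SAS code), the p-value follows from the chi-square survival function, which has a closed form for 3 degrees of freedom:

```python
import math

def chi2_sf_df3(x):
    """Survival function P(X > x) for a chi-square with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

G2 = 0.5928               # deviance of the symmetry model, on 9 - 6 = 3 df
p_value = chi2_sf_df3(G2)
print(round(p_value, 3))  # ~0.898: no evidence against the symmetry model
```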
The estimates of the model parameters are given in the table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which also can be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)
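The equivalence between the fitted coefficient and the averaged counts is easy to verify directly (a quick Python check, not part of the original code):

```python
import math

n13, n31 = 13, 10              # observed off-diagonal counts from the example
mu13_counts = (n13 + n31) / 2  # symmetry MLE: average of the paired counts
mu13_coef = math.exp(2.4423)   # exponentiated fitted coefficient for I13
print(mu13_counts)             # 11.5
print(round(mu13_coef, 1))     # 11.5 (agrees up to rounding of the coefficient)
```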
**Analysis Of Maximum Likelihood Parameter Estimates**

Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limits | Wald Chi-Square | Pr > ChiSq
---|---|---|---|---|---|---
Intercept | 0 | 0.0000 | 0.0000 | (0.0000, 0.0000) | . | .
I11 | 1 | 3.1781 | 0.2041 | (2.7780, 3.5781) | 242.40 | <.0001
I22 | 1 | 2.5649 | 0.2774 | (2.0214, 3.1085) | 85.53 | <.0001
I33 | 1 | 4.1589 | 0.1250 | (3.9139, 4.4039) | 1106.96 | <.0001
I12 | 1 | 2.0794 | 0.2500 | (1.5895, 2.5694) | 69.19 | <.0001
I13 | 1 | 2.4423 | 0.2085 | (2.0337, 2.8510) | 137.20 | <.0001
I23 | 1 | 2.3026 | 0.2236 | (1.8643, 2.7408) | 106.04 | <.0001
Scale | 0 | 1.0000 | 0.0000 | (1.0000, 1.0000) | |
The scale parameter was held fixed.
Take a look at the R code in MovieCritics.R. The data is converted to the "stacked" or long format to fit the log-linear model, and the indicators we need for the symmetry model replace the earlier factors S and E. Also, note that the intercept is dropped with the "-1" specification. This avoids an overparameterization, since the \(I+I(I-1)/2=6\) indicator variables are already present (with \(I=3\)).
```r
S = factor(rep(c("con","mixed","pro"), each=3))  # siskel
E = factor(rep(c("con","mixed","pro"), times=3)) # ebert
count = c(24,8,13,8,13,11,10,9,64)

# indicator variables for diagonals
I11 = (S=="con")*(E=="con")
I22 = (S=="mixed")*(E=="mixed")
I33 = (S=="pro")*(E=="pro")

# indicator variables for off-diagonals
I12 = (S=="con")*(E=="mixed") + (E=="con")*(S=="mixed")
I13 = (S=="con")*(E=="pro") + (E=="con")*(S=="pro")
I23 = (S=="mixed")*(E=="pro") + (E=="mixed")*(S=="pro")

# symmetry model
fit.s = glm(count ~ I11+I12+I13+I22+I23+I33 - 1, family=poisson(link='log'))
```
At the end of the output, note that the deviance statistic \(G^2=0.59276\) compares the symmetry model to the saturated model, with 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value, \(P(\chi^2_3>0.59276)\approx 0.898\), is large, indicating that the symmetry model is a reasonable fit to these data.
The estimates of the model parameters are given in the "Coefficients" table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which also can be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)
```
> summary(fit.s)

Call:
glm(formula = count ~ I11 + I12 + I13 + I22 + I23 + I33 - 1,
    family = poisson(link = "log"))

Deviance Residuals:
      1        2        3        4        5        6        7        8        9
 0.0000   0.0000   0.4332   0.0000   0.0000   0.3112  -0.4525  -0.3217   0.0000

Coefficients:
    Estimate Std. Error z value Pr(>|z|)
I11   3.1781     0.2041  15.569   <2e-16 ***
I12   2.0794     0.2500   8.318   <2e-16 ***
I13   2.4423     0.2085  11.713   <2e-16 ***
I22   2.5649     0.2774   9.248   <2e-16 ***
I23   2.3026     0.2236  10.297   <2e-16 ***
I33   4.1589     0.1250  33.271   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 721.15855  on 9  degrees of freedom
Residual deviance:   0.59276  on 3  degrees of freedom
AIC: 52.784

Number of Fisher Scoring iterations: 4

> mu.s = fitted.values(fit.s)
> matrix(mu.s, nrow=3)
     [,1] [,2] [,3]
[1,] 24.0    8 11.5
[2,]  8.0   13 10.0
[3,] 11.5   10 64.0
```
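The fitted matrix is just the symmetrized count table: the average of the observed table and its transpose. A quick check of that identity (in Python, purely as an illustration, with the counts from the example):

```python
# Fitted means under symmetry = (observed table + its transpose) / 2.
counts = [[24, 8, 13],
          [8, 13, 11],
          [10, 9, 64]]  # rows: Siskel (con, mixed, pro); columns: Ebert

fitted = [[(counts[i][j] + counts[j][i]) / 2 for j in range(3)]
          for i in range(3)]
for row in fitted:
    print(row)
# matches the matrix of fitted values printed by R:
# [24.0, 8.0, 11.5]
# [8.0, 13.0, 10.0]
# [11.5, 10.0, 64.0]
```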
Although the symmetry model fits well in this example, it may be too restrictive for other types of data. So, something less restrictive (requiring less agreement) may be more suitable. This leads us to consider "quasi-symmetry".