11.3 - Inference for Log-linear Models - Dependent Samples


After the previous discussion of matched and dependent data for square tables, we now consider similar questions with log-linear models. For the movie ratings example, we concluded that the model of independence is not good (as expected) and that there is only moderate agreement between Siskel and Ebert (estimate \(\hat{\kappa} = 0.39\)).

 

                 Ebert
Siskel     con  mixed  pro  total
con         24      8   13     45
mixed        8     13   11     32
pro         10      9   64     83
total       42     30   88    160
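As a quick check, the reported \(\hat{\kappa}=0.39\) can be reproduced directly from the table counts. The lesson's own code uses SAS and R; this is a minimal Python sketch of the same arithmetic:

```python
# Verifying Cohen's kappa from the table counts.
# Rows are Siskel's ratings, columns are Ebert's (con, mixed, pro).
counts = [[24, 8, 13],
          [8, 13, 11],
          [10, 9, 64]]

n = sum(sum(row) for row in counts)                    # 160 movies in total
p_obs = sum(counts[i][i] for i in range(3)) / n        # observed agreement (diagonal)
row_tot = [sum(row) for row in counts]
col_tot = [sum(counts[i][j] for i in range(3)) for j in range(3)]
p_exp = sum(row_tot[i] * col_tot[i] for i in range(3)) / n**2  # agreement expected by chance

kappa = (p_obs - p_exp) / (1 - p_exp)
print(round(kappa, 2))  # 0.39
```

Here \(p_{obs}=101/160\) and \(p_{exp}\approx 0.397\), giving the moderate agreement noted above.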

 How can we use this approach to test for various degrees of agreement between two variables?

Log-linear Models for Square Tables

We can always fit the log-linear models we are already familiar with (e.g., the independence model), but given the nature of matched pairs or repeated measures data, we expect dependence. Thus, there are additional models, specific to these kinds of data that we'd like to consider. Some of those are

  • Symmetry
  • Quasi-symmetry
  • Marginal homogeneity

Fitting these models typically requires creating additional variables, either indicators or other types of numerical variables, to be included in the models we are already familiar with, such as the independence model.


11.3.1 - Symmetry Model


A rather restrictive form of agreement between two variables is symmetry, where the expected counts satisfy \(\mu_{ij}=\mu_{ji}\) for all \(i\) and \(j\). As the name suggests, this means the table of expected counts is a symmetric matrix.

Using the notation from the Siskel and Ebert example, with the general log-linear model

\(\log\mu_{ij}=\lambda+\lambda^S_i+\lambda^E_j+\lambda^{SE}_{ij}\)

the symmetry assumption would mean that

  • \(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
  • \(\lambda^S_i=\lambda^E_i\)

for all \(i\) and \(j\). Note that this differs from the independence assumption that all \(\lambda_{ij}^{SE}=0\). Both are restrictive models, but neither is a special case of the other.

For an \(I\times I\) table, there are \(I\) counts along the main diagonal and \(I(I-1)\) counts off the main diagonal, the latter falling into \(I(I-1)/2\) symmetric pairs. This gives \(I+I(I-1)/2\) unique parameters to be estimated to fit the symmetry model.

The challenge in actually fitting this model, however, is that the symmetry restrictions do not simply remove certain terms (setting them equal to 0). Rather, the restriction is that certain expected counts, namely the symmetric off-diagonal pairs, are required to be equal. So, we use a regression approach with some carefully defined indicator variables. Specifically, for each observation in the sample, we take \(I_{ij}=1\) if the observation falls in row \(i\) and column \(j\) OR if the observation falls in row \(j\) and column \(i\), for all \(1\le i\le j\le I\); otherwise \(I_{ij}=0\).

The idea is that the symmetry model views the \(n_{ij}\) count and the \(n_{ji}\) count as coming from a common population, and so only a single parameter is attributed to its mean. For the Siskel and Ebert example, this reduces to

\(\log\mu_{ij}=\beta_1I_{11}+\beta_2I_{22}+\beta_3I_{33}+ \beta_4I_{12}+\beta_5I_{13}+\beta_6I_{23} \)

Thus, the \(i^{th}\) main diagonal mean is \(\mu_{ii}=e^{\beta_i}\), and the \((i,j)^{th}\) off-diagonal mean is \(\mu_{ij}=\mu_{ji}=e^{\beta}\), where \(\beta\) is the coefficient for \(I_{ij}\) (for \(i<j\)).
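Because the symmetry MLEs have this closed form, the fitted coefficients can be checked without any model-fitting software. A minimal Python sketch (the lesson's own code below uses SAS and R):

```python
import math

# Observed counts: rows are Siskel, columns are Ebert (con, mixed, pro)
n = [[24, 8, 13],
     [8, 13, 11],
     [10, 9, 64]]

# Under symmetry, mu_ii = n_ii and mu_ij = (n_ij + n_ji)/2 for i != j,
# so each coefficient is just the log of the corresponding pooled mean.
beta = {}
for i in range(3):
    for j in range(i, 3):
        mu = n[i][i] if i == j else (n[i][j] + n[j][i]) / 2
        beta[(i + 1, j + 1)] = round(math.log(mu), 4)

print(beta[(1, 1)], beta[(1, 3)], beta[(3, 3)])  # 3.1781 2.4423 4.1589
```

These values match the estimates reported by both SAS and R further below.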

Let's see how we can do this in SAS and R.

Take a look at the SAS code in MovieCritics.sas. The data used previously is updated with the indicators we'll use for the symmetry model. These replace not only the original variables S and E but also the intercept, giving a total of \(I+I(I-1)/2=6\) parameters in this case (with \(I=3\)).

data critic;
set critic;

/* indicator variables for each diagonal cell*/
I11=0; if siskel='con' AND ebert='con' THEN I11=1;
I22=0; if siskel='mixed' AND ebert='mixed' THEN I22=1;
I33=0; if siskel='pro' AND ebert='pro' THEN I33=1;

/* indicator variables for each off-diagonal cell*/
I12=0; if siskel='con' AND ebert='mixed' THEN I12=1;
 if siskel='mixed' AND ebert='con' THEN I12=1;
I13=0; if siskel='con' AND ebert='pro' THEN I13=1;
 if siskel='pro' AND ebert='con' THEN I13=1;
I23=0; if siskel='mixed' AND ebert='pro' THEN I23=1;
 if siskel='pro' AND ebert='mixed' THEN I23=1;
run;

/* symmetry */
proc genmod data=critic order=data;
model count=I11 I22 I33 I12 I13 I23 / 
 link=log dist=poisson noint predicted;
title "Symmetry Model";
run;

From the output, note that the deviance statistic \(G^2=0.5928\) is comparing the symmetry model to the saturated model, which has 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value is fairly large, indicating that the symmetry model is a reasonable fit for this data.

 
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 3 0.5928 0.1976
Scaled Deviance 3 0.5928 0.1976
Pearson Chi-Square 3 0.5913 0.1971
Scaled Pearson X2 3 0.5913 0.1971
Log Likelihood   351.2829  
Full Log Likelihood   -20.3921  
AIC (smaller is better)   52.7842  
AICC (smaller is better)   94.7842  
BIC (smaller is better)   53.9676  

The estimates of the model parameters are given in the table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which also can be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)
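The deviance itself can also be verified by hand: since the saturated model reproduces the observed counts exactly, \(G^2 = 2\sum_{ij} n_{ij}\log(n_{ij}/\hat{\mu}_{ij})\). A minimal Python sketch of that calculation:

```python
import math

# Observed counts: rows are Siskel, columns are Ebert (con, mixed, pro)
n = [[24, 8, 13],
     [8, 13, 11],
     [10, 9, 64]]

# Symmetry fitted values: diagonal counts unchanged, off-diagonal pairs averaged
mu = [[(n[i][j] + n[j][i]) / 2 for j in range(3)] for i in range(3)]

# Deviance versus the saturated model (which fits the counts exactly)
G2 = 2 * sum(n[i][j] * math.log(n[i][j] / mu[i][j])
             for i in range(3) for j in range(3))
print(round(G2, 4))  # 0.5928
```

Only the cells with \(n_{ij}\neq n_{ji}\) contribute; the diagonal and the (1,2)/(2,1) pair (both 8) contribute zero.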

 
Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 0 0.0000 0.0000 0.0000 0.0000 . .
I11 1 3.1781 0.2041 2.7780 3.5781 242.40 <.0001
I22 1 2.5649 0.2774 2.0214 3.1085 85.53 <.0001
I33 1 4.1589 0.1250 3.9139 4.4039 1106.96 <.0001
I12 1 2.0794 0.2500 1.5895 2.5694 69.19 <.0001
I13 1 2.4423 0.2085 2.0337 2.8510 137.20 <.0001
I23 1 2.3026 0.2236 1.8643 2.7408 106.04 <.0001
Scale 0 1.0000 0.0000 1.0000 1.0000    

Note: The scale parameter was held fixed.

Take a look at the R code in MovieCritics.R. The data is converted to the "stacked" or long format to fit the log-linear model, and the indicators we need for the symmetry model replace the earlier factors S and E. Also, note that the intercept is dropped with the "-1" specification. This is to avoid an overparameterization, given the \(I+I(I-1)/2=6\) indicator variables are already present (with \(I=3\)).

S = factor(rep(c("con","mixed","pro"),each=3)) # siskel
E = factor(rep(c("con","mixed","pro"),times=3)) # ebert
count = c(24,8,13,8,13,11,10,9,64)

# indicator variables for diagonals
I11 = (S=="con")*(E=="con")
I22 = (S=="mixed")*(E=="mixed")
I33 = (S=="pro")*(E=="pro")

# indicator variables for off-diagonals
I12 = (S=="con")*(E=="mixed") + (E=="con")*(S=="mixed")
I13 = (S=="con")*(E=="pro") + (E=="con")*(S=="pro")
I23 = (S=="mixed")*(E=="pro") + (E=="mixed")*(S=="pro")

# symmetry model
fit.s = glm(count~I11+I12+I13+I22+I23+I33-1,family=poisson(link='log'))

At the end of the output, note that the deviance statistic \(G^2=0.59276\) is comparing the symmetry model to the saturated model, which has 3 degrees of freedom corresponding to the difference in the models' numbers of parameters: 9 versus 6. The p-value is fairly large, indicating that the symmetry model is a reasonable fit to this data.

The estimates of the model parameters are given in the "Coefficients" table below. We can exponentiate these to get the mean estimates for the \(3\times 3\) table. For example, the estimate for \(\mu_{13}\) is \(e^{2.4423}=11.5\), which also can be calculated intuitively from the table counts directly:

\(\hat{\mu}_{13}=\dfrac{n_{13}+n_{31}}{2}=\dfrac{13+10}{2}=11.5\)

> summary(fit.s)

Call:
glm(formula = count ~ I11 + I12 + I13 + I22 + I23 + I33 - 1, 
    family = poisson(link = "log"))

Deviance Residuals: 
      1        2        3        4        5        6        7        8  
 0.0000   0.0000   0.4332   0.0000   0.0000   0.3112  -0.4525  -0.3217  
      9  
 0.0000  

Coefficients:
    Estimate Std. Error z value Pr(>|z|)    
I11   3.1781     0.2041  15.569   <2e-16 ***
I12   2.0794     0.2500   8.318   <2e-16 ***
I13   2.4423     0.2085  11.713   <2e-16 ***
I22   2.5649     0.2774   9.248   <2e-16 ***
I23   2.3026     0.2236  10.297   <2e-16 ***
I33   4.1589     0.1250  33.271   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 721.15855  on 9  degrees of freedom
Residual deviance:   0.59276  on 3  degrees of freedom
AIC: 52.784

Number of Fisher Scoring iterations: 4

> mu.s = fitted.values(fit.s)
> matrix(mu.s,nrow=3)
     [,1] [,2] [,3]
[1,] 24.0    8 11.5
[2,]  8.0   13 10.0
[3,] 11.5   10 64.0

Although the symmetry model fits well in this example, it may be too restrictive for other types of data. So, something less restrictive (requiring less agreement) may be more suitable. This leads us to consider "quasi-symmetry".


11.3.2 - Quasi-symmetry Model


Less restrictive than the symmetry model but still assuming some agreement compared with the fully saturated (unspecified) model, the quasi-symmetry model assumes that

\(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)

Note that, compared with the symmetry model, the quasi-symmetry model does not require the main-effect terms to be equal; that is, it drops the restriction \(\lambda^S_i=\lambda^E_i\). In terms of the notation we have so far, the model now becomes

\(\log(\mu_{ij}) = \lambda+\lambda_i^S+\lambda_j^E+ \beta_4I_{12}+\beta_5I_{13}+\beta_6I_{23} \)

Thus, we use the separate S and E factors to allow for more general marginal distributions but apply a restriction to their joint distribution.

Generally for an \(I\times I\) table, the number of parameters breaks down as follows:

\(1+(I-1)+(I-1)+\dfrac{I(I-1)}{2}\)

The first three quantities allow one parameter for \(\lambda\) and \(I-1\) parameters for each marginal distribution separately for S and E. The last quantity \(I(I-1)/2\) corresponds to the number of off-diagonal counts and is identical to the symmetry model. But the symmetry model requires only \(I\) parameters for a common marginal distribution, whereas the quasi-symmetry model here allows for additional parameters and is hence less restrictive.
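This bookkeeping is easy to tabulate for general \(I\). A small Python sketch comparing the two models' parameter counts and their residual degrees of freedom against the saturated model (which has \(I^2\) parameters):

```python
def model_dims(I):
    """Parameter counts and residual df for symmetry and quasi-symmetry
    in an I x I square table."""
    cells = I * I                                # saturated model parameters
    sym = I + I * (I - 1) // 2                   # symmetry: diagonal + unique pairs
    quasi = 1 + 2 * (I - 1) + I * (I - 1) // 2   # quasi-symmetry: 1 + (I-1) + (I-1) + pairs
    return sym, cells - sym, quasi, cells - quasi

# For the 3x3 movie table: 6 and 8 parameters, with 3 and 1 residual df
print(model_dims(3))  # (6, 3, 8, 1)
```

These counts match the deviance degrees of freedom reported by the software fits: 3 for symmetry and 1 for quasi-symmetry.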

While it may seem at a glance that using the same indicators for the off-diagonal table counts would require symmetry, note that the additional parameters allowed in the marginal distributions for S and E influence the off-diagonal counts as well. For example, the expected count for the \((1,2)\) cell would be

\(\mu_{12}=\exp[\lambda+\lambda_1^S+\lambda_2^E+\beta_4]\)

while the expected count for the \((2,1)\) cell would be

\(\mu_{21}=\exp[\lambda+\lambda_2^S+\lambda_1^E+\beta_4]\)
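So the two cells of a pair need not have equal fitted means. Using the rounded coefficients reported in the software output later in this section, this can be checked numerically; a Python sketch (cell (1,2) is Siskel con / Ebert mixed):

```python
import math

# Rounded coefficients from the fitted quasi-symmetry model
intercept = 3.1781   # baseline: Siskel = con, Ebert = con
S_mixed = -0.3188    # Siskel main effect for "mixed"
E_mixed = -0.2943    # Ebert main effect for "mixed"
b12 = -0.7921        # shared coefficient for the (1,2)/(2,1) indicator I12

# The shared I12 term appears in both cells, but the main effects differ:
mu_12 = math.exp(intercept + E_mixed + b12)  # Siskel con, Ebert mixed
mu_21 = math.exp(intercept + S_mixed + b12)  # Siskel mixed, Ebert con

print(round(mu_12, 2), round(mu_21, 2))  # 8.1 7.9
```

Up to coefficient rounding, these reproduce the fitted values 8.098 and 7.902 shown in the output tables.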

For the \(3\times3\) table we're currently working with, the quasi-symmetry model has 8 parameters, so the deviance test versus the saturated model will have only a single degree of freedom. For larger tables, though, the reduction in parameters is more appreciable. As we see below when fitting this model with software, the test statistic is \(G^2=0.0061\), indicating a fit very close to that of the saturated model.

In SAS we have...

/* quasi-symmetry */
proc genmod data=critic order=data;
class siskel ebert;
model count=siskel ebert I12 I13 I23 /
 link=log dist=poisson predicted;
title "Quasi-Symmetry Model";
run;

And from the output, we can verify the deviance statistic above as well as the non-symmetrical form of the fitted values.

 
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1 0.0061 0.0061
Scaled Deviance 1 0.0061 0.0061
Pearson Chi-Square 1 0.0061 0.0061
Scaled Pearson X2 1 0.0061 0.0061
Log Likelihood   351.5763  
Full Log Likelihood   -20.0988  
AIC (smaller is better)   56.1975  
AICC (smaller is better)   .  
BIC (smaller is better)   57.7753  
 
Observation Statistics
Observation count Predicted Value Linear Predictor Standard Error of the Linear Predictor HessWgt
1 24 24 3.1780538 0.2041241 24
2 8 8.0980871 2.0916279 0.3150296 8.0980871
3 13 12.901913 2.5573756 0.2606862 12.901913
4 8 7.9019129 2.0671049 0.3179476 7.9019129
5 13 13 2.5649494 0.2773501 13
6 11 11.098087 2.4067728 0.2778455 11.098087
7 10 10.098087 2.312346 0.2888566 10.098087
8 9 8.9019129 2.1862662 0.3037655 8.9019129
9 64 64 4.1588831 0.125 64

In R we have...

# quasi-symmetry model
fit.q = glm(count~S+E+I12+I13+I23,family=poisson(link='log'))

And from the output, we can verify the deviance statistic above as well as the non-symmetrical form of the fitted values.

> summary(fit.q)

Call:
glm(formula = count ~ S + E + I12 + I13 + I23, family = poisson(link = "log"))

Deviance Residuals: 
       1         2         3         4         5         6         7         8  
 0.00000  -0.03454   0.02727   0.03482   0.00000  -0.02949  -0.03092   0.03281  
       9  
 0.00000  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   3.1781     0.2041  15.569  < 2e-16 ***
Smixed       -0.3188     0.2594  -1.229  0.21913    
Spro          0.3679     0.2146   1.714  0.08652 .  
Emixed       -0.2943     0.2594  -1.134  0.25666    
Epro          0.6129     0.2146   2.856  0.00430 ** 
I12          -0.7921     0.3036  -2.609  0.00907 ** 
I13          -1.2336     0.2414  -5.110 3.22e-07 ***
I23          -1.0654     0.2712  -3.928 8.55e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1.0221e+02  on 8  degrees of freedom
Residual deviance: 6.0515e-03  on 1  degrees of freedom
AIC: 56.198

> mu.q = fitted.values(fit.q)
> matrix(mu.q,nrow=3)
          [,1]      [,2]      [,3]
[1,] 24.000000  7.901913 10.098087
[2,]  8.098087 13.000000  8.901913
[3,] 12.901913 11.098087 64.000000

11.3.3 - Marginal Homogeneity


We have seen this when we discussed the two-way table for the siblings' puzzle times. Marginal homogeneity means the row and column distributions (of a square table) are the same. That is, 

\(\mu_{i+}=\mu_{+i}\)

for all \(i\). There is no log-linear model that corresponds directly to marginal homogeneity, but we can use the log-linear models of symmetry and quasi-symmetry to indirectly test for marginal homogeneity.

Recall that symmetry has two restrictions:

  • \(\lambda_{ij}^{SE}=\lambda^{SE}_{ji}\)
  • \(\lambda^S_i=\lambda^E_i\)

And this suggests that symmetry consists of restrictions from both quasi-symmetry and marginal homogeneity. This correspondence is actually one-to-one, meaning that both quasi-symmetry and marginal homogeneity together are necessary and sufficient for symmetry. And thus if a symmetry model lacks fit when compared with a quasi-symmetry model, it must be due to the marginal homogeneity assumption. This gives us an avenue for testing marginal homogeneity:

\(G^2\) (marginal homogeneity versus saturated) \(= G^2\) (symmetry versus quasi-symmetry)

For our example, we calculated \(G^2 = 0.5928\) for the test of symmetry versus saturated (with 3 df) and \(G^2=0.0061\) for the test of quasi-symmetry versus saturated (with 1 df), giving a likelihood ratio test of \(G^2=0.5928-0.0061=0.5867\) for the test of symmetry versus quasi-symmetry (with 2 df). Indirectly, this serves as our test for marginal homogeneity, and in this case, it would be a reasonable fit to the data (p-value 0.746).
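The arithmetic of this indirect test is short enough to verify directly; a Python sketch:

```python
import math

# Likelihood-ratio test for marginal homogeneity:
# G2(symmetry vs saturated) - G2(quasi-symmetry vs saturated), on 3 - 1 = 2 df
G2 = 0.5928 - 0.0061   # 0.5867

# The chi-square survival function has a closed form for 2 df: P(X > x) = exp(-x/2)
p_value = math.exp(-G2 / 2)
print(round(p_value, 3))  # 0.746
```

The large p-value means we have no evidence against marginal homogeneity here.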

A diagram of the models discussed in this lesson is depicted below.

[Diagram: the saturated model is the most general; quasi-symmetry and marginal homogeneity are each special cases of it, and symmetry lies in the overlap of the two.]

The most general (complex) model is the saturated model. Each of quasi-symmetry and marginal homogeneity is a special case of the saturated model, but neither is a special case of the other. And symmetry is both quasi-symmetry and marginal homogeneity.

In terms of deviance statistics versus saturated, the table below summarizes the test results. For this example, the symmetry model would be preferred because it's the most restrictive model that fits the data reasonably well.

Model                 df  \(G^2\)   p-value
Independence           4  43.2325   <0.0001
Symmetry               3   0.5928    0.900
Marginal homogeneity   2   0.5867    0.746
Quasi-symmetry         1   0.0061    0.938

Thus, Siskel and Ebert tend to agree on their ratings of movies.

