In the previous lesson we covered the basic applications of loglinear models: independence and various types of association.
In this lesson we continue with further applications.
Key Concepts
Objectives

The CATMOD procedure for dealing with zeros
https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect44.htm [1]
https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect33.htm [2]
As we consider higher-dimensional contingency tables, we are more likely to encounter sparseness. A sparse table is one with many cells that have small counts and/or zeros; how many is 'many' and how small is 'small' are relative to the sample size and the number of cells.
We can have a small sample size, or a large sample size spread over a large number of cells. A large number of cells can result either from a large number of classification variables, or from a small number of variables with many levels.
Mostly when we talk about sparseness we are talking about zeros, the problem encountered most frequently. There are different types of zeros, and we need to differentiate between them. We discussed the different types of zeros earlier in passing; now we examine these situations in more detail.
Sampling Zeros occur when there is no observation in a cell, i.e., n_{ij} = 0, but probabilistically you still have a chance of observing a value there: P(observation in a cell) = π_{ij} > 0.
For example, if you increase the sample size n, you might get n_{ij} > 0. Consider sampling graduate students under 18 years old: they exist, but we didn't sample any. We can think of these values as random zeros; they could be missing at random, or there may be a more structural missingness mechanism due to the study design.
Structural Zeros are cells in which it is theoretically impossible to observe a value, i.e., n_{ij} = 0 and π_{ij} = 0.
Tables with structural zeros are structurally incomplete; they are also known as incomplete tables.
This is different from a partial classification, where an incomplete table results from not being able to completely cross-classify all individuals or units.
Survey of teenagers regarding their health concerns (Fienberg (1980)):
Health Concern           Male   Female
Sex/Reproduction            6       16
Menstrual problems          -       12
How healthy am I?          49       29
None                       77      102
The reasonable assumption here is that men will not have concerns with menstrual problems, therefore the chance of observing this type of concern is: P(male with menstrual problems) = 0.
Data from a study in which elderly people were asked whether they took tranquillizers (Agresti, 1990). Some individuals were interviewed in 1979, some in 1985, and some in both 1979 and 1985.
                          1985
1979           yes      no    not sampled    total
yes            175     190        230          595
no             139    1518        982         2639
not sampled     64     596          -          659
total          378    2303       1212         3893
In this case, there were some possible subjects that simply were not sampled.
Recognizing and understanding different types of incompleteness has implications for statistical analysis and inference.
For example, if there are structural zeros or an incomplete classification, you should NOT analyze the table as if it were complete; the usual model-fitting and degrees-of-freedom calculations must be adjusted.
We will focus more on fitting models with sampling zeros. Some good references on modeling incomplete tables are Fienberg (1980), Chapter 8, and Bishop, Fienberg and Holland (1975), Chapters 5, 6, and 8.
There are two major effects of sampling zeros on modeling contingency tables: the maximum likelihood estimates may not exist (or may be infinite), and the fitting algorithm may fail to converge.
Recall that terms in a loglinear model correspond to margins of the table. Existence of the MLEs depends on the position of the zeros in the table and on which effects are included in the model.
If all n_{ij} > 0, then the MLEs of the parameters are finite.
If a table has a zero marginal frequency, and there is a term in the model corresponding to that margin, the MLEs are infinite. Recent research that sheds more light on the relationship between the pattern of zeros and the existence of the MLE uses tools from Algebraic Statistics (Rinaldo, 2005).
Reference: Fienberg, 1980, Ch. 8, ZerosEx.sas [4], and ZerosEx.R [5]. There are two examples in these files: the first with nonzero margins but a bad pattern of zeros, and the second with a zero margin.
If there is a single sampling zero in a table, the MLEs are always positive.
If the margins are nonzero but the pattern of zeros is bad, some MLEs of the cell counts will be negative (although you might not observe this in a computer program)!
If there are zero margins, you cannot estimate the corresponding term in the model, and the partial odds ratios involving this cell will equal 0 or ∞. For example, n_{+11} = 0, so there is no MLE for λ_{11}^{YZ}, and any model including this term will not fit properly.
We can force λ_{11}^{YZ} = 0, but then we must adjust the degrees of freedom of the model, and no software will do this automatically (look at the last part of ZerosEx.sas [4] and ZerosEx.lst [6] (or ZerosEx.R) and the homogeneous model)!
A general formula for adjusting DF
df = (T_{e} − Z_{e}) − (T_{p} − Z_{p})
where T_{e} = the number of cells fitted, Z_{e} = the number of zero cells, T_{p} = the number of parameters in the model, and Z_{p} = the number of parameters that cannot be estimated because of zero margins.
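As a quick sanity check, the adjustment formula can be written as a one-line function. This is a sketch in Python; the function and argument names are ours, not from the SAS/R files:

```python
def adjusted_df(n_cells, n_zero_cells, n_params, n_inestimable_params):
    """Degrees of freedom adjusted for sampling zeros:
    df = (T_e - Z_e) - (T_p - Z_p)."""
    return (n_cells - n_zero_cells) - (n_params - n_inestimable_params)

# The 2x2x2 example below: 8 cells, 2 of them zero; the homogeneous
# association model has 7 parameters, 1 of which (the YZ term with the
# zero margin) cannot be estimated.
print(adjusted_df(8, 2, 7, 1))  # 0
print(adjusted_df(8, 0, 7, 0))  # 1, the usual df with no zeros
```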
Let's see these examples. First consider the following 2 × 2 × 2 table, one 2 × 2 slice for each level of the Z variable:

Z=1:      Y=1   Y=2
X=1         0     5
X=2         0    16

Z=2:      Y=1   Y=2
X=1         6     9
X=2         5     7
Now suppose we want to fit the homogeneous association model (XY, XZ, YZ). Consider the marginal tables that correspond to the highest-order terms in the model and check whether any of them have 0 counts:
XY:       Y=1   Y=2
X=1         6    14
X=2         5    23

XZ:       Z=1   Z=2
X=1         5    15
X=2        16    12

YZ:       Z=1   Z=2
Y=1         0    11
Y=2        21    16
Here, n_{+11} = 0, so there is no MLE for λ_{11}^{YZ}, and any model including this term will not fit properly. Next, notice that software does not usually detect that there is a problem; you need to pay attention to this! See below for hints on what to look for. Sometimes you will just get "NaN" instead of parameter estimates, i.e., the software won't be able to estimate them (e.g., loglin() in R). For now, let's look at the R and SAS outputs, using glm() and GENMOD, respectively:
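The zero-margin check itself is easy to automate. Here is a minimal sketch in Python (the table is hard-coded as nested lists, and the `margin` helper is ours) that reproduces the three marginal tables above and flags the zero YZ cell:

```python
from itertools import product

# Cell counts n[x][y][z] for the 2x2x2 table above (index 0/1 = level 1/2)
n = [[[0, 6], [5, 9]],     # X=1: rows Y=1, Y=2; columns Z=1, Z=2
     [[0, 5], [16, 7]]]    # X=2

def margin(table, keep):
    """Sum a 2x2x2 table over the axes not in `keep` (subset of {0,1,2})."""
    out = {}
    for x, y, z in product(range(2), repeat=3):
        key = tuple(v for i, v in enumerate((x, y, z)) if i in keep)
        out[key] = out.get(key, 0) + table[x][y][z]
    return out

for name, keep in [("XY", {0, 1}), ("XZ", {0, 2}), ("YZ", {1, 2})]:
    m = margin(n, keep)
    zeros = [k for k, v in m.items() if v == 0]
    print(name, m, "zero cells:", zeros)
# Only the YZ margin contains a zero: n_{+11} = 0
```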
From the SAS GENMOD output below, notice the huge standard errors for some of the terms, such as Y*Z, y1 z1!
Analysis Of Parameter Estimates

                              Standard   Wald 95%                 Chi
Parameter     DF  Estimate    Error      Confidence Limits       Square  Pr > ChiSq
Intercept      1    1.9459    0.3780       1.2051      2.6867     26.51  <.0001
X x1           1    0.2513    0.5040      -0.7364      1.2390      0.25  0.6180
X x2           0    0.0000    0.0000       0.0000      0.0000       .    .
Y y1           1   -0.3365    0.5855      -1.4841      0.8112      0.33  0.5655
Y y2           0    0.0000    0.0000       0.0000      0.0000       .    .
Z z1           1    0.8267    0.4532      -0.0615      1.7149      3.33  0.0681
Z z2           0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x1 y1      1   -0.0690    0.7878      -1.6131      1.4751      0.01  0.9302
X*Y x1 y2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x2 y1      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x2 y2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x1 z1      1   -1.4145    0.7187      -2.8230     -0.0059      3.87  0.0490
X*Z x1 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x2 z1      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x2 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y1 z1      1  -26.0000  115148.2    -225712     225660.3       0.00  0.9998
Y*Z y1 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y2 z1      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y2 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
Homogeneous Association with zero margin: (XY,XZ,YZ)
As expected, the overall df = 1, but it really ought to be 0!
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1 0.0000 0.0000
Here, if we apply the above formula, the df should be:
df = (T_{e} − Z_{e}) − (T_{p} − Z_{p}) = (8 − 2) − (7 − 1) = 6 − 6 = 0
However, the estimates are huge, and a model like this is useless. The best approach is to remove the cells with zero counts and refit the model; this also yields the correct degrees of freedom. In SAS, one way of doing this is:
/*delete zero values*/
data zeros1;
set zeros;
if count=0 then delete;
run;
/* homogeneous associations */
proc genmod order=data ;
class X Y Z;
model count = X Y Z X*Y X*Z Z*Y /link=log dist=poi obstats;
title 'Homogeneous Association with zero margin: (XY,XZ,YZ)';
run;
Now the output looks like:
Analysis Of Parameter Estimates

                              Standard   Wald 95%                 Chi
Parameter     DF  Estimate    Error      Confidence Limits       Square  Pr > ChiSq
Intercept      1    2.7726    0.2500       2.2826      3.2626    123.00  <.0001
X x1           1   -1.1632    0.5123      -2.1673     -0.1590      5.15  0.0232
X x2           0    0.0000    0.0000       0.0000      0.0000       .    .
Y y1           1   -0.3365    0.5855      -1.4841      0.8112      0.33  0.5655
Y y2           0    0.0000    0.0000       0.0000      0.0000       .    .
Z z2           1   -0.8267    0.4532      -1.7149      0.0615      3.33  0.0681
Z z1           0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x1 y1      1   -0.0690    0.7878      -1.6131      1.4751      0.01  0.9302
X*Y x1 y2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x2 y1      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Y x2 y2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x1 z2      1    1.4145    0.7187       0.0059      2.8230      3.87  0.0490
X*Z x1 z1      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x2 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
X*Z x2 z1      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y1 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y2 z2      0    0.0000    0.0000       0.0000      0.0000       .    .
Y*Z y2 z1      0    0.0000    0.0000       0.0000      0.0000       .    .
Scale          0    1.0000    0.0000       1.0000      1.0000
Homogeneous Association with zero margin: (XY,XZ,YZ)
Also, notice that if you fit a model that does not require the YZ margin (which has the zero count), then there are no issues fitting the model and obtaining the correct degrees of freedom. For example, fit the loglinear model of independence: the results will be the same whether you use the original 2 × 2 × 2 table or the one with the zero cells removed.
From the R output below, notice the huge standard errors for some of the terms, such as Yy2:Zz2!
Call:
glm(formula = count ~ X + Y + Z + X * Y + X * Z + Z * Y, family = poisson(link = log))
Deviance Residuals:
1 2 3 4 5 6 7 8
8.988e-06  0.000e+00  0.000e+00  2.581e-08  1.664e-05  0.000e+00  0.000e+00  1.490e-08
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.393e+01  4.535e+04  -0.001    1.000
Xx2          1.232e+00  9.397e-01   1.311    0.190
Yy2          2.554e+01  4.535e+04   0.001    1.000
Zz2          2.572e+01  4.535e+04   0.001    1.000
Xx2:Yy2     -6.899e-02  7.878e-01  -0.088    0.930
Xx2:Zz2     -1.414e+00  7.187e-01  -1.968    0.049 *
Yy2:Zz2     -2.514e+01  4.535e+04  -0.001    1.000

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 3.7197e+01 on 7 degrees of freedom
Residual deviance: 3.5779e-10 on 1 degrees of freedom
AIC: 37.101
As expected, df = 1, but it really ought to be 0! Here, if we apply the above formula, the df should be:
df = (T_{e} − Z_{e}) − (T_{p} − Z_{p}) = (8 − 2) − (7 − 1) = 6 − 6 = 0
However, the estimates are huge, and a model like this is useless. The best approach is to remove the cells with zero counts and refit the model; this also yields the correct degrees of freedom. In R, one way of doing this is:
count=c(0,6,5,9,0,5,16,7)
keep=which(count!=0)   # save the index before subsetting count
X=X[keep]
Y=Y[keep]
Z=Z[keep]
count=count[keep]
### Homogeneous Association with zero margin: (XY,XZ,YZ)
model=glm(count~X+Y+Z+X*Y+X*Z+Z*Y,family=poisson(link=log))
summary(model)
And the output looks like:
Call:
glm(formula = count ~ X + Y + Z + X * Y + X * Z + Z * Y, family = poisson(link = log))
Deviance Residuals:
[1] 0 0 0 0 0 0
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.7918 0.4082 4.389 1.14e05 ***
Xx2 0.9808 0.4787 2.049 0.0405 *
Yy2 0.4055 0.5270 0.769 0.4417
Zz2 0.1823 0.6055 0.301 0.7633
Xx2:Yy2 NA NA NA NA
Xx2:Zz2 0.6444 0.7563 0.852 0.3942
Yy2:Zz2 0.4055 0.8233 0.493 0.6224

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 9.5791e+00 on 5 degrees of freedom
Residual deviance: 1.3323e-15 on 0 degrees of freedom
AIC: 35.101
Also, notice that if you fit a model that does not require the YZ margin (which has the zero count), then there are no issues fitting the model and obtaining the correct degrees of freedom. For example, fit the loglinear model of independence: the results will be the same whether you use the original 2 × 2 × 2 table or the one with the zero cells removed.
The iterative algorithm for computing the MLEs of the model parameters may not converge, and the software will typically give a warning.
For example, look in the .lst SAS file for PROC GENMOD under "Criteria For Assessing Goodness Of Fit", and in the .log SAS file for PROC GENMOD.
Look at the fit of the saturated model in ZerosEx.lst and ZerosEx.log.
Here is another way to detect that there is a problem: the estimated standard errors of the parameters involving zero counts, and of the corresponding fitted counts, are huge in comparison to the rest of the estimates.
Notice the huge standard error, 139605.2, for \(\lambda_{111}^{XYZ}\) and for the corresponding fitted count.
An ad hoc solution: add .5 to each cell in the table.
Adding .5 shrinks estimated odds ratios that are infinite down to finite values, and increases estimates that are 0. However, for unsaturated models, adding .5 will oversmooth the data.
Be cautious! Keep in mind that an infinite estimate of a model parameter may be OK, but an infinite estimate of a true odds ratio is unsatisfactory.
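To see what the correction does numerically, here is a small sketch in Python; the example 2 × 2 slice is the Z=1 table from earlier in this lesson, and the function name is ours:

```python
def odds_ratio(table, add=0.0):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], optionally with a
    constant added to every cell (the ad hoc .5 correction)."""
    (a, b), (c, d) = table
    return ((a + add) * (d + add)) / ((b + add) * (c + add))

table = [[0, 5], [16, 7]]            # Z=1 slice: a sampling zero in cell (1,1)
print(odds_ratio(table))             # 0.0 -- a degenerate estimate
print(odds_ratio(table, add=0.5))    # finite after adding .5 to each cell
```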
Suggested Solutions:
The devil is in the details! When you have special situations you have to pay special attention. The software solutions will differ depending on whether a called procedure treats the zero as a sampling zero or a structural zero. This can even differ depending on the version of the software used! For a sampling zero, the most common solution is to add a very small value (e.g., 1E−20, that is, 10 to the power of −20) to these cells.
You have to consult the manuals.
Some notes for SAS:
For PROC CATMOD: see the SAS links in the first part of this lesson. Note that the treatment of zeros with CATMOD also depends on which iterative algorithm is used for estimating the parameters (e.g. weighted least squares versus maximum likelihood).
In PROC GENMOD: If all possible combinations of categories of independent variables are listed, with the count variable taking value zero for the empty cells, then the zeros will be treated as sampling zeros. If only nonzero cells are included in the data set (that is we delete the empty cells from the data set), then the empty cells are treated as structural zeros (see the end of ZerosEx.sas [4]).
Again, you must check the help/description sections for implemented procedures in the help manuals!
Unfortunately there is no single rule that covers all situations. You have to be careful about application of these situationspecific rules.
So while the G^{2} for model fit may not be well approximated by the chi-squared distribution, the difference between the G^{2} values of two nested models will be closer to chi-squared than the G^{2} for the fit of either model. Chi-squared comparison tests depend more on the size of the margins than on the cell sizes in the joint table. So if the marginal cells have counts > 5, the chi-squared approximation of G^{2}(M_{0}) − G^{2}(M_{1}) should be reasonable.
Structural zeros (and partial cross-classifications) are not as common as sampling zeros. However, methods for incomplete tables can be useful for a variety of problems. These methods can be found in various chapters of Agresti (1996, 2002, 2007), Bishop, Fienberg and Holland (1975), and Fienberg (1980).
We remove the cell(s) from the model building and do the analysis by fitting models only to the cells without structural zeros.
We do this by filling in any number for a structural zero (the same value for the observed and the expected count); generally we just put in 0.
Then, to account for fixing these values and to remove these cells from the modeling, we create an indicator variable. To remove the (i, j) cell,
I_{ij} = 1 if the cell is the structural zero
       = 0 for all other cells
When this indicator is included in a loglinear model as a (numerical) explanatory variable, a single parameter is estimated for the structural zero, which uses up 1 df, and the cell is fitted perfectly.
Since structural zeros are fitted perfectly, they have 0 weight in the fit statistics.
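One way to see why the structural-zero cell drops out of the fit is to compute quasi-independence directly by iterative proportional fitting, which simply never puts mass in that cell. A rough Python sketch on the health-concerns table follows; it illustrates the idea and is not the GENMOD computation:

```python
# Quasi-independence via iterative proportional fitting (IPF): start from 1
# in every ordinary cell and 0 in the structural-zero cell, then alternately
# rescale rows and columns to match the observed margins. The structural
# zero never receives any mass, so it stays fixed at 0.
obs = [[6, 16],     # Sex/Reproduction
       [0, 12],     # Menstrual problems (the male cell is structural)
       [49, 29],    # How healthy am I?
       [77, 102]]   # None
structural = {(1, 0)}

fit = [[0.0 if (i, j) in structural else 1.0 for j in range(2)]
       for i in range(4)]
for _ in range(500):
    for i in range(4):                       # match row totals
        r = sum(obs[i]) / sum(fit[i])
        fit[i] = [v * r for v in fit[i]]
    for j in range(2):                       # match column totals
        c = sum(row[j] for row in obs) / sum(row[j] for row in fit)
        for row in fit:
            row[j] *= c

print([[round(v, 2) for v in row] for row in fit])
```

At convergence the fitted counts reproduce all row and column totals over the non-structural cells, and the structural cell is fitted "perfectly" at its filled-in value.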
For example, consider the teens and health concerns data (see health.sas [7], health.lst [8] and/or health.R [9]).
Health Concern           Male   Female
Sex/Reproduction            6       16
Menstrual problems          -       12
How healthy am I?          49       29
None                       77      102
We can express the saturated loglinear model as :
\begin{align}
\text{log}(\mu_{ij}) &= 0 & \text{for the (2,1) cell}\\
&= \lambda+\lambda_i^H+\lambda_j^G+\lambda_{ij}^{HG} & \text{for the rest}\\
\end{align}
A single equation for the saturated model:
\(\text{log}(\mu_{ij})=\lambda+\lambda_i^H+\lambda_j^G+\lambda_{ij}^{HG}+\delta_{21}I(2,1)\)
The δ_{21} is a parameter that will equal whatever it needs to so that the (2,1) cell is fitted perfectly (i.e., the fitted value will exactly equal whatever arbitrary constant you filled in for it).
If we just replace the missing value with a zero, GENMOD will treat this as a sampling-zero problem and will not properly adjust the degrees of freedom.
If we also include the indicator variable in the model, GENMOD will still treat this as a sampling-zero problem but will give the proper degrees of freedom. We need to make sure that the indicator is a numerical variable and not a categorical one.
In the health.lst output, notice that the parameter estimates and the fitted values are the same for the two cases above, but the chi-square statistics and the degrees of freedom for the significance of the individual parameters are not. For the calculation of the degrees of freedom, you can use the formula given in the Sparse Tables section of this lesson.
Here is the R code and output for the saturated model without the indicator variable:
And, here is the R code and output with the indicator variable:
The procedure on the previous slide is a general approach to this problem that can be applied with any software program.
Recall that different software packages, and even different functions within the same package, may implement different default treatments of missing values and/or sampling zeros.
For example, if missing values are present, GENMOD by default will assume a structural zero (e.g., health1.sas [10]). Under "Model Information" it will clearly indicate the number of missing observations. Compare health.sas [7] and health.lst [8] to health1.sas [10] and health1.lst [11].
What would happen for an independence model?
How about the degrees of freedom?
From health.lst:
What do you learn from the overall goodness of fit of this model? Are gender and health concerns independent?
What is the expected value \(\hat{\mu}_{21}\)? Based on health.lst,
\(\hat{\mu}_{21}=\text{exp}(4.5466-2.0671-0.1076-22.9986)\approx 0\)
and based on health1.lst it is a large value, 10.78. Here is part of the output with the predictions from health.lst:
and from health1.lst:
Here is the R output for the independence model:
Here is the output from the independence model with delta:
And, here is the output from the independence model with missing data:
By analogy with the two-dimensional situation, we can consider loglinear models applied only to the non-missing cells; that is, create an indicator variable for each missing cell.
Health                        Male                Female
                        12−15    16−17       12−15    16−17
Sex/Reproduction           4        2           9        7
Menstrual problems         -        -           4        8
How healthy am I?         42        7          19       10
None                      57       20          71       31
The table below gives likelihood-ratio and Pearson chi-square statistics for various loglinear models. Note that the degrees of freedom are reduced by two from the usual degrees of freedom for a complete table, unless the gender-by-health margin is fitted.
See health3d.sas [12] (health3.R [13]) or health13d.sas [14].
Let H=health, G=gender, and A=age.
Model             df    G^{2}    X^{2}
(HG, HA, GA)       2     2.03     2.03
(HG, HA)           3     4.86     4.98
(HA, GA)           4    13.45    13.12
(HG, GA)           5     9.43     9.62
(HA, G)            5    17.46    17.05
(HG, A)            6    15.46    15.95
(H, GA)            7    22.03    22.59
(H, G, A)          8    28.24    30.53
Based on the overall goodness of fit, the models in which \(\lambda_{ij}^{HG}=0\) are quite poor. The fits of models (a) \(\lambda_{ijk}^{HGA}=0\), (b) \(\lambda_{jk}^{GA}=\lambda_{ijk}^{HGA}=0\), and (c) \(\lambda_{ik}^{HA}=\lambda_{ijk}^{HGA}=0\) are each acceptable at the 0.05 alpha level.
By further comparison of these nested models, we notice that the difference between (a) and (b) is small, while the difference between (a) and (c) is noticeably larger: 9.43 − 2.03 = 7.40, df = 5 − 2 = 3. This indicates that model (b) is appropriate for these data, with the interpretation that, given a particular health concern (other than menstrual problems), there is no relationship between age and gender.
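The nested-model comparisons above are just differences of G² values and degrees of freedom. A small sketch in Python, with the fit statistics copied from the table above (the dictionary keys and function name are ours):

```python
# (df, G2) for three of the models fit to the incomplete 4x2x2 health table
fits = {"(HG,HA,GA)": (2, 2.03), "(HG,HA)": (3, 4.86), "(HG,GA)": (5, 9.43)}

def compare(reduced, full):
    """Delta G2 and delta df for two nested loglinear models."""
    df0, g0 = fits[reduced]
    df1, g1 = fits[full]
    return round(g0 - g1, 2), df0 - df1

print(compare("(HG,HA)", "(HG,HA,GA)"))   # (2.83, 1): (a) vs (b)
print(compare("(HG,GA)", "(HG,HA,GA)"))   # (7.4, 3): (a) vs (c)
```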
Consider a situation where a model fits a table well overall, except for one or a few cells. The methodology for incomplete tables can be used to show that the model really does fit well except for those few cells. But then, of course, you need to discuss the anomalous cells; for example, speculate about why they are not fitted well.
Example: Table 8-10 from Fienberg (1980). Mothers of children under the age of 19 were asked whether boys, girls, or both should be required to shovel snow off sidewalks. The responses were cross-classified according to the year in which the question was asked (1953, 1971) and the religion of the mother. There are only two response categories, since none of the mothers said just girls.
                    1953               1971
Religion        Boys    Both       Boys    Both
Protestant       104      42        165     142
Catholic          65      44        100     130
Jewish             4       3          5       6
Other             13       6         32      23
Let G = gender, Y = year, and R = religion. Since we treat gender as the response variable, and year and religion as explanatory variables, all models should include the \(\lambda_{ij}^{RY}\) terms in the loglinear model. Keep this in mind; we will see later that this corresponds to a logistic regression model.
The table below lists the four loglinear models all including \(\lambda_{ij}^{RY}\), and their fit statistics.
Model             df    G^{2}     p       X^{2}     p
(RY, G)            7    31.67   <.001     31.06   <.001
(RY, GY)           6    11.25    .08      11.25    .08
(RY, GR)           4    21.49   <.001     21.12   <.001
(RY, GY, GR)       3     0.36    .95        .36    .95
Inference:
The homogeneous association model fits well.
How about the other models?
Compare the fit of (RY, GY) and the homogeneous association model: ΔG^{2}[(RY, GY) vs. (RY, GY, GR)] = 11.25 − 0.36 = 10.89, df = 6 − 3 = 3, p-value = 0.01.
However, let's take a closer look at (RY, GY). Maybe the effect of religion on the response can be accounted for by a single religious category?
The Pearson residuals from the (RY, GY) loglinear model:
                    1953               1971
Religion        Boys    Both       Boys    Both
Protestant       .75   −1.05        .91    −.91
Catholic        −.84    1.18      −1.42    1.42
Jewish          −.29     .41       −.22     .22
Other            .12    −.17        .85    −.85

The 3 largest residuals correspond to Catholic mothers: the model underpredicts Both and overpredicts Boys.
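Because (RY, GY) asserts that religion and response are independent given year, its fitted counts are simply n_{religion,year} · P(response | year), so the residual table can be reproduced from the raw counts alone. A Python sketch (our own helper code, not from the lesson files):

```python
import math

# Snow-shoveling data: religion -> {year: (boys, both)}
counts = {
    "Protestant": {"1953": (104, 42), "1971": (165, 142)},
    "Catholic":   {"1953": (65, 44),  "1971": (100, 130)},
    "Jewish":     {"1953": (4, 3),    "1971": (5, 6)},
    "Other":      {"1953": (13, 6),   "1971": (32, 23)},
}
years = ("1953", "1971")

# Under (RY, GY) the fitted count is mu = n_{religion,year} * P(response | year)
p_boys = {y: sum(c[y][0] for c in counts.values()) /
             sum(c[y][0] + c[y][1] for c in counts.values())
          for y in years}

residuals = {}
for religion, by_year in counts.items():
    res = []
    for y in years:
        boys, both = by_year[y]
        n = boys + both
        for obs, p in ((boys, p_boys[y]), (both, 1 - p_boys[y])):
            mu = n * p
            res.append(round((obs - mu) / math.sqrt(mu), 2))  # Pearson residual
    residuals[religion] = res
    print(religion, res)  # order: Boys '53, Both '53, Boys '71, Both '71
```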
If we do not include Catholic mothers, would the model (RY, GY) (or the corresponding logit model) fit?
Since the fit of this model is G^{2} = 1.35 with df = 4, we can conclude that the (RY, GY) model fits well when we disregard the row with the data from Catholic mothers. These do seem to be anomalous cells.
What may be some other ways of dealing with this data? Do you think a different way of classification would help? Or a different model?
Rather than deleting Catholics, perhaps the effect of religion on the response can be accounted for by a single religious category. If so, then we can collapse the religion variable and get a more parsimonious and compact summary of the data.
To investigate this, we replace religion by a series of 4 binary variables
P = Protestant (i.e., P = 1 if Protestant, 0 otherwise).
C = Catholic (i.e., C = 1 if Catholic, 0 otherwise).
J = Jewish (i.e., J = 1 if Jewish, 0 otherwise).
O = Other (i.e., O =1 if not P, C, or J, and 0 otherwise).
Using all 4 variables (instead of just 3) introduces redundancy into the data. This allows us to treat the 4 categories of religion symmetrically.
New display of the data in the form of a 6-way, incomplete table:

Four Religion Variables                        1953            1971
Protestant  Catholic  Jewish  Other        Boys   Both     Boys   Both
    1          1        1       1            -      -        -      -
    1          1        1       0            -      -        -      -
    1          0        0       0           104     42      165    142
    0          1        1       1            -      -        -      -
    0          1        0       0            65     44      100    130
    0          0        1       0             4      3        5      6
    0          0        0       1            13      6       32     23
Since G (gender) is considered the response, and the 4 religion variables and year are all considered explanatory, all loglinear models must include the λ^{YPCJO} terms (and lower-order ones). This reduces the set of models we need to consider.
The table below gives the fit for some of the models:
Model                        df    G^{2}
Fit previously:
(YPCJO, GY)                   6    11.2
(YPCJO, GY, GPCJO)            3     0.4
New ones:
(YPCJO, GY, GO)               5     9.8
(YPCJO, GY, GJ)               5    10.9
(YPCJO, GY, GC)               5     1.4
(YPCJO, GY, GP)               5     4.8
The best fitting model (YPCJO, GY, GC) has a main effect for year (GY) and an effect of being a Catholic mother.
Thus, the interaction between religion and response is due primarily to Catholic mothers. In this example, we can collapse religion into a single dichotomous variable (Catholic, Not Catholic), and go with a simpler model.
Recall that under certain conditions, marginal associations and partial associations are the same (i.e., the partial odds ratios equal the marginal odds ratios).
For three-way tables (XYZ), the XY marginal and partial odds ratios are identical if either Z and X are conditionally independent given Y, or Z and Y are conditionally independent given X. That is, the XY marginal and partial odds ratios are identical if either the λ^{XZ} terms or the λ^{YZ} terms (and all higher-order terms containing them) are zero.
What would this mean in terms of graphical models (i.e., graphs)? How would you represent this graphically?
The collapsibility theorem for three-way tables (ref. Fienberg (1980)):
In a three-dimensional table, the interaction between two variables (as given by the appropriate λ terms) may be measured from the table of sums obtained by collapsing over the third variable if the third variable is independent (again, as given by the appropriate λ terms) of at least one of the two variables exhibiting the interaction.
The generalization to higher-dimensional tables, from Agresti (2002), pg. 360:
Suppose that variables in a model for a multiway table partition into three exclusive subsets, A, B, and C, such that B separates A and C; thus, the model does not contain parameters linking variables from A with variables from C. When one collapses the table over the variables in C, model parameters relating variables in A and model parameters relating variables in A with variables in B are unchanged.
How would you represent this graphically? Think that every path (link) between a variable in the set A and a variable in the set C involves at least one variable in the set B.
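The separation criterion can be checked mechanically: remove the vertices in B from the association graph and test whether A and C are still connected. A small Python sketch (the function name and graph encoding are ours):

```python
from collections import deque

def separates(edges, A, B, C):
    """True if every path between A and C passes through B, i.e. removing
    the vertices in B disconnects A from C in the association graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = set(A), deque(A)
    while queue:                      # BFS from A, never entering B
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w in B or w in seen:
                continue
            if w in C:
                return False          # found an A-C path avoiding B
            seen.add(w)
            queue.append(w)
    return True

# Model (XY, YZ): Y separates X from Z, so collapsing over Z leaves
# the X-Y parameters unchanged.  Adding an X-Z edge breaks separation.
print(separates([("X", "Y"), ("Y", "Z")], {"X"}, {"Y"}, {"Z"}))
print(separates([("X", "Y"), ("Y", "Z"), ("X", "Z")], {"X"}, {"Y"}, {"Z"}))
```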
For further details see: Agresti (2013), Section 9.1, or Agresti (2007), Section 7.4.
Key Concepts
Objectives

Earlier in the course we described ways to perform significance tests for independence and conditional independence, and to measure (linear) associations with ordinal categorical variables.
For example, we focused on the CMH statistic and correlation measures for testing independence and linear association in the example with heart disease and cholesterol level. We concluded that the variables are not independent and that there is strong evidence of a linear association, M^{2} = 26.15, df = 1 (see Lesson 4, "Ordinal data" section).
          Serum cholesterol (mg/100 cc)
          0–199   200–219   220–259   260+    total
CHD          12         8        31     41       92
no CHD      307       246       439    245     1237
total       319       254       470    286     1329
Can we answer the same questions (and more) via loglinear models?
Loglinear models for contingency tables, by default, treat all variables as nominal.
If there is an ordering of the categories of the variables, it is not taken into account: we could rearrange the rows and/or columns of the table and get the same fitted odds ratios as with the original ordering.
To model ordinal data with loglinear models, we can apply some of the general ideas we saw with incomplete tables and with the analysis of ordinal data from earlier in the semester. That is, we typically assign scores to the ordered categories and add a term capturing the linear trend.
Linear by Linear Association Model
This is the most common loglinear model when you have ordinal data.
Objective:
Modeling the log counts by accounting for the ordering of the categories of discrete variables.
Suppose we assign scores u_{1} ≤ u_{2} ≤ ... ≤ u_{I} to the categories of the row variable, and v_{1} ≤ v_{2} ≤ ... ≤ v_{J} to the categories of the column variable. These are numbers you choose to describe the distances between the ordered categories. Then we can model the dependence between two variables, e.g., C = CHD and S = serum cholesterol.
Model Structure:
\(\text{log}(\mu_{ij})=\lambda+\lambda_i^C+\lambda_j^S+\beta u_i v_j\)
For each row i, the log fitted values are a linear function of the columns. For each column j, the log fitted values are a linear function of the rows.
Parameter estimates and interpretation:
This model has only one more parameter than the independence model (the βu_{i}v_{j} term) and lies between the independence and saturated models in complexity. It describes the 'linear by linear association' between the two variables through the scores you have assigned. The log odds ratio for any pair of rows i, i′ and columns j, j′ is
\begin{align}
\text{log }\theta(ij,i'j') &= \text{log}(\mu_{ij})+\text{log}(\mu_{i'j'})-\text{log}(\mu_{i'j})-\text{log}(\mu_{ij'})\\
&= \beta(u_i-u_{i'})(v_j-v_{j'})\\
\end{align}
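The fitted odds ratio for any pair of rows and columns follows directly from this formula. A small Python sketch (using the \(\hat{\beta}\) value reported for the CHD example later in this lesson; the function name is ours):

```python
import math

def fitted_odds_ratio(beta, u_i, u_ip, v_j, v_jp):
    """Odds ratio implied by the linear-by-linear association model
    for rows i, i' and columns j, j' with scores u and v."""
    return math.exp(beta * (u_i - u_ip) * (v_j - v_jp))

beta_hat = -0.574  # estimate from the CHD / serum cholesterol example
print(round(fitted_odds_ratio(beta_hat, 1, 2, 1, 2), 2))  # adjacent cells: 0.56
print(round(fitted_odds_ratio(beta_hat, 1, 2, 1, 4), 2))  # extreme columns: 0.18
```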
Model Fit:
We use the G^{2} and ΔG^{2} as with any other loglinear model.
For our example, see HeartDiseaseLoglin.sas [15] and the resulting output, HeartDiseaseLoglin.lst [16]. In this SAS program we create two new numerical variables, as seen in the datalines statement.
Here is the R code for this example: HeartDiseaseLoglin.R [17].
We observe G^{2} = 4.09, df = 2, p-value = 0.13, which indicates that the linear by linear association model fits well, and significantly better than the independence model: ΔG^{2} = 27.832, df = 1, p-value < 0.001. Notice the equivalence of the values of ΔG^{2}, M^{2}, and the likelihood-ratio statistic for the "xCHD*yserum" parameter under the significance testing for the individual parameters (e.g., the 'Type 3 Analysis' output of GENMOD).
\(\hat{\beta}=-0.574\), and exp(−0.574) = 0.56 means that the estimated odds ratio for a unit change in the row score ('chd' to 'no chd') and in the column score (e.g., '0–199' to '200–219') equals 0.56.
Linear by Linear Association Model 01:27 Tuesday, April 4, 2006 6
The GENMOD Procedure
Analysis Of Parameter Estimates
Likelihood Ratio
Standard 95% Confidence Chi
Parameter DF Estimate Error Limits Square
serum 220-259   1   -0.5937   0.2303   -1.0563   -0.1513    6.65
serum 260+      0    0.0000   0.0000    0.0000    0.0000     .
xCHD*yserum     1   -0.5740   0.1161   -0.8089   -0.3527   24.45
Scale 0 1.0000 0.0000 1.0000 1.0000
Look at the model fitted values ('Pred' from the SAS "Observation Statistics" table, or from the "fitted" function in R):
\(\hat{\theta}=(8.16\times 242.69)/(11.31\times 310.84)=0.56\)
The cells used in this calculation appear in the "Observation Statistics" output below:
Observation Statistics

Obs  count  xCHD  yserum  CHD    serum     Pred      Xbeta   Std     HessWgt   Lower     Upper     Resraw    Reschi    Resdev    StResdev  StReschi  Reslik
1      12    1     1      chd    0-199      8.1581   2.0990  0.2624    8.1581    4.8774   13.6454    3.8419    1.3451    1.2561    1.8977    2.0322    1.9744
2       8    1     2      chd    200-219   11.3082   2.4255  0.1694   11.3082    8.1128   15.7623   -3.3082   -0.9838   -1.0388   -1.2640   -1.1971   -1.2427
3      31    1     3      chd    220-259   35.9094   3.5810  0.1111   35.9094   28.8823   44.6461   -4.9094   -0.8193   -0.8391   -1.1246   -1.0980   -1.1129
4      41    1     4      chd    260+      36.6244   3.6007  0.1469   36.6244   27.4636   48.8409    4.3756    0.7230    0.7093    1.5478    1.5777    1.5715
5     307    2     1      nochd  0-199    310.8419   5.7393  0.0564  310.8419  278.3162  347.1688   -3.8419   -0.2179   -0.2184   -2.0365   -2.0323   -2.0323
6     246    2     2      nochd  200-219  242.6918   5.4918  0.0632  242.6918  214.4285  274.6805    3.3082    0.2124    0.2119    1.1944    1.1971    1.1970
7     439    2     3      nochd  220-259  434.0907   6.0733  0.0469  434.0907  395.9839  475.8645    4.9094    0.2356    0.2352    1.0960    1.0980    1.0979
8     245    2     4      nochd  260+     249.3756   5.5190  0.0623  249.3756  220.6936  281.7853   -4.3756   -0.2771   -0.2779   -1.5824   -1.5777   -1.5779
The estimated odds ratio comparing the lowest ('0-199') and highest ('260+') cholesterol levels under this model is

\(\hat{\theta}=\text{exp}(-0.574(2-1)(4-1))=0.18=\dfrac{8.16\times 249.37}{36.62\times 310.84}\)

Equivalently, since 1/0.18 ≈ 5.5, a person with cholesterol level '260+' is estimated to have about 5.5 times the odds of coronary heart disease compared with a person in the '0-199' category.
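As a quick numerical check, both odds ratios above can be recomputed from the fitted values. This is an illustrative Python sketch (not part of the original SAS/R analysis), taking the rounded fitted counts and the interaction estimate β = −0.574 from the output above:

```python
import math

# Fitted counts ('Pred') from the GENMOD output, rounded to two decimals
# (rows: chd / nochd; serum levels coded 1 = 0-199, 2 = 200-219,
#  3 = 220-259, 4 = 260+)
pred = {
    ("chd", 1): 8.16,     ("chd", 2): 11.31,
    ("chd", 3): 35.91,    ("chd", 4): 36.62,
    ("nochd", 1): 310.84, ("nochd", 2): 242.69,
    ("nochd", 3): 434.09, ("nochd", 4): 249.38,
}
beta = -0.574  # xCHD*yserum estimate under this score coding

# Local odds ratio for adjacent rows and columns equals exp(beta)
theta_local = (pred[("chd", 1)] * pred[("nochd", 2)]) \
            / (pred[("chd", 2)] * pred[("nochd", 1)])
print(round(theta_local, 2), round(math.exp(beta), 2))    # 0.56 0.56

# Odds ratio comparing serum levels 1 and 4 equals exp(beta * (2-1) * (4-1))
theta_14 = (pred[("chd", 1)] * pred[("nochd", 4)]) \
         / (pred[("chd", 4)] * pred[("nochd", 1)])
print(round(theta_14, 2), round(math.exp(3 * beta), 2))   # 0.18 0.18
```

The reciprocal 1/0.18 ≈ 5.6 recovers the roughly 5.5-fold increase in odds quoted above.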
There are many options for assigning the score values, and these can be of equal or unequal spacing.
The most common choice of scores is consecutive integers; that is, u_{1} = 1, u_{2} = 2, ..., u_{I} = I and v_{1} = 1, v_{2} = 2, ..., v_{J} = J (which is what we used in the above example).
The model with such scores is a special case of the linear by linear association model and is known as the Uniform Association Model. It is called the uniform association model because the odds ratios for any two adjacent rows and any two adjacent columns are the same and equal to
\(\theta=\text{exp}(\beta(u_i-u_{i-1})(v_j-v_{j-1}))=\text{exp}(\beta)\)
In other words, the Local Odds Ratio equals exp(β) and is the same for adjacent rows and columns.
Also, sets of scores with the same spacing between them will lead to the same goodness-of-fit statistics, fitted counts, odds ratios, and \(\hat{\beta}\).
For example, v_{1} = 1, v_{2 } = 2, v_{3} = 3, v_{4} = 4 and v_{1} = 8, v_{2} = 9, v_{3} = 10, v_{4} = 11 will yield the same results.
However, please note: two sets of scores with the same relative spacing will lead to the same goodness-of-fit statistics, fitted counts, and odds ratios, BUT different estimates of β.
For example, v_{1} = 1, v_{2} = 2, v_{3} = 4, v_{4} = 8 and v_{1} = 2, v_{2} = 4, v_{3} = 8, v_{4} = 16 will yield the same fit, but the second set gives an estimate of β half as large.
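To see why rescaled scores change \(\hat{\beta}\) but not the fit: each local odds ratio is exp(β·Δv) (with unit-spaced row scores), so doubling every v halves β while leaving every product β·Δv, and hence every odds ratio, unchanged. A small Python sketch (the value −0.574 is borrowed from the earlier example purely for illustration):

```python
import math

beta = -0.574        # estimate under the first set of scores (illustrative)
v1 = [1, 2, 4, 8]    # one set of column scores
v2 = [2, 4, 8, 16]   # doubled scores: same relative spacing
beta2 = beta / 2     # the estimate rescales by the inverse factor

# Local odds ratios exp(beta * (v_{j+1} - v_j)) are identical under the
# two score sets (unit-spaced row scores held fixed)
or1 = [math.exp(beta * (b - a)) for a, b in zip(v1, v1[1:])]
or2 = [math.exp(beta2 * (b - a)) for a, b in zip(v2, v2[1:])]
print([round(x, 3) for x in or1])
print([round(x, 3) for x in or2])  # same values
```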
The choice of scores may depend heavily on your data and the context of your problem. There are other ways of using and modeling ordinality, e.g., cumulative logit models (ref. Agresti (2002), Sec. 7.2 and 7.3, Agresti (2007), Sec. 6.2, and Agresti (1996), Sec. 8.2 and 8.3), which have already been discussed.
For higher dimensions we already know how to test for associations and conditional independence with ordinal data, and with combinations of ordinal and nominal data, via the CMH statistic (recall Lesson 4).
The modeling approach described today generalizes to higher-dimensional tables as well. We can always create new variables representing the scores.
Association models are generalizations of the linear-by-linear association model for multiway tables.
We can also combine ordinal and nominal variables, assigning scores only to the ordinal variables, or even estimate scores from the data. Some of these models are known as the row effects, column effects, and row-and-column effects models. These are more advanced topics; for more information on some of these models see Agresti (2013), Sec. 9.5-9.6 and Sec. 7.2-7.3, and Agresti (2007), Sec. 6.2-6.3.
Previously we discussed the concepts of matched data and square tables, and described ways to perform significance tests for models of marginal homogeneity, symmetry, and agreement.
For example, we focused on McNemar's test and Cohen's kappa statistic. For the movie ratings example, we concluded that the model of independence is not good (as expected) and that there is only moderate agreement between Siskel and Ebert, κ = 0.39, sd = 0.0598 (see Lesson 4, "Dependent samples" section).
(rows: Siskel, columns: Ebert)

          con   mixed   pro   total
con        24       8    13      45
mixed       8      13    11      32
pro        10       9    64      83
total      42      30    88     160

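As a quick check, the κ = 0.39 quoted above can be recomputed directly from this table; a minimal Python sketch:

```python
# Recompute Cohen's kappa for the Siskel-by-Ebert table above (pure Python)
table = [[24, 8, 13],
         [8, 13, 11],
         [10, 9, 64]]
n = sum(sum(row) for row in table)                   # 160 ratings in total
row_tot = [sum(row) for row in table]                # Siskel margins
col_tot = [sum(col) for col in zip(*table)]          # Ebert margins

p_observed = sum(table[i][i] for i in range(3)) / n  # agreement on diagonal
p_expected = sum(row_tot[i] * col_tot[i] for i in range(3)) / n ** 2
kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 2))  # 0.39, matching the value quoted above
```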
Can we answer the same questions (and more) via loglinear models?
We can always fit the loglinear models we are already familiar with, for example, the independence model.
But given the nature of matched or repeated measures data, we expect dependence. Thus there are additional models, special to these kinds of data, such as quasi-independence, symmetry, quasi-symmetry, and marginal homogeneity.
A general idea: fitting these models typically requires creating additional variables, either indicators or other numerical variables, to be included in models we are already familiar with, such as the independence model.
Quasi-independence: a generalization of the independence model.
Objective:
Fit the loglinear model of independence only to the off-diagonal cells.
Assumptions:
The independence model holds for the off-diagonal cells.
Odds ratios involving only off-diagonal cells equal 1:
\(\theta(ij,i'j')=\dfrac{\mu_{ij}\mu_{i'j'}}{\mu_{ij'}\mu_{i'j}}=1 \text{ whenever } (i,j), (i',j'), (i,j'), (i',j) \text{ are all off-diagonal cells}\)
For our example, let S = classification by Siskel, and E = classification by Ebert.
Model structure:
\begin{align}
\text{log}(\mu_{ij}) &= \lambda+\lambda_i^S+\lambda_j^E & \text{ for }i\neq j\\
\mu_{ij} &= n_{ij} & \text{ for }i=j\\
\end{align}
To express this as a single equation, specify a numerical indicator variable for the diagonal cells:
\begin{align}
I(i=j) &= 1 & \text{ for }i = j\\
&=0 & \text{elsewhere}\\
\end{align}
\(\text{log}(\mu_{ij}) = \lambda+\lambda_i^S+\lambda_j^E+\delta_iI(i=j)\)
Model fit:
Use G^{2}, X^{2} as before. df = (usual df) − (# of cells fitted perfectly) = (I−1)(I−1) − I
For our example, G^{2} = 0.0061, df = 1, p-value = 0.938.
Thus, the quasi-independence model fits well; i.e., given that the two critics' ratings differ, Ebert's rating is independent of Siskel's (and vice versa).
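If you want to verify this fit without GENMOD or glm, the quasi-independence MLE can be obtained by iterative proportional fitting with the diagonal cells held fixed (treating the diagonal as structural zeros for the off-diagonal fit, as in Bishop, Fienberg, and Holland). This is an illustrative pure-Python sketch, not the course's SAS/R code:

```python
import math

# Observed Siskel-by-Ebert counts (rows: Siskel; columns: Ebert)
n = [[24, 8, 13],
     [8, 13, 11],
     [10, 9, 64]]
I = 3

# Quasi-independence: diagonal cells are fitted perfectly, and the
# independence structure is fitted to the off-diagonal cells by
# iterative proportional fitting with the diagonal held at zero
mu = [[0.0 if i == j else 1.0 for j in range(I)] for i in range(I)]
for _ in range(500):
    for i in range(I):  # match off-diagonal row totals
        target = sum(n[i][j] for j in range(I) if j != i)
        s = sum(mu[i][j] for j in range(I) if j != i)
        for j in range(I):
            if j != i:
                mu[i][j] *= target / s
    for j in range(I):  # match off-diagonal column totals
        target = sum(n[i][j] for i in range(I) if i != j)
        s = sum(mu[i][j] for i in range(I) if i != j)
        for i in range(I):
            if i != j:
                mu[i][j] *= target / s

# Only off-diagonal cells contribute to G^2 (the diagonal fits perfectly)
G2 = 2 * sum(n[i][j] * math.log(n[i][j] / mu[i][j])
             for i in range(I) for j in range(I) if i != j)
df = (I - 1) ** 2 - I  # (I-1)(I-1) - I = 1
print(round(G2, 4), df)  # G2 should be close to the 0.0061 reported above
```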
Parameter estimation and interpretation:
λ′s are interpreted as before.
Odds ratios involving only off-diagonal cells are 1 by the model assumption.
For the quasi-independence model, the δ parameters are linked to the odds summarizing agreement. The odds summarizing agreement for categories a and b equal
\(\tau_{ab}=\dfrac{\mu_{aa}\mu_{bb}}{\mu_{ab}\mu_{ba}}=\text{exp}(\delta_a+\delta_b)\)
For example, the estimated odds that Siskel's rating is 'con' rather than 'mixed' are exp(0.96 + 0.62) = 4.86 times as high when Ebert's rating is 'con' than when it is 'mixed'.
In general you need to create a separate indicator (dummy) variable for each diagonal cell. The indicator is treated as a numerical variable in the model.
Let's see how we can do this in SAS and R.
Take a look at the SAS code (movies.sas [22], movies.lst [23]) for this example:
Part of the output:
You can find the R code for this example in movies.R [21].
### Quasi-Independence Model
### icon, imixed, ipro are the indicator variables for the diagonal cells
model <- glm(count ~ siskel + ebert + icon + imixed + ipro,
             family = poisson(link = "log"))
summary(model)
And, here is a part of the output that we are interested in:
Another way to fit this model is to create a variable that takes on a unique value for each of the diagonal cells and a common value for all of the off diagonal cells. For example,
\begin{align}
qi &=1 & i=j=1\\
&=2 & i=j=2\\
&=3 & i=j=3\\
&=4 & i\neq j\\
\end{align}
This new variable is treated as a nominal variable in fitting the model. Here is what this might look like if you were to do this in SAS with PROC GENMOD:
Do not forget that you first need to declare and create this new variable, labeled "qi", in the SAS code in order for this to run. Can you modify movies.sas [22] and/or movies.R [21] to fit the quasi-independence model in this way?
A symmetry model is important because testing symmetry is often the preliminary analysis for square tables.
It is very restrictive because it has implications for both the association between the variables and the margins of the table; the symmetry model rarely fits data well.
Example of a perfectly symmetric table:
         1     2     3   Total
1       10     2     5     17
2        2    10     8     20
3        5     8    10     23
Total   17    20    23     60
Objective:
Fit the loglinear model of symmetry to test if a square table exhibits symmetry.
Assumptions:
Mirror-image cells across the diagonal have equal expected counts:
μ_{ij} = μ_{ji} for i ≠ j
μ_{ii} = n_{ii} for i = j
and the row and column margins are equal, i.e., μ_{i+} = μ_{+i}.
For our example, let S = classification by Siskel, and E = classification by Ebert.
Model structure:
log(μ_{ij}) = λ + λ_{i} + λ_{j} + λ_{ij}
where
λ_{ij} = λ_{ji}, and the ith row main effect equals the ith column main effect (λ_{i} = λ_{j} when i = j).
The second constraint indicates that the main/marginal effect terms are the same for the rows and columns; thus there are no superscripts.
Model fit:
Use G^{2}, X^{2} as before. df = (# cells) − (# nonredundant parameters) = (# off-diagonal cells) − (# unique off-diagonal parameters) = I(I−1) − I(I−1)/2 = I(I−1)/2
G^{2} = 0.5928, df = 3, p-value = 0.90.
The symmetry model seems to fit these data well; this also implies strong agreement. Recall, however, that strong agreement alone is not enough to indicate symmetry!
Parameter estimation and interpretation:
The MLE estimates are \(\hat{\mu}_{ij}=\hat{\mu}_{ji}=\dfrac{n_{ij}+n_{ji}}{2}\)
λ′s are interpreted as before.
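Because the fitted values have this simple closed form, the symmetry fit is easy to verify by hand; a short Python sketch using the movie table:

```python
import math

# Observed Siskel-by-Ebert counts (rows: Siskel; columns: Ebert)
n = [[24, 8, 13],
     [8, 13, 11],
     [10, 9, 64]]
I = 3

# Under symmetry the MLEs are the averages of mirror-image cells
mu = [[(n[i][j] + n[j][i]) / 2 for j in range(I)] for i in range(I)]

# Diagonal cells are fitted perfectly, so only off-diagonal cells contribute
G2 = 2 * sum(n[i][j] * math.log(n[i][j] / mu[i][j])
             for i in range(I) for j in range(I) if i != j)
df = I * (I - 1) // 2  # = 3 for a 3x3 table
print(round(G2, 4), df)  # 0.5928 3
```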
There are many different ways to fit these models. I will only discuss one here that comes from Agresti.
You need to create a variable that takes on a unique value for each diagonal cell and a unique value of each pair of cells. For example,
\begin{align}
symm &=1 & i=j=1\\
&=2 & i=j=2\\
&=3 & i=j=3\\
&=4 & (i,j)=(1,2) \text{ or } (2,1)\\
&=5 & (i,j)=(1,3) \text{ or } (3,1)\\
&=6 & (i,j)=(2,3) \text{ or } (3,2)\\
\end{align}
This new variable is treated as a nominal variable in fitting the model.
See the first part of the movies.sas [22] code for the creation of the variable "symm":
Part of the output:
See the second part of the movies.R [21] code for the creation of the variables "symm1" through "symm6". This is where we actually fit the symmetry model using these variables.
And, here is the output that we are interested in:
The quasi-symmetry model is less restrictive than the symmetry model, and more appropriate than quasi-independence for measuring agreement with ordinal data.
It does not require the row and column margins to be the same (i.e., it does not impose marginal homogeneity).
\(\text{log}(\mu_{ij}) = \lambda+\lambda_i^S+\lambda_j^E+\lambda_{ij}\)
where λ_{ij} = λ_{ji}
df = (# of cells) − (# nonredundant parameters) = I^{2} − [1 + (I−1) + (I−1) + I(I−1)/2] = (I−2)(I−1)/2
To fit this model, modify the symmetry fit: add the row and column variables to the model along with the symm (symmetry) variable.
For our example, G^{2} = 0.0061, df = 1, p-value = 0.938.
In SAS we have...
In R we have...
We have seen this question before when we discussed two-way tables:
Objective:
Are the row and column distributions (of a square table) the same?
H_{0}: μ_{i+} = μ_{+i}
There is no direct way to use loglinear models to fit/test this model.
To test this indirectly using loglinear models (i.e., via a conditional likelihood ratio test), we need to compare nested models.
Symmetry has two components: marginal homogeneity and quasi-symmetry.
Symmetry is a special case of quasi-symmetry. If the quasi-symmetry model holds, then a way to test for marginal homogeneity is:
G^{2}(marginal homogeneity) = G^{2}(symmetry) − G^{2}(quasi-symmetry), with df = I − 1.
For our example, G^{2} = 0.5928 − 0.0061 = 0.587 with df = 2, p-value = 0.746, so the marginal homogeneity model fits moderately well. An alternative is to use generalized least squares instead of maximum likelihood estimation.
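The arithmetic of this conditional test is easy to verify (a Python sketch; for df = 2 the chi-square p-value has the closed form exp(−x/2)):

```python
import math

# Conditional likelihood ratio test of marginal homogeneity,
# assuming the quasi-symmetry model holds
G2_sym, df_sym = 0.5928, 3     # symmetry fit from above
G2_qsym, df_qsym = 0.0061, 1   # quasi-symmetry fit from above

G2_mh = G2_sym - G2_qsym  # 0.587
df_mh = df_sym - df_qsym  # 2 (= I - 1 for I = 3)

# For df = 2 the chi-square survival function is exactly exp(-x/2)
p_value = math.exp(-G2_mh / 2)
print(round(G2_mh, 3), df_mh, round(p_value, 3))  # 0.587 2 0.746
```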
Here is how all the models discussed in this section relate to each other.
The most general (complex) model is quasi-symmetry.
Symmetry is a special case of quasi-symmetry.
Quasi-independence is a special case of quasi-symmetry.
Symmetry is not a special case of quasi-independence.
Quasi-independence is not a special case of symmetry.
Summary of fits for our example (movies.sas [22] and movies.lst [23], or movies.R [21]):
Model                  df     G^2      p-value
Independence            4    43.2325   <0.0001
Quasi-independence      1     0.0061    0.938
Symmetry                3     0.5928    0.900
Marginal homogeneity    2     0.587     0.746
Quasi-symmetry          1     0.0061    0.938
So do Siskel and Ebert really disagree, or only moderately agree?
These models indicate strong symmetric association and agreement.
Knowing a set of basic models is useful, as you can combine them to address more specific situations. We can combine models for matched and ordinal data, as well as sampling and structural zeros.
For example, for a quasi-independence model for incomplete tables, see the SAS example link
https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect44.htm [1]
and/or monkey.sas [24] example below or monkey.R [25].
In SAS we have ...
In R the code for this looks like this:
There are more special-case models presented in Agresti (2013), ch. 11, and Agresti (2007), ch. 8.
Extensions to higher dimensions are possible by combining and/or collapsing categories and variables to get symmetric tables (if and when appropriate).
For more advanced topics and models on how to deal with repeated measures data in higher dimensions via loglinear, logit, and GLM models, see Agresti (2013), ch. 11 and ch. 12, Agresti (2007), ch. 9 and ch. 10, and Bishop, Fienberg, and Holland (1975).
Links:
[1] https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect44.htm
[2] https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect33.htm
[3] https://www.dynamicdrive.com
[4] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/ZerosEx.sas
[5] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/ZerosEx.R
[6] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/ZerosEx.lst
[7] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health.sas
[8] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health.lst
[9] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health.R
[10] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health1.sas
[11] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health1.lst
[12] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health3d.sas
[13] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health3.R
[14] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/health13d.sas
[15] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/HeartDiseaseLoglin.sas
[16] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/HeartDiseaseLoglin.lst
[17] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/HeartDiseaseLoglin.R
[18] https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect46.htm
[19] https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect48.htm
[20] https://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/catmod_sect44.htm
[21] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/movies.R
[22] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/movies.sas
[23] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/movies.lst
[24] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/monkey.sas
[25] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson05/monkey.R