
Generalization of an independence model.

Objective:

Fit the loglinear model of independence only to off-diagonal cells.

Assumptions:

Independence model holds for the off diagonal cells.

Odds ratios for off-diagonal cells equal 1.

\(\theta(ij,i'j')=\dfrac{\mu_{ij}\mu_{i'j'}}{\mu_{ij'}\mu_{i'j}}=1 \text{ whenever } i\neq j,\ i'\neq j',\ i\neq j' \text{ and } i'\neq j\) (all four cells off the diagonal)

For our example, let S = classification by Siskel, and E = classification by Ebert.

Model structure:

\begin{align}
\text{log}(\mu_{ij}) &= \lambda+\lambda_i^S+\lambda_j^E & \text{ for }i\neq j\\
\hat{\mu}_{ij} &= n_{ij} & \text{ for }i=j\\
\end{align}

To write the model as a single equation, define a numerical indicator variable for the diagonal cells:

\begin{align}
I(i=j) &= 1 & \text{ for }i=j\\
&=0 & \text{elsewhere}\\
\end{align}
\(\text{log}(\mu_{ij}) = \lambda+\lambda_i^S+\lambda_j^E+\delta_iI(i=j)\)
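Why the diagonal cells end up fitted perfectly: under Poisson sampling the kernel of the log-likelihood is \(\ell=\sum_{i,j}\left(n_{ij}\,\text{log}\,\mu_{ij}-\mu_{ij}\right)\), and \(\delta_a\) enters only the linear predictor of cell \((a,a)\). Setting its score to zero gives

\begin{align}
\frac{\partial \ell}{\partial \delta_a} = \sum_{i,j} I(i=j=a)\,(n_{ij}-\mu_{ij}) = n_{aa}-\mu_{aa} = 0
\quad\Longrightarrow\quad \hat{\mu}_{aa}=n_{aa}
\end{align}

so each diagonal cell uses up one parameter and contributes nothing to \(G^2\) or \(X^2\).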

Model fit:

Use \(G^2\) and \(X^2\) as before, with df = (usual df) - (number of cells fitted perfectly) = \((I-1)^2 - I\).

For our example, \(G^2 = 0.0061\), df = 1, p-value = 0.938.
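As a quick check, the degrees of freedom for \(I = 3\) and the p-value for the reported \(G^2\) can be reproduced in R. The variable names here are illustrative only.

```r
I   <- 3                      # number of rating categories (con, mixed, pro)
dof <- (I - 1)^2 - I          # independence df minus the I diagonal cells
G2  <- 0.0061                 # deviance reported for the quasi-independence fit
p   <- pchisq(G2, df = dof, lower.tail = FALSE)  # upper-tail chi-square p-value
round(p, 3)                   # 0.938
```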

Thus, the quasi-independence model fits well: given that Siskel and Ebert disagree (that is, restricting attention to the off-diagonal cells), Ebert's rating is independent of Siskel's, and vice versa.

Parameter estimation and interpretation:

The λ's are interpreted as before.

Odds ratios involving only off-diagonal cells are 1 by the model assumption.

For the quasi-independence model, the δ parameters are linked to odds ratios summarizing agreement between categories. The odds ratio summarizing agreement for categories a and b is

\(\tau_{ab}=\dfrac{\mu_{aa}\mu_{bb}}{\mu_{ab}\mu_{ba}}=\text{exp}(\delta_a+\delta_b)\)
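The identity follows by substituting the model into each of the four cells (with \(a\neq b\), so \(\mu_{ab}\) and \(\mu_{ba}\) are off-diagonal); every λ term cancels:

\begin{align}
\text{log}(\tau_{ab}) &= \text{log}(\mu_{aa})+\text{log}(\mu_{bb})-\text{log}(\mu_{ab})-\text{log}(\mu_{ba})\\
&= (\lambda+\lambda_a^S+\lambda_a^E+\delta_a)+(\lambda+\lambda_b^S+\lambda_b^E+\delta_b)\\
&\quad-(\lambda+\lambda_a^S+\lambda_b^E)-(\lambda+\lambda_b^S+\lambda_a^E)\\
&= \delta_a+\delta_b
\end{align}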

For example, with \(\hat{\delta}_{con}=0.96\) and \(\hat{\delta}_{mixed}=0.62\), the estimated odds that Siskel's rating is 'con' rather than 'mixed' are exp(0.96 + 0.62) = exp(1.58) ≈ 4.85 times as high when Ebert's rating is 'con' as when it is 'mixed'.
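The arithmetic can be checked directly in R from the two rounded δ estimates. The vector name `delta` is illustrative.

```r
# Rounded delta estimates for the 'con' and 'mixed' diagonal cells
delta <- c(con = 0.96, mixed = 0.62)

# Agreement odds ratio tau_ab = exp(delta_a + delta_b)
tau <- exp(delta[["con"]] + delta[["mixed"]])
round(tau, 2)   # 4.85
```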

In general you need to create a separate indicator (dummy) variable for each diagonal cell. Each indicator is treated as a numerical variable in the model.
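A minimal R sketch of this construction follows. It assumes the Siskel-Ebert counts as they appear in Agresti's *An Introduction to Categorical Data Analysis* (rows = Siskel, columns = Ebert); verify them against movies.R before relying on them. The indicator names icon, imixed and ipro match the R code in this lesson; the data-frame name `movies` is hypothetical.

```r
# Siskel-Ebert ratings (assumed counts; rows = Siskel, columns = Ebert)
lev <- c("con", "mixed", "pro")
movies <- data.frame(
  siskel = factor(rep(lev, each = 3), levels = lev),
  ebert  = factor(rep(lev, times = 3), levels = lev),
  count  = c(24,  8, 13,
              8, 13, 11,
             10,  9, 64)
)

# One numeric 0/1 indicator per diagonal cell
movies$icon   <- as.numeric(movies$siskel == "con"   & movies$ebert == "con")
movies$imixed <- as.numeric(movies$siskel == "mixed" & movies$ebert == "mixed")
movies$ipro   <- as.numeric(movies$siskel == "pro"   & movies$ebert == "pro")

# Quasi-independence: main effects plus one delta per diagonal cell
fit <- glm(count ~ siskel + ebert + icon + imixed + ipro,
           family = poisson(link = "log"), data = movies)

deviance(fit)                            # G^2
df.residual(fit)                         # (I - 1)^2 - I for I = 3
coef(fit)[c("icon", "imixed", "ipro")]   # delta estimates

# Diagonal cells (rows 1, 5, 9) are reproduced exactly by the fit
cbind(observed = movies$count, fitted = round(fitted(fit), 3))
```

Because each indicator is numeric and touches only one cell, its coefficient is exactly the δ of that diagonal cell, regardless of which baseline R uses for the siskel and ebert factors.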

Let's see how to do this in SAS and R.

Take a look at the SAS code (movies.sas, movies.lst) for this example:

SAS program

Part of the output:

SAS output

You can find the R code for this example in movies.R.

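### Quasi-Independence Model
# count, siskel, ebert and the 0/1 indicators icon, imixed, ipro must already be defined (as in movies.R)
model <- glm(count ~ siskel + ebert + icon + imixed + ipro, family = poisson(link = log))
summary(model)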

Here is the part of the output that we are interested in:

r output

Another way to fit this model is to create a variable that takes a unique value for each of the diagonal cells and a common value for all of the off-diagonal cells. For example,

\begin{align}
qi &=1 & i=j=1\\
&=2 & i=j=2\\
&=3 & i=j=3\\
&=4 & i\neq j\\
\end{align}

This new variable is treated as a nominal variable in fitting the model. Here is what this might look like if you were to do this in SAS with PROC GENMOD:

SAS program

Do not forget that you first need to declare and create this new variable, named "qi", in the SAS code in order for it to run. Can you modify movies.sas and/or movies.R to fit the quasi-independence model in this way?
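For the R side of that question, here is one possible sketch, not an official solution. It again assumes the Siskel-Ebert counts from Agresti's textbook (check against movies.R), and the names `movies` and `fit_qi` are hypothetical. Ordering the levels of qi so that the off-diagonal level 4 is the baseline makes the qi coefficients equal to the δ's from the indicator version.

```r
lev <- c("con", "mixed", "pro")
movies <- data.frame(
  siskel = factor(rep(lev, each = 3), levels = lev),
  ebert  = factor(rep(lev, times = 3), levels = lev),
  count  = c(24, 8, 13, 8, 13, 11, 10, 9, 64)   # assumed counts, row-major
)

# qi = 1, 2, 3 on the diagonal and 4 for every off-diagonal cell,
# stored as a factor with the off-diagonal level "4" as the baseline
movies$qi <- factor(ifelse(movies$siskel == movies$ebert,
                           as.integer(movies$siskel), 4),
                    levels = c("4", "1", "2", "3"))

fit_qi <- glm(count ~ siskel + ebert + qi,
              family = poisson(link = "log"), data = movies)

deviance(fit_qi)                          # same G^2 as the indicator version
coef(fit_qi)[c("qi1", "qi2", "qi3")]      # delta_con, delta_mixed, delta_pro
```

The nominal qi spans the same columns as the intercept plus the three indicators, so both parameterizations give identical fitted values; only the coefficient labels change.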