10.1.3  Example: General Social Survey
Cross-classification of respondents according to their choice for president in the 1992 presidential election (Bush, Clinton, Perot) and their political view on a 7-point scale (extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative).
Let's collapse the 7-point scale into 3 levels and consider the resulting 3 × 3 table:
               Bush   Clinton   Perot   Total
Liberal          70       324      56     450
Moderate        195       332     101     628
Conservative    382       199     117     698
Total           647       855     274    1776

Are political view and choice independent? You already know how to test this with the chi-squared test of independence. For example, see vote.sas (output: vote.lst) or vote.R, and compare the results of the chi-squared tests with the loglinear model output.
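The test can also be reproduced from the table alone. The sketch below is not the course's vote.R; the object names (`vote`, `X2`, `G2`) are chosen here for illustration. It uses base R's `chisq.test()` for Pearson's statistic and computes the likelihood-ratio statistic by hand from the expected counts.

```r
# 3 x 3 table of counts: rows = political view, columns = choice for president
vote <- matrix(c( 70, 324,  56,
                 195, 332, 101,
                 382, 199, 117),
               nrow = 3, byrow = TRUE,
               dimnames = list(pview   = c("liberal", "moderate", "conservative"),
                               pchoice = c("bush", "clinton", "perot")))

# Pearson chi-squared test of independence
test <- chisq.test(vote)
X2 <- unname(test$statistic)
df <- unname(test$parameter)

# Likelihood-ratio statistic: G^2 = 2 * sum(observed * log(observed / expected))
expected <- test$expected
G2 <- 2 * sum(vote * log(vote / expected))

# Both statistics are referred to a chi-squared distribution
# with (3 - 1) * (3 - 1) = 4 degrees of freedom
p_X2 <- pchisq(X2, df, lower.tail = FALSE)
p_G2 <- pchisq(G2, df, lower.tail = FALSE)

print(round(c(X2 = X2, G2 = G2, df = df), 3))
```

Both statistics should be far out in the tail of the chi-squared distribution with 4 degrees of freedom, which is the pattern to look for in the SAS output as well.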
Here is part of the output from SAS PROC FREQ ...
and a part from the output of the PROC GENMOD procedure.
Question: Can you identify the \(X^2\) and \(G^2\) statistics in these two outputs? What is your conclusion about independence? Does the model fit well? Can you identify the loglinear independence model and the relevant part of the output? Note that if "Value/df" is much greater than 1, you have sufficient evidence to reject the model.
______________________________________________________
Here is what the loglinear model would be:
\(\text{log}(\mu_{ij})=\lambda+\lambda_i^{pview}+\lambda_j^{choice}\)
Recall that we assume the counts come from a Poisson distribution and that all odds ratios are equal to 1; that is, the interaction term is explicitly set to 0. Our null hypothesis is that the loglinear model of independence specified above fits, versus the alternative that the saturated model fits.
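A minimal sketch of this model fit in R follows. It is a reconstruction under assumptions, not the course's vote.R: the data frame `vote.df` is a hypothetical name, while `vote.ind` and the factor name `pchoice` follow the names that appear in the R output quoted on this page.

```r
# Counts in long format, one row per cell of the 3 x 3 table
vote.df <- data.frame(
  pview   = factor(rep(c("liberal", "moderate", "conservative"), each = 3)),
  pchoice = factor(rep(c("bush", "clinton", "perot"), times = 3)),
  count   = c(70, 324, 56, 195, 332, 101, 382, 199, 117)
)

# Independence model: main effects only, no pview:pchoice interaction
vote.ind <- glm(count ~ pview + pchoice, family = poisson(), data = vote.df)

# Saturated model: adds the interaction and reproduces the counts exactly
vote.sat <- glm(count ~ pview * pchoice, family = poisson(), data = vote.df)

# The residual deviance of the independence model is G^2 for
# H0: independence model fits vs. HA: saturated model
G2 <- deviance(vote.ind)
df <- df.residual(vote.ind)
p_value <- pchisq(G2, df, lower.tail = FALSE)

# Pearson X^2 from the same fit
X2 <- sum(residuals(vote.ind, type = "pearson")^2)

print(c(G2 = G2, X2 = X2, df = df, "G2/df" = G2 / df))
```

The last entry printed is the analogue of GENMOD's "Value/df" for the deviance.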
What is the LR test statistic? Identify it in the PROC FREQ, PROC GENMOD and PROC CATMOD parts of the output. Notice that in GENMOD, "conservative" and "perot" are the reference levels. Thus the equation for each "Choice" is:
Bush: \(\text{log}(\mu_{i1})=\lambda+\lambda_i^{pview}+\lambda_1^{choice}\)
Clinton: \(\text{log}(\mu_{i2})=\lambda+\lambda_i^{pview}+\lambda_2^{choice}\)
Perot: \(\text{log}(\mu_{i3})=\lambda+\lambda_i^{pview}-\lambda_1^{choice}-\lambda_2^{choice}\) in CATMOD. Notice that in GENMOD with dummy coding this would be \(\text{log}(\mu_{i3})=\lambda+\lambda_i^{pview}+\lambda_3^{choice}\).
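If we want, for example, the expected count (and from it the probability) of being a liberal and voting for Bush, using the GENMOD estimates:
LibBush: \(\text{log}(\mu_{11})=\lambda+\lambda_1^{pview}+\lambda_1^{choice}=4.68-0.439+0.859=5.10\), so \(\hat{\mu}_{11}=\text{exp}(5.10)\approx 164\).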
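The arithmetic above can be checked with a short R sketch. It assumes the same long-format data as the table, with the hypothetical name `vote.df`, and uses `relevel()` to mimic GENMOD's reference levels ("conservative" and "perot"); it is an illustration, not the course's own code.

```r
vote.df <- data.frame(
  pview   = factor(rep(c("liberal", "moderate", "conservative"), each = 3)),
  pchoice = factor(rep(c("bush", "clinton", "perot"), times = 3)),
  count   = c(70, 324, 56, 195, 332, 101, 382, 199, 117)
)

# Mimic the GENMOD reference levels: "conservative" and "perot"
vote.df$pview   <- relevel(vote.df$pview,   ref = "conservative")
vote.df$pchoice <- relevel(vote.df$pchoice, ref = "perot")

vote.ind <- glm(count ~ pview + pchoice, family = poisson(), data = vote.df)
b <- coef(vote.ind)

# log(mu_11) = lambda + lambda_1^pview + lambda_1^choice
log_mu11 <- unname(b["(Intercept)"] + b["pviewliberal"] + b["pchoicebush"])
mu11 <- exp(log_mu11)               # expected count: liberal and Bush
p11  <- mu11 / sum(vote.df$count)   # estimated cell probability

print(round(c(log_mu11 = log_mu11, mu11 = mu11, p11 = p11), 4))
```

Dividing the expected count by the sample size turns it into the estimated probability of the (liberal, Bush) cell.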
How about the odds? We look at the difference of the above equations; also see the previous page on the general set-up of parameters. For example,
Bush-Clinton: \(\lambda+\lambda_i^{pview}+\lambda_1^{choice}-(\lambda+\lambda_i^{pview}+\lambda_2^{choice})=\lambda_1^{choice}-\lambda_2^{choice}\)
Bush-Perot: \(2\lambda_1^{choice}+\lambda_2^{choice}\) in CATMOD. Notice that in GENMOD with dummy coding this would be \(\lambda_1^{choice}-\lambda_3^{choice}\).
Clinton-Perot: \(\lambda_1^{choice}+2\lambda_2^{choice}\) in CATMOD. Notice that in GENMOD with dummy coding this would be \(\lambda_2^{choice}-\lambda_3^{choice}\).
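To see that the two parameterizations give the same log-odds, here is a hedged R sketch that fits the independence model twice: once with sum-to-zero (effect) coding via `contr.sum`, which is used here as a stand-in for the CATMOD parameterization, and once with dummy coding with "perot" as the reference level, as in GENMOD. The names `vote.df`, `fit.eff`, `fit.ref` and `pchoice.ref` are hypothetical.

```r
vote.df <- data.frame(
  pview   = factor(rep(c("liberal", "moderate", "conservative"), each = 3)),
  pchoice = factor(rep(c("bush", "clinton", "perot"), times = 3)),
  count   = c(70, 324, 56, 195, 332, 101, 382, 199, 117)
)

# Effect (sum-to-zero) coding: the last level's parameter is minus the sum
# of the others
fit.eff <- glm(count ~ pview + pchoice, family = poisson(), data = vote.df,
               contrasts = list(pview = "contr.sum", pchoice = "contr.sum"))
lam1 <- unname(coef(fit.eff)["pchoice1"])   # bush
lam2 <- unname(coef(fit.eff)["pchoice2"])   # clinton
lam3 <- -lam1 - lam2                        # perot, implied by the constraint

# Dummy (reference-cell) coding with perot as the reference level
vote.df$pchoice.ref <- relevel(vote.df$pchoice, ref = "perot")
fit.ref <- glm(count ~ pview + pchoice.ref, family = poisson(), data = vote.df)
g1 <- unname(coef(fit.ref)["pchoice.refbush"])
g2 <- unname(coef(fit.ref)["pchoice.refclinton"])
g3 <- 0                                     # reference level

# Log-odds for each pair of candidates under both codings
log_odds <- rbind(
  effect = c(lam1 - lam2, lam1 - lam3, lam2 - lam3),
  dummy  = c(g1 - g2,     g1 - g3,     g2 - g3)
)
colnames(log_odds) <- c("bush.vs.clinton", "bush.vs.perot", "clinton.vs.perot")
print(round(exp(log_odds), 3))
```

The individual parameter estimates differ between the two codings, but their differences, and therefore the odds, do not.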
Who had better odds based on this data, Bush or Clinton, e.g. 647/855?
CATMOD: \(\text{exp}(\lambda_1^{choice}-\lambda_2^{choice})=\text{exp}(0.1935-0.4722)=0.756\)
GENMOD: \(\text{exp}(0.8592-1.1380)=0.756\)
R, glm(): exp(pchoiceclinton) = exp(0.27876) = 1.3215, which is the odds of Clinton vs. Bush, so the odds of Bush vs. Clinton are 1/1.3215 = 0.756
How about Bush vs. Perot, e.g. 647/274?
CATMOD: \(\text{exp}(2\lambda_1^{choice}+\lambda_2^{choice})=\text{exp}(2\times 0.1935+0.4722)=2.361\)
GENMOD: \(\text{exp}(\lambda_1^{choice})=\text{exp}(0.8592)=2.361\)
R, glm(): 1/exp(pchoiceperot) = exp(-pchoiceperot) = exp(0.8592) = 2.361
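One way to see why every parameterization returns the ratio of the column totals: under the independence model the fitted counts are \(\hat{\mu}_{ij}=n_{i+}n_{+j}/n\), a standard result, where \(n_{i+}\) and \(n_{+j}\) denote the row and column totals (notation introduced here for illustration). The row total cancels in the ratio, so the fitted odds do not depend on political view:

\[\frac{\hat{\mu}_{i1}}{\hat{\mu}_{i2}}=\frac{n_{i+}n_{+1}/n}{n_{i+}n_{+2}/n}=\frac{n_{+1}}{n_{+2}}=\frac{647}{855}\approx 0.7567,\qquad \frac{\hat{\mu}_{i1}}{\hat{\mu}_{i3}}=\frac{n_{+1}}{n_{+3}}=\frac{647}{274}\approx 2.361\]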
Question: Why doesn't the model fit well? Do you see any unusual residuals? 
Here is the output from GENMOD in SAS. The highlighted rows are the possible residual values, as we discussed earlier. Let's look at the standardized Pearson residuals; recall that they have an approximate N(0,1) distribution, so we are looking for absolute values greater than 2 or 3. Notice, for example, that for the first cell the value is -10.649, clearly a large residual. Look at the other cells.
In R, there are a number of ways to get the residuals from glm(), but one is to use the residuals() function and specify the type we want. For example, the following code produces the Pearson residuals and the standardized (adjusted) residuals, with formatted output. Notice that the value for the first cell is the same as the one we got from GENMOD above.
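# Pearson residuals from the fitted independence model
resids <- residuals(vote.ind, type="pearson")
# leverages (diagonal of the hat matrix)
h <- lm.influence(vote.ind)$hat
# standardized (adjusted) Pearson residuals
adjresids <- resids/sqrt(1-h)
round(cbind(count, fits, adjresids), 2)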
  count   fits adjresids
1    70 163.94    -10.65
2   324 216.64     11.72
3    56  69.43     -2.03
4   195 228.78     -3.48
5   332 302.33      2.95
6   101  96.89      0.57
7   382 254.28     12.89
8   199 336.03    -13.32
9   117 107.69      1.25
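As a cross-check that does not rely on a fitted glm object, the adjusted residuals can be computed directly from the table. The sketch below uses the closed-form expression for the independence model, \((n_{ij}-\hat{\mu}_{ij})/\sqrt{\hat{\mu}_{ij}(1-p_{i+})(1-p_{+j})}\), and compares it with the `stdres` component returned by base R's `chisq.test()`. The object names are illustrative.

```r
vote <- matrix(c( 70, 324,  56,
                 195, 332, 101,
                 382, 199, 117),
               nrow = 3, byrow = TRUE,
               dimnames = list(pview   = c("liberal", "moderate", "conservative"),
                               pchoice = c("bush", "clinton", "perot")))

n     <- sum(vote)
p_row <- rowSums(vote) / n   # marginal proportions of political view
p_col <- colSums(vote) / n   # marginal proportions of choice

# Fitted counts under independence: mu_ij = n * p_i+ * p_+j
mu <- n * outer(p_row, p_col)

# Pearson residuals, then the adjustment by the estimated standard error
pearson <- (vote - mu) / sqrt(mu)
adj     <- pearson / sqrt(outer(1 - p_row, 1 - p_col))

# chisq.test() reports the same standardized residuals as $stdres
stdres <- chisq.test(vote)$stdres

print(round(adj, 2))
# Cells whose adjusted residual exceeds 3 in absolute value
print(abs(adj) > 3)
```

The large residuals sit in the Bush and Clinton columns for liberals and conservatives, which is where the independence model misses the association.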