IPS Interview Task

STAT 504: 10.1.3 - Example: General Social Survey Section

Cross-classification of respondents according to choice for the president in 1992 presidential election (Bush, Clinton, Perot) and political view on the 7 point scale (extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative).

Let's group the 7 point scale to 3 levels, and consider a 3 × 3 table:

 
Bush
Clinton
Perot
Total
Liberal
70
324
56
450
Moderate
195
332
101
628
Conservative
382
199
117
698
Total
647
855
274
1774

Are political view and choice independent? You already know how to do this with the chi-squared test of independence. For example, see /onlinecourses/sites/stat504/files/lesson05/vote.sas (output: vote.lst ) or vote.R., and compare the results of chi-squared tests with the log-linear model output.

Here is part of the output from SAS PROC FREQ ...

SAS output

and a part from the output of the PROC GENMOD procedure.

SAS output

Question: Can you identify the X2 and G2 statistics in these two outputs? What is conclusion about independence? Does the model fit well or not? Can you identify the log-linear independence model and relevant part of the output? Note that if "Value/df" is much greater than 1, you have sufficient evidence to reject the model.

Here is what the log-linear model would be:

\(\text{log}(\mu_{ij})=\lambda+\lambda_i^{pview}+\lambda_j^{choice}\)

Recall, that we assume the counts are coming from a Poisson distribution and that all odds ratios are equal to 1. The explicit asumption is that the interaction term is equal to 0. Our null hypothesis is that the above specified loglinear model of independence fits versus the alternative that the saturated model fits.

What is the LR test statistic? Identify it in the PROC FREQ, PROC GENMOD and PROC CATMOD part of the output. Notice, that in GENMOD, "conservative" and "perot" are reference levels. Thus equation for each "Choice" is:

Bush: logi1) = λ + λipview + λ1choice

Clinton: logi2) = λ + λipview + λ2choice

Perot: logi3) = λ + λipview - λ1choice - λ2choice in CATMOD. Notice in GENMOD with dummy coding this would be: logi3) = λ + λipview3choice

If we want, for example, probability of being a liberal and voting for Bush:

Lib-Bush: log11) = λ + λ1pview + λ1choice =4.68-0.439+0.859

How about the odds? We would look at the difference of the above equations; also see the previous page on general set up of parameters. For example,

Bush-Clinton: λ + λipview + λ1choice - (λ + λipview + λ2choice) = λ1choice- λ2choice

Bush-Perot: 2 λ1choice2choice in CATMOD. Notice in GENMOD with dummy coding this would be: λ1choice - λ3choice

Clinton-Perot: λ1choice+2 λ2choice in CATMOD. Notice in GENMOD with dummy coding this would be: λ2choice - λ3choice

Think about it!
Who had better odds based on this data, Bush or Clinton, e.g. 647/855?

Clinton has better odds!

CATMOD: \(\text{exp}(\lambda_1^B-\lambda_2^B)=\text{exp}(0.1935-0.4722)=0.756\)
GENMOD: \(\text{exp}(0.8592-1.1380)=0.756\)

R, glm(): exp(pchoiceclinton)=exp(0.27876)=1.3215 which are the odds of Clinton vs. Bush, so Bush vs. Clinton is 1/1.3215=0.756

Think about it!
How about Bush vs. Perot, e.g. 647/274?

Bush has twice the odds of Perot of being picked for the presidency.

CATMOD: \(\text{exp}(2\lambda_1^B+\lambda_2^B)=\text{exp}(2\times 0.1935+0.4722)=2.361\)
GENMOD: \(\text{exp}(\lambda_1^B)=\text{exp}(0.8592)=2.361\)

R, glm(): 1/exp(pchoiceperot)=exp(-pchoiceperot)=exp(0.86992)=2.361

asdasdasd 

Question: Why doesn't the model fit well? Do you see any unusual residuals?

Here is the output from GENMOD in SAS. The highlighted row are the possible residulas values as we dicussed earlier. Let's look at the standardized Perason residulas; recall they have approximate N(0,1) distribution, so we are looking for the absolute values which are greater than 2 or 3. Notice, for example, for the first cell, the value is -10.649., clearly a large residuls. Look at the other cells.

In R, glm() there are a number of ways to get the residuals, but one would be to use residuals() function, and to specify the type we want, e.g., the following code produces pearson residuals, the standardized (adjusted) residulas, with the formatted output. Notice the same values for the fist cell as we got from GENMOD above.

resids <- residuals(vote.ind,type="pearson")
h <- lm.influence(vote.ind)$hat
adjresids <- resids/sqrt(1-h)
round(cbind(count,fits,adjresids),2)

  count   fits adjresids
1    70 163.94    -10.65
2   324 216.64     11.72
3    56  69.43     -2.03
4   195 228.78     -3.48
5   332 302.33      2.95
6   101  96.89      0.57
7   382 254.28     12.89
8   199 336.03    -13.32
9   117 107.69      1.25