IPS Interview Task | STAT ONLINE

STAT 504: 10.1.3 - Example: General Social Survey Section

Cross-classification of respondents according to choice for the president in 1992 presidential election (Bush, Clinton, Perot) and political view on the 7 point scale (extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative).

Let's group the 7 point scale to 3 levels, and consider a 3 × 3 table:

	Bush	Clinton	Perot	Total
Liberal	70	324	56	450
Moderate	195	332	101	628
Conservative	382	199	117	698
Total	647	855	274	1774

Are political view and choice independent? You already know how to do this with the chi-squared test of independence. For example, see /onlinecourses/sites/stat504/files/lesson05/vote.sas (output: vote.lst ) or vote.R., and compare the results of chi-squared tests with the log-linear model output.

Here is part of the output from SAS PROC FREQ ...

SAS output

and a part from the output of the PROC GENMOD procedure.

SAS output

Question: Can you identify the X² and G² statistics in these two outputs? What is conclusion about independence? Does the model fit well or not? Can you identify the log-linear independence model and relevant part of the output? Note that if "Value/df" is much greater than 1, you have sufficient evidence to reject the model.

Here is what the log-linear model would be:

\(\text{log}(\mu_{ij})=\lambda+\lambda_i^{pview}+\lambda_j^{choice}\)

Recall, that we assume the counts are coming from a Poisson distribution and that all odds ratios are equal to 1. The explicit asumption is that the interaction term is equal to 0. Our null hypothesis is that the above specified loglinear model of independence fits versus the alternative that the saturated model fits.

What is the LR test statistic? Identify it in the PROC FREQ, PROC GENMOD and PROC CATMOD part of the output. Notice, that in GENMOD, "conservative" and "perot" are reference levels. Thus equation for each "Choice" is:

Bush: log(μ_i1) = λ + λ_i^pview + λ₁^choice

Clinton: log(μ_i2) = λ + λ_i^pview + λ₂^choice

Perot: log(μ_i3) = λ + λ_i^pview - λ₁^choice - λ₂^choicein CATMOD. Notice in GENMOD with dummy coding this would be: log(μ_i3) = λ + λ_i^pview +λ₃^choice

If we want, for example, probability of being a liberal and voting for Bush:

Lib-Bush: log(μ₁₁) = λ + λ₁^pview + λ₁^choice =4.68-0.439+0.859

How about the odds? We would look at the difference of the above equations; also see the previous page on general set up of parameters. For example,

Bush-Clinton: λ + λ_i^pview + λ₁^choice - (λ + λ_i^pview + λ₂^choice) = λ₁^choice- λ₂^choice

Bush-Perot: 2 λ₁^choice+λ₂^choicein CATMOD. Notice in GENMOD with dummy coding this would be: λ₁^choice- λ₃^choice

Clinton-Perot: λ₁^choice+2 λ₂^choice in CATMOD. Notice in GENMOD with dummy coding this would be: λ₂^choice - λ₃^choice

Think about it!

Who had better odds based on this data, Bush or Clinton, e.g. 647/855?

Clinton has better odds!

CATMOD: \(\text{exp}(\lambda_1^B-\lambda_2^B)=\text{exp}(0.1935-0.4722)=0.756\)
GENMOD: \(\text{exp}(0.8592-1.1380)=0.756\)

R, glm(): exp(pchoiceclinton)=exp(0.27876)=1.3215 which are the odds of Clinton vs. Bush, so Bush vs. Clinton is 1/1.3215=0.756

Think about it!

How about Bush vs. Perot, e.g. 647/274?

Bush has twice the odds of Perot of being picked for the presidency.

CATMOD: \(\text{exp}(2\lambda_1^B+\lambda_2^B)=\text{exp}(2\times 0.1935+0.4722)=2.361\)
GENMOD: \(\text{exp}(\lambda_1^B)=\text{exp}(0.8592)=2.361\)

R, glm(): 1/exp(pchoiceperot)=exp(-pchoiceperot)=exp(0.86992)=2.361

asdasdasd

Question: Why doesn't the model fit well? Do you see any unusual residuals?

Here is the output from GENMOD in SAS. The highlighted row are the possible residulas values as we dicussed earlier. Let's look at the standardized Perason residulas; recall they have approximate N(0,1) distribution, so we are looking for the absolute values which are greater than 2 or 3. Notice, for example, for the first cell, the value is -10.649., clearly a large residuls. Look at the other cells.

In R, glm() there are a number of ways to get the residuals, but one would be to use residuals() function, and to specify the type we want, e.g., the following code produces pearson residuals, the standardized (adjusted) residulas, with the formatted output. Notice the same values for the fist cell as we got from GENMOD above.

resids <- residuals(vote.ind,type="pearson")
h <- lm.influence(vote.ind)$hat
adjresids <- resids/sqrt(1-h)
round(cbind(count,fits,adjresids),2)

  count   fits adjresids
1    70 163.94    -10.65
2   324 216.64     11.72
3    56  69.43     -2.03
4   195 228.78     -3.48
5   332 302.33      2.95
6   101  96.89      0.57
7   382 254.28     12.89
8   199 336.03    -13.32
9   117 107.69      1.25