
The dataset for the example in this section comes from the General Social Survey for 1972, 1973, and 1974. Caucasian Christian respondents were classified by years of education and religious group (Catholic, Southern Protestant, Other Protestant). Attitudes toward abortion were determined by whether the respondent thought that abortions should be made legal

  1. when there is a strong possibility of birth defect,
  2. when the mother's health is threatened, and
  3. when the pregnancy is the result of rape.

Negative responses to all three questions were coded as "Negative," positive responses to all three questions were coded as "Positive," and any other pattern of response was coded as "Mixed." Naturally Attitude, as defined here, has an inherent ordering as shown below.

table

We will use the SAS program abortion.sas to fit a proportional-odds cumulative logit model with main effects for year (2 dummies), religion (2 dummies) and education (2 dummies). With three response categories, there are two logit equations. Therefore, the model has 2 intercepts plus 6 slopes for a total of 8 parameters. The saturated model, which fits a separate 3-category multinomial response to each of the 3 × 3 × 3 = 27 response profiles, has 27 × (3 − 1) = 54 free parameters. Code for fitting the main-effects-only model is shown below.
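As a quick check on the counts above, the arithmetic can be written out as follows; the symbols \(p_{\text{model}}\) and \(p_{\text{sat}}\) are introduced here only for illustration.

\(p_{\text{model}} = \underbrace{(3-1)}_{\text{intercepts}} + \underbrace{(2+2+2)}_{\text{slopes}} = 8, \qquad p_{\text{sat}} = 27 \times (3-1) = 54\)

The difference, 54 − 8 = 46, is the number of degrees of freedom available for comparing the main-effects model with the saturated model.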

SAS program

The score test for unequal slopes does not reject the null hypothesis:

SAS output

(Notice that this tests the current model, which has 6 slopes, against an alternative model with 6 × 2 = 12 slopes.) The data do not provide strong evidence that the lines are not parallel. However, the model does not seem to fit:

SAS output

What's going on? Most likely, we have omitted covariates.

Each of the covariates is significant, but the effects of religion and education appear to be much more powerful than the effect of year:

SAS output

If we are going to consider interactions, it makes sense to start with the interaction between religion and education.

Adding the interaction of rel and edu produces this result:

SAS output

The interaction between rel and edu is highly significant, and adding it to the model improves the fit substantially. The parameter estimates are:

SAS output

Note that we did not use the descending option with order=data, so SAS accumulates the response categories in the order in which they appear in the data. This gives the following logit equations.

First, let us interpret the effect of year. The first logit equation predicts

\(L_1=\text{log}\dfrac{P(\text{Negative})}{P(\text{Mixed or Positive})}\)

and the second logit equation predicts

\(L_2=\text{log}\dfrac{P(\text{Negative or Mixed})}{P(\text{Positive})}\)

Therefore, a positive coefficient for an X-variable indicates a tendency for attitudes toward abortion to become more negative as X increases. The estimated coefficient for the year 1973 dummy is −0.2282, and the estimated coefficient for the year 1974 dummy is almost the same, −0.2410. This means that negative attitudes toward abortion appear to have decreased from 1972 to 1973, but remained nearly unchanged from 1973 to 1974.
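To put these two coefficients on the odds scale, one can exponentiate them. The values below are rounded and use only the estimates quoted above.

\(e^{-0.2282} \approx 0.796, \qquad e^{-0.2410} \approx 0.786\)

That is, under the fitted model the odds of falling in a more negative category rather than a more positive one are estimated to be roughly 0.80 (1973) and 0.79 (1974) times the corresponding odds in 1972, holding religion and education fixed.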

Interpreting the effects of religion and education is trickier because of the presence of interactions. Interaction indicates that the effect of education is not constant across the religious groups, and the effect of religion is not constant across education groups.

Let us consider the effect of increasing education. The coefficient for the edu Med dummy is −0.7504, and the coefficient for the edu High dummy is −1.3685. These two coefficients estimate the effect of education within the reference group for religion, which is Protestant (the Other Protestant group).

Among Protestants, increasing education is associated with decreasing negative attitudes toward abortion. Among Southern Protestants, the effect of going from low to medium education is obtained by adding the main effect −0.7504 to the interaction −0.2526. The effect of going from low to high education is −1.3685 − 0.3857. Therefore, the estimated effects of education for Southern Protestants are in the same direction as for Protestants but are somewhat larger. (Note, however, that the two coefficients −0.2526 and −0.3857 are not significantly different from zero, which means that the effect of education among Protestants and Southern Protestants is not significantly different.)

When we move to Catholics, however, the trend is a bit different. Among Catholics, the estimated effect of moving from low education to medium education is −0.7504 + 0.3891, and the estimated effect of moving from low education to high education is −1.3685 + 0.9437. Both of these are less than zero, so increasing education is still associated with more positive attitudes toward abortion. However, these effects are smaller than they are among Protestants and Southern Protestants.
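Collecting the sums from the last two paragraphs in one place may make the comparison easier. The lines below simply add each education main effect to the matching interaction estimate quoted above; each pair gives the low-to-medium effect followed by the low-to-high effect.

\(\text{Other Protestant: } -0.7504 \quad\text{and}\quad -1.3685\)

\(\text{Southern Protestant: } -0.7504 - 0.2526 = -1.0030 \quad\text{and}\quad -1.3685 - 0.3857 = -1.7542\)

\(\text{Catholic: } -0.7504 + 0.3891 = -0.3613 \quad\text{and}\quad -1.3685 + 0.9437 = -0.4248\)

All six values are negative, but the Catholic effects are clearly the closest to zero.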

We can use the R program abortion.R (with dataset abortion.txt) to fit a proportional-odds cumulative logit model with main effects for year (2 dummies), religion (2 dummies) and education (2 dummies). With three response categories, there are two logit equations. Therefore, the model has 2 intercepts plus 6 slopes for a total of 8 parameters. The saturated model, which fits a separate 3-category multinomial response to each of the 3 × 3 × 3 = 27 response profiles, has 27 × (3 − 1) = 54 parameters that could be estimated. Code for fitting the main-effects-only model is shown first; note that there are many other ways to enter the data and to fit this model in R.

abortion <- read.table( "abortion.txt", col.names=c("year", "rel", "edu", "att", "count") )

# see if att is a factor
is.factor( abortion$att )

# make it into a factor whose levels are in the order you want
abortion$att <- factor( abortion$att, levels=c("Neg", "Mix", "Pos") )
abortion$att

The code above specifies the order of the response levels used in the logit equations. For the other factors, see the R code in abortion.R.
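To see why the levels argument matters, here is a small self-contained sketch on a made-up vector (the values are hypothetical, not the survey data). Without explicit levels, factor() sorts the labels alphabetically, which would not match the Neg, Mix, Pos ordering needed for the cumulative logits.

```r
# Toy response vector (hypothetical values, not the survey data)
att <- c("Pos", "Neg", "Mix", "Neg", "Pos")

# Default: levels are sorted alphabetically
att_default <- factor(att)
levels(att_default)

# Explicit levels: Neg, Mix, Pos, the order used for the logits
att_ordered <- factor(att, levels = c("Neg", "Mix", "Pos"))
levels(att_ordered)
```

Comparing the two levels() results shows the reordering directly.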

Here is one way to fit the main-effects model and compare it to the saturated model:

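library(MASS)  # provides polr()

# fit the main effects proportional-odds logistic regression model
result = polr( att ~ year+rel+edu, weights=count, data=abortion )
summary(result)

# model with all interactions (satmodel in abortion.R); this formula is
# inferred from the anova output below
satmodel = polr( att ~ year*rel*edu, weights=count, data=abortion )
anova(result, satmodel)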

Call:
polr(formula = att ~ year + rel + edu, data = abortion, weights = count)

Coefficients:
           Value Std. Error t value
year1973  0.2210     0.1048   2.109
year1974  0.2331     0.1059   2.203
relSProt -0.2489     0.1135  -2.194
relCath  -0.7960     0.1003  -7.934
eduMed    0.7165     0.1087   6.592
eduHigh   1.1267     0.1282   8.790

Intercepts:
        Value    Std. Error t value
Neg|Mix  -2.3311   0.1350   -17.2698
Mix|Pos  -0.7920   0.1220    -6.4912

Residual Deviance: 4059.316
AIC: 4075.316

From the output above, all predictors appear to be significant (judge the t values as in the cheese example). However, the likelihood ratio test of this model against the saturated model, shown below, indicates that the main-effects model still fits poorly by comparison.
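Since summary() for a polr fit prints t values but no p-values, one rough way to judge them is to refer each t value to a standard normal distribution. The sketch below does this for the six slopes; the normal reference is a large-sample approximation, and the t values are simply copied from the summary(result) output.

```r
# t values copied from the summary(result) output above
t_values <- c(year1973 = 2.109, year1974 = 2.203,
              relSProt = -2.194, relCath = -7.934,
              eduMed = 6.592, eduHigh = 8.790)

# Two-sided p-values under a standard normal reference distribution
# (a large-sample approximation)
p_values <- 2 * pnorm(-abs(t_values))
round(p_values, 4)
```

Every p-value falls below 0.05, with the year dummies being the weakest of the six.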

> anova(result, satmodel)
Likelihood ratio tests of ordinal regression models

Response: att
                                                            Model Resid. df Resid. Dev   Test    Df LR stat.     Pr(Chi)
1                                                year + rel + edu      3229   4059.316
2 year + rel + edu + year:rel + year:edu + rel:edu + year:rel:edu      3209   4018.273 1 vs 2    20 41.04359 0.003677518
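The likelihood ratio statistic in this table can be reproduced by hand from the two residual deviances and their degrees of freedom. The sketch below uses only the numbers printed above (rounded to three decimals, so the statistic agrees with the table only up to rounding); the variable names are illustrative.

```r
# Residual deviances and residual df copied from the output above
dev_main <- 4059.316; df_main <- 3229
dev_full <- 4018.273; df_full <- 3209

lr_stat <- dev_main - dev_full   # likelihood ratio statistic
lr_df   <- df_main - df_full     # difference in number of parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(LR = lr_stat, df = lr_df, p = p_value)
```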

From the table of coefficients, and from fitting other submodels of the main-effects model, we can also see that the effects of religion and education appear to be much more powerful than the effect of year. It therefore makes sense to consider adding the interaction between rel and edu. Furthermore, if we run stepAIC(satmodel) in abortion.R, which performs a stepwise search for the "best" model starting from the saturated model, we obtain a model with the interaction between rel and edu. Below is the summary of that model:

result1=polr( att ~ year+rel+edu+rel:edu, weights=count, data=abortion )
summary(result1)

Call:
polr(formula = att ~ year + rel + edu + rel:edu, data = abortion,
    weights = count)

Coefficients:
                   Value Std. Error t value
year1973          0.2281     0.1051  2.1712
year1974          0.2410     0.1061  2.2708
relSProt         -0.4499     0.2132 -2.1097
relCath          -0.3477     0.2235 -1.5561
eduMed            0.7504     0.1774  4.2295
eduHigh           1.3689     0.2192  6.2442
relSProt:eduMed   0.2526     0.2671  0.9458
relCath:eduMed   -0.3892     0.2601 -1.4960
relSProt:eduHigh  0.3852     0.3362  1.1457
relCath:eduHigh  -0.9442     0.3066 -3.0790

Intercepts:
        Value    Std. Error t value
Neg|Mix  -2.2582   0.1701   -13.2781
Mix|Pos  -0.7162   0.1596    -4.4884

Residual Deviance: 4040.437
AIC: 4064.437
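The AIC values reported in the two summaries can be checked from the residual deviance and the parameter count. The check below assumes the usual definition, AIC = deviance + 2 × (number of estimated parameters), which reproduces both printed values; the subscripts are illustrative labels.

\(\text{AIC}_{\text{main}} = 4059.316 + 2 \times 8 = 4075.316, \qquad \text{AIC}_{\text{rel:edu}} = 4040.437 + 2 \times 12 = 4064.437\)

Here 12 counts the 10 slopes plus 2 intercepts of the interaction model, which has the smaller AIC of the two.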

The interaction between rel and edu is highly significant, and adding it to the model improves the fit substantially; see the output below.

> anova(result1, result)
Likelihood ratio tests of ordinal regression models

Response: att
                       Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
1           year + rel + edu      3229   4059.316                                  
2 year + rel + edu + rel:edu      3225   4040.437 1 vs 2     4  18.8794 0.0008300017

The parameter estimates are given in the table of coefficients and intercepts.

First, let us interpret the effect of year. The first logit equation predicts

\(L_1=\text{log}\dfrac{P(\text{Negative})}{P(\text{Mixed or Positive})}\)

and the second logit equation predicts

\(L_2=\text{log}\dfrac{P(\text{Negative or Mixed})}{P(\text{Positive})}\)

To interpret the coefficients, think about constructing a 2 × 2 table whose columns are, for example, a "supportive" versus a "less supportive" attitude toward legalizing abortion, and whose rows are year 1973 and year 1972:

             "supportive"   "less supportive"
year 1973
year 1972

Note that polr writes each cumulative logit as an intercept minus the linear predictor, so the signs of the slopes are reversed relative to the SAS output. Therefore, a positive coefficient for an X-variable indicates a tendency for attitudes toward supporting the legalization of abortion to become more positive as X increases. The estimated coefficient for the year 1973 dummy is 0.2281, and the estimated coefficient for the year 1974 dummy is almost the same, 0.2410. This means that negative attitudes toward abortion appear to have decreased from 1972 to 1973, but remained nearly unchanged from 1973 to 1974.
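As a minimal sketch of how the fitted equations translate into probabilities, the snippet below assumes the parameterization documented for polr in MASS, logit P(Y ≤ j) = ζ_j − η, where ζ_j is an intercept and η is the linear predictor. It uses only the two intercepts and the year1973 coefficient from the summary(result1) output, and considers respondents at the reference levels of rel and edu; the helper name category_probs is hypothetical.

```r
# Intercepts and year1973 slope copied from summary(result1)
zeta <- c("Neg|Mix" = -2.2582, "Mix|Pos" = -0.7162)
beta_1973 <- 0.2281

# Category probabilities for a given linear predictor eta,
# assuming logit P(Y <= j) = zeta_j - eta (polr's convention)
category_probs <- function(eta) {
  cum <- plogis(zeta - eta)            # P(Neg), P(Neg or Mix)
  probs <- c(cum[1], cum[2] - cum[1], 1 - cum[2])
  names(probs) <- c("Neg", "Mix", "Pos")
  probs
}

p_1972 <- category_probs(0)            # reference profile in 1972
p_1973 <- category_probs(beta_1973)    # same profile in 1973
round(rbind("1972" = p_1972, "1973" = p_1973), 3)
```

Under this convention a positive slope lowers both cumulative probabilities, so the estimated share of "Neg" responses drops and the share of "Pos" responses rises from 1972 to 1973.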

Interpreting the effects of religion and education is trickier because of the presence of interactions. Interaction indicates that the effect of education is not constant across the religious groups, and the effect of religion is not constant across education groups.

Let us consider the effect of increasing education. The coefficient for the edu Med dummy is 0.7504, and the coefficient for the edu High dummy is 1.3689. These two coefficients estimate the effect of education within the reference group for religion, which is Protestant (the Other Protestant group).

Among Protestants, increasing education is associated with decreasing negative attitudes toward abortion. Among Southern Protestants, the effect of going from low to medium education is obtained by adding the main effect 0.7504 to the interaction 0.2526. The effect of going from low to high education is 1.3689 + 0.3852. Therefore, the estimated effects of education for Southern Protestants are in the same direction as for Protestants but are somewhat larger. (Note, however, that the two coefficients 0.2526 and 0.3852 are not significantly different from zero, which means that the effect of education among Protestants and Southern Protestants is not significantly different.)

When we move to Catholics, however, the trend is a bit different. Among Catholics, the estimated effect of moving from low education to medium education is 0.7504 − 0.3892, and the estimated effect of moving from low education to high education is 1.3689 − 0.9442. Both of these are greater than zero, so increasing education is still associated with more positive attitudes toward abortion. However, these effects are smaller than they are among Protestants and Southern Protestants.
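The group-specific education effects discussed above can be tabulated in a few lines of R. This is only a sketch that adds each education main effect to the matching interaction estimate; the coefficients are copied from summary(result1), and the object names are illustrative.

```r
# Coefficients copied from summary(result1)
b <- c(eduMed = 0.7504, eduHigh = 1.3689,
       "relSProt:eduMed" = 0.2526, "relSProt:eduHigh" = 0.3852,
       "relCath:eduMed" = -0.3892, "relCath:eduHigh" = -0.9442)

# Effect of education (relative to low) within each religious group:
# main effect plus the matching interaction term
edu_effect <- rbind(
  Prot  = c(Med = b[["eduMed"]], High = b[["eduHigh"]]),
  SProt = c(Med = b[["eduMed"]] + b[["relSProt:eduMed"]],
            High = b[["eduHigh"]] + b[["relSProt:eduHigh"]]),
  Cath  = c(Med = b[["eduMed"]] + b[["relCath:eduMed"]],
            High = b[["eduHigh"]] + b[["relCath:eduHigh"]])
)
round(edu_effect, 4)        # log-odds scale
round(exp(edu_effect), 2)   # odds-ratio scale
```

All entries are positive, with the Catholic row closest to zero, which matches the description in the text.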