8.5 - Fitting the Cumulative Logit Model

Example: Housing Satisfaction Revisited Section

Suppose we revisit the housing satisfaction example but now treat the response levels "Low", "Medium", and "High" as ordered from least to greatest satisfaction (indexed 1, 2, and 3) so that it \(\pi_1+\pi_2\) is the cumulative probability of medium satisfaction---that is, the probability of medium or less satisfaction, which makes sense now that the categories have a defined order. The cumulative logit model is then

\(\log\left(\dfrac{P(Y \leq j)}{P(Y > j)}\right)= \beta_{0j}+\beta_{1j}x_1+\beta_{2j}x_2+\cdots \quad\mbox{for }j=1,2\)

where the predictor set is identical to that for the baseline model. Although the odds are defined differently, this model is very similar to the baseline model in that each of \(3-1=2\) response categories gets its own logit equation (each of the \(\beta\) coefficients has indices for both its predictor and the \(j^{th}\) category), which yields the same total number of parameters and degrees of freedom for goodness of fit. In fact, if we compute the deviance statistic for the additive model (14 total parameters) against the saturated model (48 parameters), we get \(G^2=39.1570\) on 34 degrees of freedom, which is nearly identical to the test we carried out earlier with the baseline logit model (\(G^2=38.6622\)). However, if we have ordered response categories, we would prefer to work with the cumulative logit model because it allows us to consider the proportional-odds assumption, which simplifies the model as we've seen.

This model can be fit in SAS using PROC LOGISTIC as with the baseline model; we just need to change the response as ordered with the order=data option. By default, SAS will assume proportional odds, but we can temporarily disable that with the unequalslopes option.

* ordinal, not proportional odds;
* additive model;
proc logistic data=housing;
freq freq; 
class infl (ref="Low") type cont / param=ref; 
model sat (order=data) = infl type cont / link=logit 
 aggregate=(infl type cont) scale=none unequalslopes;
run;

The order=data option tells SAS to arrange the response categories from lowest to highest in the order that they arise in the dataset. There is also a descending option that tells SAS to reverse the ordering of the categories, so that the last one becomes the lowest.

This model can be fit in R using the vglm function again, but we specify family=cumulative(parallel=FALSE) to treat the response as ordinal and use the cumulative logit. This is also where we can specify the proportional odds (or parallel slopes) assumption.

fit.add.o = vglm(cbind(Low,Medium,High) ~ Infl+Type+Cont, family=cumulative(parallel=FALSE), data=housing)
g2.add.o = deviance(fit.add.o)
df.add.o = df.residual(fit.add.o)
1 - pchisq(g2.add.o, df.add.o)

Other functions in R can be used as well, but they don't all parameterize the model the same way. The functionpolr(), for example, fits the proportional odds model but with negative coefficients (similar to SAS's "decreasing" option). So, we have to be careful!

Having just observed that the additive cumulative logit model fits the data well, let's see if we can reduce it further with the proportional odds assumption. The logit equations would be

\begin{array}{rcl} L_1 &=& \log \dfrac{P(Y \leq 1)}{P(Y > 1)}=\alpha_1+\beta_1I_1+\beta_2I_2+\beta_3T_1+\beta_4T_2+\beta_5T_3+\beta_6C\\ L_2 &=& \log \dfrac{P(Y \leq 2)}{P(Y > 2)}=\alpha_2+\beta_1I_1+\beta_2I_2+\beta_3T_1+\beta_4T_2+\beta_5T_3+\beta_6C \\ \end{array}

where \(I_1\) and \(I_2\) are the indicators for medium and high influence, \(T_1\), \(T_2\), and \(T_3\) are the indicators for apartment, atrium, and terrace types, and \(C\) is the indicator for high contact with residents.

Each \(\beta\) coefficient represents the increase in log odds of satisfaction \(\le j\) when going from the baseline to the group corresponding to that coefficient's indicator, given other groups are fixed. And if we exponentiate this, we have the odds ratio. So, for example, \(\exp(\beta_1)\) represents the odds ratio for satisfaction \(\le j\) when comparing those with medium perceived influence against those with low perceived influence on management.

The defining property of the proportional odds assumption is that these coefficients don't depend on \(j\). But that's not to say that the probability or odds of all response categories are assumed to be equal. The intercepts \(\alpha_j\) take care of that.

One tricky part of interpreting these cumulative logit model parameters lies in the fact that the cumulative probability applies to response \(j\) or less. If a \(\beta\) coefficient is positive, it means that increasing its predictor (or going 0 to 1 if the predictor is an indicator) is associated with an increase in the probability of \(j\) or less, which means that the response is more likely to be small if the predictor is large (the odds of the response being small is higher). This seems counterintuitive at first but is inherent to the way the cumulative probability is defined.

When we fit the proportional odds model in SAS, we first see this output section:


Score Test for the Proportional Odds Assumption
Chi-Square	DF	Pr > ChiSq
8.2188	6	0.2225

This reports a test of the proportional odds assumption, versus the same model without proportional odds. The proportional odds assumption means each of the 6 predictor indicators has only one \(\beta\) coefficient for both cumulative logit expressions, giving a difference of 6 parameters for this test. The chi-square test statistic of 8.2188 is not significant evidence to reject the proportional odds assumption, and so we retain it for simplicity's sake.

At this point, we can look into the significance of the predictors to see if any can be removed; the "Type 3" output table is particularly convenient for this.


Type 3 Analysis of Effects
Effect	DF	Wald Chi-Square	Pr > ChiSq
Infl	2	104.4617	<.0001
Type	3	54.6096	<.0001
Cont	1	14.2770	0.0002

Since all are significant, we will not be able to reduce the model further. We'll next look at the individual coefficient estimates.


Analysis of Maximum Likelihood Estimates
Parameter		DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept	Low	1	-0.4961	0.1245	15.8683	<.0001
Intercept	Medium	1	0.6907	0.1252	30.4317	<.0001
Infl	High	1	-1.2888	0.1267	103.4616	<.0001
Infl	Medium	1	-0.5664	0.1050	29.1180	<.0001
Type	Apartmen	1	0.5723	0.1187	23.2307	<.0001
Type	Atrium	1	0.3662	0.1568	5.4558	0.0195
Type	Terrace	1	1.0910	0.1515	51.8504	<.0001
Cont	High	1	-0.3603	0.0954	14.2770	0.0002

The first intercept value is the estimated log-odds of low satisfaction (versus medium or high) when all predictors are zero (at baseline levels). So, for individuals with low perceived influence, housing type tower, and low contact with other residents, the odds of low satisfaction is estimated to be \(\exp(-0.4961)=0.6089\). Likewise, the estimated log odds of medium or less satisfaction (versus high) is \(\exp(0.69071)=1.995\) for this same (baseline) group of individuals.

For the \(\beta\) coefficients, we can say that for those with high perceived influence, the odds of low satisfaction (versus medium or high) is \(\exp(-1.2888)=0.2756\), compared with those with low perceived influence. Moreover, by the proportional odds assumption, this is also the estimated odds ratio for low or medium satisfaction (versus high), when comparing those two influence groups. In other words, having a higher perceived influence on management is associated with higher satisfaction because it has a lower odds of being in a small satisfaction category.

As with the baseline-logit model, we can add 95% confidence limits to the odds ratio estimates for further inference. None of these include the value 1, which indicates that these predictors are all related to the satisfaction of the individuals.


Odds Ratio Estimates
Effect	Point Estimate	95% Wald Confidence Limits
Infl High vs Low	0.276	0.215	0.353
Infl Medium vs Low	0.568	0.462	0.697
Type Apartmen vs Tower	1.772	1.404	2.237
Type Atrium vs Tower	1.442	1.061	1.961
Type Terrace vs Tower	2.977	2.212	4.007
Cont High vs Low	0.697	0.579	0.841

To fit the cumulative logit model in R with the proportional odds assumption, we specify parallel=TRUE.

# ordinal, proportional odds
fit.add.op = vglm(cbind(Low,Medium,High) ~ Infl+Type+Cont, 
 family=cumulative(parallel=TRUE), data=housing)

# testing proportional odds assumption
g2.prop = 2*(logLik(fit.add.o) - logLik(fit.add.op))
df.prop = df.residual(fit.add.op) - df.residual(fit.add.o)
1 - pchisq(g2.prop, df.prop)

This code below the model fit calculates a test of the proportional odds assumption, versus the same model without proportional odds, which was fit earlier. The proportional odds assumption means each of the 6 predictor indicators has only one \(\beta\) coefficient for both cumulative logit expressions, giving a difference of 6 parameters for this test. The LRT statistic of 8.57 is not significant evidence to reject the proportional odds assumption, and so we retain it for simplicity's sake.

At this point, we can look into the significance of the predictors to see if any can be removed; the "anova" output table is particularly convenient for this.

> anova(fit.add.op, type=3, test="LR")
Analysis of Deviance Table (Type III tests: each term added last)

Model: 'cumulative', 'VGAMordinal', 'VGAMcategorical'
Links: 'logitlink'
Response: cbind(Low, Medium, High)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
Infl  2  108.239        42    155.967 < 2.2e-16 ***
Type  3   55.910        43    103.638 4.391e-12 ***
Cont  1   14.306        41     62.034 0.0001554 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Since all are significant, we will not be able to reduce the model further. We'll next look at the individual coefficient estimates.

> summary(fit.add.op)

Call:
vglm(formula = cbind(Low, Medium, High) ~ Infl + Type + Cont, 
    family = cumulative(parallel = TRUE), data = housing)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1 -0.49614    0.12454  -3.984 6.78e-05 ***
(Intercept):2  0.69071    0.12521   5.516 3.46e-08 ***
InflMedium    -0.56639    0.10496  -5.396 6.81e-08 ***
InflHigh      -1.28882    0.12670 -10.172  < 2e-16 ***
TypeApartment  0.57235    0.11875   4.820 1.44e-06 ***
TypeAtrium     0.36619    0.15677   2.336 0.019498 *  
TypeTerrace    1.09101    0.15151   7.201 5.99e-13 ***
ContHigh      -0.36028    0.09536  -3.778 0.000158 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2])

Residual deviance: 47.7276 on 40 degrees of freedom

Log-likelihood: -123.432 on 40 degrees of freedom

For the \(\beta\) coefficients, we can say that for those with high perceived influence, the odds of low satisfaction (versus medium or high) is \(\exp(-1.2888)=0.2756\), compared with those with low perceived influence. Moreover, by the proportional odds assumption, this is also the estimated odds ratio for low or medium satisfaction (versus high), when comparing those same two influence groups. In other words, having a higher perceived influence on management is associated with higher satisfaction because it has a lower odds of being in a small satisfaction category.

> exp(coef(fit.add.op))
(Intercept):1 (Intercept):2    InflMedium      InflHigh 
    0.6088794     1.9951283     0.5675685     0.2755961 
TypeApartment    TypeAtrium   TypeTerrace      ContHigh 
    1.7724273     1.4422234     2.9772934     0.6974781 
> exp(confint(fit.add.op))
                  2.5 %    97.5 %
(Intercept):1 0.4770040 0.7772139
(Intercept):2 1.5609539 2.5500670
InflMedium    0.4620337 0.6972090
InflHigh      0.2149917 0.3532843
TypeApartment 1.4043990 2.2368989
TypeAtrium    1.0607016 1.9609740
TypeTerrace   2.2123455 4.0067322
ContHigh      0.5785784 0.8408122