8.4 - The Proportional-Odds Cumulative Logit Model

Cumulative-logit Models for Ordinal Responses

The proportional-odds cumulative logit model is possibly the most popular model for ordinal data. This model uses cumulative probabilities up to a threshold, thereby collapsing the whole range of ordinal categories into a binary outcome at that threshold. Let the response be \(Y=1,2,\ldots, J\), where the ordering is natural. The associated probabilities are \((\pi_1,\pi_2,\ldots,\pi_J)\), and the cumulative probability of a response less than or equal to \(j\) is

\(P(Y \leq j)=\pi_1+\ldots+\pi_j\)

Then, a cumulative logit is defined as

\(\log\left(\dfrac{P(Y \leq j)}{P(Y > j)}\right)=\log\left(\dfrac{P(Y \leq j)}{1-P(Y \leq j)}\right)=\log\left(\dfrac{\pi_1+\ldots+\pi_j}{\pi_{j+1}+\ldots+\pi_J}\right) \)

This is the log-odds of the event that \(Y\le j\) and measures how likely the response is to be in category \(j\) or below versus in a category higher than \(j\).

The sequence of cumulative logits may be defined as:

\(\begin{array}{rcl}
L_1 &=& \log \left(\dfrac{\pi_1}{\pi_2+\pi_3+\cdots+\pi_J}\right)\\
L_2 &=& \log \left(\dfrac{\pi_1+\pi_2}{\pi_3+\pi_4+\cdots+\pi_J}\right)\\
& \vdots & \\
L_{J-1} &=& \log \left(\dfrac{\pi_1+\pi_2+\cdots+\pi_{J-1}}{\pi_J}\right)
\end{array}\)
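
As a small numerical illustration of the cumulative logits defined above, the sketch below computes \(L_1,\ldots,L_{J-1}\) from a vector of category probabilities. The function name `cumulative_logits` and the probabilities are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def cumulative_logits(pi):
    """Return L_1, ..., L_{J-1} for category probabilities pi_1, ..., pi_J."""
    pi = np.asarray(pi, dtype=float)
    cum = np.cumsum(pi)[:-1]          # P(Y <= j) for j = 1, ..., J-1
    return np.log(cum / (1.0 - cum))  # log[P(Y <= j) / P(Y > j)]

# Hypothetical probabilities for a J = 4 category response (illustrative only).
pi = [0.1, 0.2, 0.3, 0.4]
logits = cumulative_logits(pi)
print(logits)  # J - 1 = 3 values
```

Because the cumulative probabilities cannot decrease with \(j\), the cumulative logits are ordered: \(L_1 \le L_2 \le \cdots \le L_{J-1}\).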

And with predictors incorporated into the model, we have

\(\begin{array}{rcl}
L_1 &=& \beta_{01}+\beta_{11}x_1+\cdots+\beta_{p1}x_p\\
L_2 &=& \beta_{02}+\beta_{12}x_1+\cdots+\beta_{p2}x_p\\
& \vdots & \\
L_{J-1} &=& \beta_{0,J-1}+\beta_{1,J-1}x_1+\cdots+\beta_{p,J-1}x_p
\end{array}\)

Notice that (unlike the adjacent-category logit model) this is not a linear reparameterization of the baseline-category model. The cumulative logits are not simple differences between the baseline-category logits. Therefore, the above model will not give a fit equivalent to that of the baseline-category model.

Now suppose that we simplify the model by requiring the coefficient of each \(x\)-variable to be identical across the \(J -1\) logit equations. Then, changing the names of the intercepts to \(\alpha\)'s, the model becomes

\(\begin{array}{rcl}
L_1 &=& \alpha_1+\beta_1x_1+\cdots+\beta_p x_p\\
L_2 &=& \alpha_2+\beta_1x_1+\cdots+\beta_p x_p\\
& \vdots & \\
L_{J-1} &=& \alpha_{J-1}+\beta_1x_1+\cdots+\beta_p x_p
\end{array}\)

This model, called the proportional-odds cumulative logit model, has \((J - 1)\) intercepts plus \(p\) slopes, for a total of \(J + p - 1\) parameters to be estimated.

Notice that the intercepts can differ, but the slope for each variable stays the same across the different equations! One may think of this as a set of parallel lines (or hyperplanes) with different intercepts. The proportional-odds condition forces the lines corresponding to each cumulative logit to be parallel.


  • In this model, intercept \(\alpha_j\) is the log-odds of falling into or below category \(j\) when \(x_1 = x_2 = \dots = 0\).
  • A single parameter \(\beta_k\) describes the effect of \(x_k\) on \(Y\): \(\beta_k\) is the increase in the log-odds of falling into or below any category associated with a one-unit increase in \(x_k\), holding all the other predictors constant. Compare this to the baseline-category logit model, where there are \(J-1\) parameters for a single explanatory variable. Under this parameterization, a positive slope indicates a tendency for the response level to decrease as the variable increases.
  • Constant slope \(\beta_k\): the effect of \(x_k\) is the same for all \(J-1\) ways to collapse \(Y\) into dichotomous outcomes.

For simplicity, let's consider only one predictor: \(\text{logit}[P(Y \leq j)]=\alpha_j+\beta x\)

Then the cumulative probabilities are given by: \(P(Y \leq j)=\exp(\alpha_j+\beta x)/(1+\exp(\alpha_j+\beta x))\), and since \(\beta\) is constant, the curves of cumulative probabilities plotted against \(x\) all have the same shape and differ only by a horizontal shift, so they never cross.
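
The formula for the cumulative probabilities can be turned into individual category probabilities by differencing, since \(P(Y=j)=P(Y\le j)-P(Y\le j-1)\). The following is a minimal sketch for a single predictor; the function name `category_probabilities` and the parameter values are hypothetical, and the intercepts are assumed to be increasing in \(j\) so that the differences are positive.

```python
import numpy as np

def category_probabilities(alpha, beta, x):
    """P(Y = j | x), j = 1..J, under logit[P(Y <= j)] = alpha_j + beta * x."""
    alpha = np.asarray(alpha, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))  # P(Y <= j), j = 1..J-1
    cum = np.concatenate(([0.0], cum, [1.0]))        # add P(Y <= 0) = 0 and P(Y <= J) = 1
    return np.diff(cum)                              # P(Y = j) = P(Y <= j) - P(Y <= j-1)

# Hypothetical parameters: J = 4 categories, increasing intercepts, one slope.
alpha = [-1.0, 0.0, 1.5]
beta = 0.8
probs_low = category_probabilities(alpha, beta, x=0.0)
probs_high = category_probabilities(alpha, beta, x=2.0)
print(probs_low)
print(probs_high)
```

With a positive \(\beta\), moving from \(x=0\) to \(x=2\) shifts probability toward the lower categories, which matches the interpretation of the slope under this parameterization.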

  • The log odds ratio comparing two values of the predictor, \(x_1\) and \(x_2\), is proportional to their difference, with \(\beta\) as the constant of proportionality. The corresponding odds ratio, \(\exp[\beta(x_1-x_2)]\), is the same for every \(j\); hence the name "proportional odds model".

 Question: Do you see how we get the above measure of odds-ratio?
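
One way to answer this question, sketched here for the single-predictor model \(\text{logit}[P(Y \leq j)]=\alpha_j+\beta x\) and two predictor values \(x_1\) and \(x_2\), is to subtract the two cumulative logits:

\(\begin{array}{rcl}
\text{logit}[P(Y \leq j \mid x_1)]-\text{logit}[P(Y \leq j \mid x_2)] &=& (\alpha_j+\beta x_1)-(\alpha_j+\beta x_2)\\
&=& \beta(x_1-x_2)
\end{array}\)

Exponentiating both sides gives the odds ratio:

\(\dfrac{P(Y \leq j \mid x_1)/P(Y > j \mid x_1)}{P(Y \leq j \mid x_2)/P(Y > j \mid x_2)}=\exp[\beta(x_1-x_2)]\)

The intercept \(\alpha_j\) cancels, so the result does not depend on \(j\). In particular, for a one-unit difference \(x_1-x_2=1\), the odds ratio is \(\exp(\beta)\) at every threshold.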

Continuous Latent Response

One reason for the proportional-odds cumulative-logit model's popularity is its connection to the idea of a continuous latent response. Suppose that the categorical outcome is actually a categorized version of an unobservable (latent) continuous variable.

[Figure: a continuous latent scale divided into ordered categories 1, 2, ..., r-1, r]

For example, it is reasonable to think that a 5-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree) is a coarsened version of a continuous variable \(Z\) indicating degree of approval. The continuous scale is divided into five regions by four cut-points \(c_1, c_2, c_3, c_4\), which are determined by nature (not by the investigator). If \(Z \le c_1\), we observe \(Y = 1\); if \(c_1 < Z \le c_2\), we observe \(Y = 2\); and so on. Here is the connection. Suppose that \(Z\) is related to the \(x\)'s through a homoscedastic linear regression. For example, with a single \(x\), the relationship looks like this:

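[Figure: regression of the latent variable Z on X, with cut-points c1, c2, c3, c4 on the Z axis separating the regions Y=1 through Y=5]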

If the regression of \(Z\) on the \(x\)'s has the form

\(Z=\gamma_0+\gamma_1 x_1+\gamma_2 x_2+\cdots+\gamma_p x_p+\epsilon \),

where \(\epsilon\) is a random error from a logistic distribution with mean zero and constant variance, then the coarsened version \(Y\) will be related to the \(x\)'s by a proportional-odds cumulative logit model. (The logistic distribution has a bell-shaped density similar to a normal curve. If we were to have normal errors rather than logistic errors, the cumulative logit equations would change to have a probit link. In most cases, the fits of the logit and probit models are quite similar.)
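
The latent-variable connection can be checked with a small simulation. The sketch below assumes a single predictor, standard logistic errors, and hypothetical values for \(\gamma_0\), \(\gamma_1\), and the cut-points; none of these come from a real data set. Under these assumptions, \(P(Y \le j)=P(Z \le c_j)\), so the cumulative logits should be close to \(c_j-\gamma_0-\gamma_1 x\). That is, \(\alpha_j=c_j-\gamma_0\) and \(\beta=-\gamma_1\); the sign flips because larger values of \(Z\) correspond to higher categories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent regression Z = gamma0 + gamma1 * x + eps and cut-points.
gamma0, gamma1 = 0.5, 1.0
cuts = np.array([-1.0, 0.0, 1.0, 2.0])   # c_1 < c_2 < c_3 < c_4, so J = 5

def empirical_cumulative_logits(x, n=400_000):
    """Simulate Y at a fixed x and return the sample logits of P(Y <= j), j = 1..4."""
    z = gamma0 + gamma1 * x + rng.logistic(size=n)   # standard logistic errors
    y = 1 + np.searchsorted(cuts, z, side="left")    # Y = j when c_{j-1} < Z <= c_j
    cum = np.array([(y <= j).mean() for j in range(1, len(cuts) + 1)])
    return np.log(cum / (1.0 - cum))

logits_x0 = empirical_cumulative_logits(0.0)
logits_x1 = empirical_cumulative_logits(1.0)
print(logits_x0)              # close to cuts - gamma0
print(logits_x1 - logits_x0)  # close to -gamma1 for every j
```

The difference between the two sets of logits is approximately the same at every threshold, which is the proportional-odds property.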

If the regression of \(Z\) on the \(x\)'s is heteroscedastic—for example, if the variance increases with the mean—then the logit equations will "fan out" and not have constant slope. A model with non-constant slopes is somewhat undesirable because the lines are not parallel; the logit lines will eventually cross each other, implying negative probabilities for some categories.
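
To see why crossing logit lines are a problem, consider a minimal sketch with \(J=3\) categories and two cumulative logits that have different slopes. The intercepts and slopes below are hypothetical, chosen only so that the lines cross at a convenient value of \(x\).

```python
import numpy as np

def expit(t):
    """Inverse logit: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical non-parallel cumulative logits for J = 3 categories.
alpha1, beta1 = -1.0, 1.0   # L_1 = alpha1 + beta1 * x
alpha2, beta2 = 1.0, 0.5    # L_2 = alpha2 + beta2 * x

x_cross = (alpha2 - alpha1) / (beta1 - beta2)  # value of x where L_1 = L_2

def pi_2(x):
    """Implied P(Y = 2 | x) = P(Y <= 2 | x) - P(Y <= 1 | x)."""
    return expit(alpha2 + beta2 * x) - expit(alpha1 + beta1 * x)

print(x_cross)    # 4.0
print(pi_2(0.0))  # positive: before the crossing point
print(pi_2(6.0))  # negative: past the crossing point, not a valid probability
```

Past the crossing point, the model implies \(P(Y \le 1) > P(Y \le 2)\), so the implied probability of category 2 is negative.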

CASE STUDY: The Penn State Ice Cream Study [Optional]


If time permits, you should also read and listen to the Penn State Ice Cream Case Study where Dr. Bill Harkness unravels the "mystery" of the polytomous logistic regression (through SAS, although we do provide the R code too). While doing this, please try to note:

  1. Why does Dr. Harkness say the ordinary chi-square test is not sufficient for this type of data?
  2. Why is this particular example quadratic in nature?
  3. Can we use the ordinary regression model for this data? And at which point would it be OK to approximate a categorical response variable as continuous; e.g. with how many levels?
  4. Does the proportional odds model for the ice cream data fit well?
