# 4.1 - Cumulative Odds and Odds Ratios

Recall that a discrete ordinal variable, say \(Y\), takes on values that can be sorted either least to greatest or greatest to least. Examples introduced earlier might be the face of a die (\(1,2,\ldots,6\)) or a person's attitude towards war ("strongly disagree", "disagree", "agree", "strongly agree"). For convenience, we may label such categories as 1 through \(J\), which allows us to express cumulative probabilities as

\(F(j) = P(Y\le j)=\pi_1+\cdots+\pi_j\),

where the parameter \(\pi_j\) represents the probability of category \(j\). So, we can write \(F(2)=\pi_1+\pi_2\) to represent the probability that the die is either 1 or 2, or, in the second example, the probability that an individual responds either "strongly disagree" or "disagree". Note that we need only the ordinal property for this to make sense; the values \(1,2,\ldots,J\) themselves do not represent any numerically meaningful quantities in these cases.

## Cumulative Odds

In addition to probabilities (or risks), ordinal categories also allow an odds to be defined for cumulative events. Recall the observed data for the war attitude example:

| Attitude | Count |
|---|---|
| Strongly disagree | 35 |
| Disagree | 27 |
| Agree | 23 |
| Strongly agree | 31 |
| Total | 116 |

If we focus on the "disagree" outcome in particular, the estimated probability would be \(\hat{\pi}_2=27/116=0.2328\) with corresponding estimated odds \(\hat{\pi}_2/(1-\hat{\pi}_2)=27/(35+23+31)=0.3034\). However, using the estimated cumulative probability \(\hat{F}(2)=\hat{\pi}_1+\hat{\pi}_2=(35+27)/116=0.5345\), we may also consider the estimated cumulative odds:

\( \dfrac{\hat{F}(2)}{1-\hat{F}(2)}=\dfrac{35+27}{23+31}=1.1481 \)

We interpret this value as the (estimated) cumulative odds that an individual will "disagree", where the category of "strongly disagree" is implicitly included. Equivalently, we may also refer to this as the (estimated) odds of "strongly disagree" or "disagree". In general, we define the **cumulative odds** for \(Y\le j\) as

\(\dfrac{F(j)}{(1-F(j))}=\dfrac{\pi_1+\cdots+\pi_j}{\pi_{j+1}+\cdots+\pi_J},\quad\mbox{for }j=1,\ldots,J-1\)

with sample estimate \(\hat{F}(j)/(1-\hat{F}(j))\). The case \(j=J\) is not defined because \(1-F(J)=0\). Like cumulative probabilities, cumulative odds are necessarily non-decreasing in \(j\), and the sample estimates will be strictly increasing if the observed counts are all positive.
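The definition above can be sketched in a few lines of Python. This is only an illustration using the war attitude counts; the function name `cumulative_odds` is hypothetical and not part of the original notes.

```python
from itertools import accumulate

# Observed counts from the war attitude table, ordered from
# "strongly disagree" (j = 1) to "strongly agree" (j = J).
counts = [35, 27, 23, 31]


def cumulative_odds(counts):
    """Return the estimated cumulative odds F(j) / (1 - F(j)) for j = 1, ..., J-1."""
    total = sum(counts)
    running = list(accumulate(counts))  # n_1 + ... + n_j for each j
    # Skip j = J, where the denominator 1 - F(J) is zero.
    return [below / (total - below) for below in running[:-1]]


odds = cumulative_odds(counts)
for j, value in enumerate(odds, start=1):
    print(f"j = {j}: cumulative odds = {value:.4f}")
```

The entry for \(j=2\) reproduces the value \(62/54=1.1481\) computed above, and the list has only \(J-1=3\) entries because the case \(j=J\) is undefined.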

## Cumulative Odds Ratios

If an additional variable is involved, this idea extends to cumulative odds ratios. Consider the table below summarizing the responses for extent of agreement with the statement "job security is good" (*jobsecok*) and general happiness (*happy*) from the 2018 General Social Survey. Additional possible responses of "don't know" and "no answer" are omitted here.

| | Not too happy | Pretty happy | Very happy |
|---|---|---|---|
| Not at all true | 15 | 25 | 5 |
| Not too true | 21 | 47 | 21 |
| Somewhat true | 64 | 248 | 100 |
| Very true | 73 | 474 | 311 |

If we condition on *jobsecok* and view *happy* as the response variable, the cumulative odds ratio for "not too happy" or "pretty happy" for those who say "not at all true", relative to those who say "very true", would be estimated with

\(\dfrac{(15+25)/5}{(73+474)/311}=4.55 \)

If perhaps "pretty happy" and "very happy" seem like a more intuitive combination to consider, keep in mind that we're free to reverse the order to start with "very happy" and end with "not too happy" without violating the ordinal nature of this variable. By indexing "very happy" with \(j=1\), \(F(2)\) becomes the cumulative probability of "very happy" or "pretty happy", and the cumulative odds would likewise be \(F(2)/(1-F(2))\). Likewise, we could choose any two rows to serve as the groups for comparison. The row variable doesn't even have to be ordinal itself to define such a cumulative odds ratio.
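As a worked illustration of the reversed ordering (computed here from the table above rather than taken from the original notes), index "very happy" with \(j=1\). The estimated cumulative odds ratio for "very happy" or "pretty happy", for those who say "not at all true" relative to those who say "very true", is then

\(\dfrac{(5+25)/15}{(311+474)/73}=\dfrac{2}{10.7534}=0.186 \)

A value below 1 here tells the same story from the other direction: those who say "not at all true" have lower estimated odds of falling in the two happier categories.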

However, if we happen to have two ordinal variables, as we do in this example, we can work with a cumulative version in both dimensions. We may, for example, consider the cumulative odds ratio for "not too happy" or "pretty happy", for those who say "not at all" or "not too" true, relative to those who say "somewhat" or "very" true. The estimate of this would be

\(\dfrac{(15+25+21+47)/(5+21)}{(64+248+73+474)/(100+311)}=1.99 \)
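Both estimates above come from accumulating counts into a \(2\times2\) table and then taking an ordinary odds ratio. A minimal Python sketch of that step follows; the helper names `collapse` and `odds_ratio` and the 0-based row indexing are assumptions made for illustration only.

```python
# Counts from the jobsecok-by-happy table.
# Rows: "not at all true", "not too true", "somewhat true", "very true".
# Columns: "not too happy", "pretty happy", "very happy".
table = [
    [15, 25, 5],
    [21, 47, 21],
    [64, 248, 100],
    [73, 474, 311],
]


def collapse(table, rows_a, rows_b, cut):
    """Collapse to a 2x2 table of row groups A and B by columns (first `cut`, rest).

    rows_a and rows_b are lists of 0-based row indices; cut is the number of
    leading columns accumulated into the first response category.
    """
    def group(rows):
        low = sum(sum(table[r][:cut]) for r in rows)
        high = sum(sum(table[r][cut:]) for r in rows)
        return [low, high]

    return [group(rows_a), group(rows_b)]


def odds_ratio(two_by_two):
    """Sample odds ratio n11 * n22 / (n12 * n21)."""
    (n11, n12), (n21, n22) = two_by_two
    return (n11 * n22) / (n12 * n21)


# "Not at all true" versus "very true", accumulating the first two columns.
rows_only = collapse(table, [0], [3], cut=2)
# Accumulating in both dimensions: first two rows versus last two rows.
both = collapse(table, [0, 1], [2, 3], cut=2)

print(rows_only, round(odds_ratio(rows_only), 2))  # [[40, 5], [547, 311]] 4.55
print(both, round(odds_ratio(both), 2))            # [[108, 26], [859, 411]] 1.99
```

Because the row groups are passed as arbitrary index lists, the same sketch also covers the case where the row variable is not ordinal.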

The common theme in all these odds ratios is that they essentially convert an \(I\times J\) table into a \(2\times2\) table by combining or "accumulating" counts in adjacent categories, depending on our focus of interest. The two pairs of tables below illustrate this; each pair shows the full table followed by the \(2\times2\) table it collapses to.

Comparing "not at all true" with "very true" and accumulating the first two columns:

| | Not too happy | Pretty happy | Very happy |
|---|---|---|---|
| Not at all true | 15 | 25 | 5 |
| Not too true | 21 | 47 | 21 |
| Somewhat true | 64 | 248 | 100 |
| Very true | 73 | 474 | 311 |

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all true | 40 | 5 |
| Very true | 547 | 311 |

Accumulating in both dimensions:

| | Not too happy | Pretty happy | Very happy |
|---|---|---|---|
| Not at all true | 15 | 25 | 5 |
| Not too true | 21 | 47 | 21 |
| Somewhat true | 64 | 248 | 100 |
| Very true | 73 | 474 | 311 |

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all or not too true | 108 | 26 |
| Somewhat or very true | 859 | 411 |

## CIs for Cumulative Odds Ratios

Recall that for a \(2\times 2\) table with counts \((n_{11},n_{12},n_{21},n_{22})\), we have the sample odds ratio \(\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\) and corresponding 95% confidence interval for the (population) log odds ratio:

\(\log\dfrac{n_{11}n_{22}}{n_{12}n_{21}} \pm 1.96\sqrt{\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}}\)

We can readily adapt this formula for a cumulative odds ratio \(\theta\) as well. We just need to work with the \(2\times2\) table of counts induced by the accumulation. For example, the \(2\times2\) table induced by the cumulative odds ratio for "not too" or "pretty" happy for those saying "not at all" compared with "very" true is

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all true | 40 | 5 |
| Very true | 547 | 311 |

With estimate \(\hat{\theta}=4.55\), the 95% confidence interval for \(\log\theta\) is

\(\log 4.55 \pm 1.96\sqrt{\dfrac{1}{40}+\dfrac{1}{5}+\dfrac{1}{547}+\dfrac{1}{311}} = (0.5747, 2.4549)\)

And by exponentiating the endpoints, we have the 95% CI for \(\theta\):

\(e^{(0.5747, 2.4549)}=(1.7766, 11.6448)\)

Likewise, for the cumulative odds ratio for "not too" or "pretty" happy, for those saying "not at all" or "not too" true compared with those saying "somewhat" or "very" true, we have on the log scale

\(\log 1.99 \pm 1.96\sqrt{\dfrac{1}{108}+\dfrac{1}{26}+\dfrac{1}{859}+\dfrac{1}{411}} = (0.2429, 1.1308)\)

And, by exponentiating limits, we have the final CI for the odds ratio:

\(e^{(0.2429, 1.1308)}=(1.275, 3.098)\)
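The two interval calculations above can be checked with a short Python sketch of the same large-sample formula. The function names are hypothetical, and the fixed multiplier 1.96 is simply the value used in the text for a 95% interval.

```python
import math


def log_odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Interval for the log odds ratio: log(OR) +/- z * sqrt(sum of 1/n_ij)."""
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return log_or - z * se, log_or + z * se


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Exponentiate the log-scale endpoints to get an interval for the odds ratio."""
    lower, upper = log_odds_ratio_ci(n11, n12, n21, n22, z)
    return math.exp(lower), math.exp(upper)


# "Not at all true" versus "very true"; "not too or pretty happy" versus "very happy".
print(log_odds_ratio_ci(40, 5, 547, 311))
print(odds_ratio_ci(40, 5, 547, 311))

# Accumulated in both dimensions.
print(log_odds_ratio_ci(108, 26, 859, 411))
print(odds_ratio_ci(108, 26, 859, 411))
```

The sketch uses the unrounded sample odds ratio rather than 4.55 or 1.99, so its endpoints may differ from the hand calculations in the last decimal place.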

To put it a bit more loosely, we can say that individuals who generally don't agree as much with the statement "job security is good" have greater odds of being less happy. Or, equivalently, those who generally agree more with the statement "job security is good" have greater odds of being happier.