4.1 - Cumulative Odds and Odds Ratios

Recall that a discrete ordinal variable, say \(Y\), takes on values that can be sorted either least to greatest or greatest to least. Examples introduced earlier include the face of a die (\(1,2,\ldots,6\)) and a person's attitude towards war ("strongly disagree", "disagree", "agree", "strongly agree"). For convenience, we may label such categories as 1 through \(J\), which allows us to express cumulative probabilities as

\(F(j) = P(Y\le j)=\pi_1+\cdots+\pi_j\),

where the parameter \(\pi_j\) represents the probability of category \(j\). So, we can write \(F(2)=\pi_1+\pi_2\) to represent the probability that the die is either 1 or 2, or, in the second example, the probability that an individual responds either "strongly disagree" or "disagree". Note that we need only the ordinal property for this to make sense; the values \(1,2,\ldots,J\) themselves do not represent any numerically meaningful quantities in these cases.
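As a small illustration of the cumulative probability definition, the sketch below assumes a fair six-sided die, so that every \(\pi_j=1/6\). The fairness assumption and the variable names `pi` and `F` are for illustration only and are not part of the original example.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption for illustration: a fair die, so pi_j = 1/6 for each face j = 1, ..., 6.
pi = [Fraction(1, 6)] * 6

# Running sums give F(j) = pi_1 + ... + pi_j; list index j-1 holds F(j).
F = list(accumulate(pi))

print(F[1])   # F(2) = pi_1 + pi_2, the probability the die shows 1 or 2
print(F[-1])  # F(J), which always equals 1
```

Exact fractions are used here only to avoid floating-point rounding in the running sums.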

Cumulative Odds

In addition to probabilities (or risks), ordinal categories also allow odds to be defined for cumulative events. Recall the observed data for the war attitude example:

| Attitude | Count |
|---|---|
| Strongly disagree | 35 |
| Disagree | 27 |
| Agree | 23 |
| Strongly agree | 31 |
| Total | 116 |

If we focus on the "disagree" outcome in particular, the estimated probability would be \(\hat{\pi}_2=27/116=0.2328\) with corresponding estimated odds \(\hat{\pi}_2/(1-\hat{\pi}_2)=27/(35+23+31)=0.3034\). However, using the estimated cumulative probability \(\hat{F}(2)=\hat{\pi}_1+\hat{\pi}_2=(35+27)/116=0.5345\), we may also consider the estimated cumulative odds:

\( \dfrac{\hat{F}(2)}{1-\hat{F}(2)}=\dfrac{35+27}{23+31}=1.1481 \)

We interpret this value as the (estimated) cumulative odds that an individual will "disagree", where the category of "strongly disagree" is implicitly included. Equivalently, we may also refer to this as the (estimated) odds of "strongly disagree" or "disagree". In general, we define the cumulative odds for \(Y\le j\) as

\(\dfrac{F(j)}{1-F(j)}=\dfrac{\pi_1+\cdots+\pi_j}{\pi_{j+1}+\cdots+\pi_J},\quad\mbox{for }j=1,\ldots,J-1\)

with sample estimate \(\hat{F}(j)/(1-\hat{F}(j))\). The case \(j=J\) is not defined because \(1-F(J)=0\). Like cumulative probabilities, cumulative odds are necessarily non-decreasing in \(j\), and the estimated cumulative odds are strictly increasing if the observed counts are all positive.
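The estimated cumulative odds for the war attitude counts can be computed as in the following minimal sketch. The helper name `cumulative_odds` is hypothetical, and the code assumes the last category has a positive count so that no denominator is zero.

```python
from itertools import accumulate

# Observed counts, ordered from "strongly disagree" to "strongly agree".
counts = [35, 27, 23, 31]

def cumulative_odds(counts):
    """Estimated odds of Y <= j versus Y > j, for j = 1, ..., J-1 (hypothetical helper)."""
    total = sum(counts)
    # Running totals of the first j counts; j = J is dropped because 1 - F(J) = 0.
    at_or_below = list(accumulate(counts))[:-1]
    return [below / (total - below) for below in at_or_below]

odds = cumulative_odds(counts)
for j, value in enumerate(odds, start=1):
    print(f"j={j}: cumulative odds = {value:.4f}")
```

The second entry corresponds to \((35+27)/(23+31)\), and the values increase with \(j\) because all counts are positive.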

Cumulative Odds Ratios

If an additional variable is involved, this idea extends to cumulative odds ratios. Consider the table below summarizing the responses for extent of agreement with the statement "job security is good" (jobsecok) and general happiness (happy) from the 2018 General Social Survey. Additional possible responses of "don't know" and "no answer" are omitted here.

| | Not too happy | Pretty happy | Very happy |
|---|---|---|---|
| Not at all true | 15 | 25 | 5 |
| Not too true | 21 | 47 | 21 |
| Somewhat true | 64 | 248 | 100 |
| Very true | 73 | 474 | 311 |

If we condition on jobsecok and view happy as the response variable, the cumulative odds ratio for "not too happy" or "pretty happy" for those who say "not at all true", relative to those who say "very true", would be estimated with

\(\dfrac{(15+25)/5}{(73+474)/311}=4.55 \)
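The estimate above can be reproduced with a short sketch that computes the cumulative odds within each of the two rows and then takes their ratio. The dictionary layout and the function names are hypothetical choices made for this illustration.

```python
# Rows: jobsecok response; columns: happy, ordered "not too", "pretty", "very".
table = {
    "not at all true": [15, 25, 5],
    "not too true":    [21, 47, 21],
    "somewhat true":   [64, 248, 100],
    "very true":       [73, 474, 311],
}

def row_cumulative_odds(row, j):
    """Odds of the first j response categories versus the remaining ones."""
    return sum(row[:j]) / sum(row[j:])

def cumulative_odds_ratio(table, row_a, row_b, j):
    """Cumulative odds in row_a divided by cumulative odds in row_b (hypothetical helper)."""
    return row_cumulative_odds(table[row_a], j) / row_cumulative_odds(table[row_b], j)

# "Not too" or "pretty" happy (j = 2): "not at all true" relative to "very true".
theta_hat = cumulative_odds_ratio(table, "not at all true", "very true", j=2)
print(round(theta_hat, 2))  # 4.55
```

Any other pair of rows can be compared by changing the two row labels.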

If "pretty happy" and "very happy" seem like a more intuitive combination to consider, keep in mind that we're free to reverse the order to start with "very happy" and end with "not too happy" without violating the ordinal nature of this variable. By indexing "very happy" with \(j=1\), \(F(2)\) becomes the cumulative probability of "very happy" or "pretty happy", and the cumulative odds would again be \(F(2)/(1-F(2))\). Similarly, we could choose any two rows to serve as the groups for comparison. The row variable doesn't even have to be ordinal itself to define such a cumulative odds ratio.
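To make the effect of reversing the category order concrete, the sketch below uses the "not at all true" row. It shows the cumulative odds of "very happy" or "pretty happy" under the reversed indexing, and checks that the reversed ordering with the complementary cut point gives the reciprocal of the original cumulative odds. The function name is hypothetical.

```python
import math

# "Not at all true" row, in the original order: not too, pretty, very happy.
row = [15, 25, 5]
# Reversed order: very, pretty, not too happy.
reversed_row = row[::-1]

def cumulative_odds(row, j):
    """Odds of the first j categories versus the rest (hypothetical helper)."""
    return sum(row[:j]) / sum(row[j:])

# Under the reversed indexing, F(2) covers "very happy" or "pretty happy".
odds_very_or_pretty = cumulative_odds(reversed_row, 2)   # (5 + 25) / 15

# Original order, j = 2: "not too" or "pretty" happy versus "very happy".
odds_original = cumulative_odds(row, 2)                  # (15 + 25) / 5
# Reversed order, j = 1: "very happy" versus the other two categories.
odds_complement = cumulative_odds(reversed_row, 1)       # 5 / (25 + 15)

print(odds_very_or_pretty, odds_original, odds_complement)
print(math.isclose(odds_original * odds_complement, 1.0))
```

The reciprocal relationship holds because the two events are complements of one another.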

However, if we happen to have two ordinal variables, as we do in this example, we can work with a cumulative version in both dimensions. We may, for example, consider the cumulative odds ratio for "not too happy" or "pretty happy", for those who say "not at all" or "not too" true, relative to those who say "somewhat" or "very" true. The estimate of this would be

\(\dfrac{(15+25+21+47)/(5+21)}{(64+248+73+474)/(100+311)}=1.99 \)
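The accumulation in both dimensions can be sketched as a function that cuts the full table after a chosen row and a chosen column and sums the counts in each of the four resulting blocks. The names `collapse` and `odds_ratio` and the 1-based cut arguments `i` and `j` are assumptions made for this illustration.

```python
# Rows: "not at all", "not too", "somewhat", "very" true.
# Columns: "not too", "pretty", "very" happy.
counts = [
    [15, 25, 5],
    [21, 47, 21],
    [64, 248, 100],
    [73, 474, 311],
]

def collapse(counts, i, j):
    """Collapse an I x J table to 2 x 2 by cutting after row i and column j (1-based)."""
    def block_total(rows, cols):
        return sum(sum(row[cols]) for row in rows)
    top, bottom = counts[:i], counts[i:]
    left, right = slice(0, j), slice(j, None)
    return [
        [block_total(top, left), block_total(top, right)],
        [block_total(bottom, left), block_total(bottom, right)],
    ]

def odds_ratio(t):
    """Sample odds ratio n11 * n22 / (n12 * n21) of a 2 x 2 table."""
    (n11, n12), (n21, n22) = t
    return (n11 * n22) / (n12 * n21)

two_by_two = collapse(counts, i=2, j=2)
print(two_by_two)                         # [[108, 26], [859, 411]]
print(round(odds_ratio(two_by_two), 2))   # 1.99
```

Other cut points give the other cumulative odds ratios available in both dimensions, for example `collapse(counts, i=1, j=1)`.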

The common theme in all these odds ratios is that they essentially convert an \(I\times J\) table into a \(2\times2\) table by combining or "accumulating" counts in adjacent categories, depending on our focus of interest. The two collapsed tables for the examples above are shown below.

\(2\times2\) table for the cumulative odds ratio for "not too" or "pretty" happy, for "not at all true" compared with "very true" (the "not too happy" and "pretty happy" columns are combined, and the two middle rows are not used):

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all true | 40 | 5 |
| Very true | 547 | 311 |

\(2\times2\) table for the cumulative odds ratio for "not too" or "pretty" happy, for "not at all" or "not too" true compared with "somewhat" or "very" true (the first two rows are combined, the last two rows are combined, and the "not too happy" and "pretty happy" columns are combined):

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all or not too true | 108 | 26 |
| Somewhat or very true | 859 | 411 |

CIs for Cumulative Odds Ratios

Recall for a \(2\times 2\) table with counts \((n_{11},n_{12},n_{21},n_{22})\), we have the sample odds ratio \(\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\) and corresponding 95% confidence interval for the (population) log odds ratio:

\(\log\dfrac{n_{11}n_{22}}{n_{12}n_{21}} \pm 1.96\sqrt{\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}}\)

We can readily adapt this formula for a cumulative odds ratio \(\theta\) as well. We just need to work with the \(2\times2\) table of counts induced by the accumulation. For example, the \(2\times2\) table induced by the cumulative odds ratio for "not too" or "pretty" happy for those saying "not at all" compared with "very" true is

| | Not too or pretty happy | Very happy |
|---|---|---|
| Not at all true | 40 | 5 |
| Very true | 547 | 311 |

With estimate \(\hat{\theta}=4.55\), the 95% confidence interval for \(\log\theta\) is

\(\log 4.55 \pm 1.96\sqrt{\dfrac{1}{40}+\dfrac{1}{5}+\dfrac{1}{547}+\dfrac{1}{311}} = (0.5747, 2.4549)\)

And by exponentiating the endpoints, we have the 95% CI for \(\theta\):

\(e^{(0.5747, 2.4549)}=(1.7766, 11.6448)\)
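The interval calculation above can be sketched as a small function that takes the four counts of the induced \(2\times2\) table. The function name `odds_ratio_ci` and the default multiplier `z=1.96` are assumptions for this illustration, and the code assumes all four counts are positive.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Wald interval for the log odds ratio, plus the exponentiated interval.

    Hypothetical helper; requires all four counts to be positive.
    """
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lower, upper = log_or - z * se, log_or + z * se
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Induced table: "not at all true" (40, 5) versus "very true" (547, 311).
log_ci, ci = odds_ratio_ci(40, 5, 547, 311)
print(tuple(round(x, 4) for x in log_ci))
print(tuple(round(x, 4) for x in ci))
```

The function uses the unrounded odds ratio rather than 4.55, so the endpoints may differ from a hand calculation in the last reported digit.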

Likewise, for the cumulative odds ratio for "not too" or "pretty" happy, for those saying "not at all" or "not too" true compared with those saying "somewhat" or "very" true, we have on the log scale

\(\log 1.99 \pm 1.96\sqrt{\dfrac{1}{108}+\dfrac{1}{26}+\dfrac{1}{859}+\dfrac{1}{411}} = (0.2429, 1.1309)\)

And, by exponentiating the limits, we have the final CI for the odds ratio:

\(e^{(0.2429, 1.1309)}=(1.275, 3.098)\)

To put it a bit more loosely, we can say that individuals who generally don't agree as much with the statement "job security is good" have greater odds of being less happy. Or, equivalently, those who generally agree more with the statement "job security is good" have greater odds of being happier.