3.6 - Odds Ratio | STAT 504

The odds ratio is perhaps the most commonly used measure of association. Later on, we will see that it is a natural parameter for many log-linear and logistic models.

Odds: The odds are the ratio of the probabilities of "success" and "failure" for a given row, that is, a ratio of two conditional probabilities from the same conditional distribution.

Odds of getting a cold versus not getting a cold given that a person took a placebo:

\(odds_1=\dfrac{P(Z=1|Y=1)}{P(Z=2|Y=1)}=\dfrac{\pi_{1|1}}{\pi_{2|1}}=\dfrac{\pi_{1|1}}{1-\pi_{1|1}}\)

Odds of getting a cold versus not getting a cold given that a person took ascorbic acid:

\(odds_2=\dfrac{P(Z=1|Y=2)}{P(Z=2|Y=2)}=\dfrac{\pi_{1|2}}{\pi_{2|2}}=\dfrac{\pi_{1|2}}{1-\pi_{1|2}}\)

Properties of odds

If odds = 1, "success" and "failure" are equally likely.
If odds > 1, then "success" is more likely than "failure".
If odds < 1, then "success" is less likely than "failure".
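These three cases can be checked numerically. The following is a minimal Python sketch, not part of the course files (which use SAS and R); the function names `odds` and `describe` and the probabilities 0.25, 0.5, and 0.75 are chosen here only for illustration.

```python
def odds(p):
    """Convert a probability of "success" p (0 <= p < 1) into odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def describe(p):
    """Report whether "success" is more, less, or equally likely than "failure"."""
    o = odds(p)
    if o > 1:
        return "success more likely"
    if o < 1:
        return "success less likely"
    return "equally likely"


# Illustrative probabilities only (not taken from the Vitamin C data).
for p in (0.25, 0.5, 0.75):
    print(p, round(odds(p), 3), describe(p))
```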

Odds Ratio

The odds ratio is the ratio of \(odds_1\) to \(odds_2\) (or vice versa):

\begin{align}
\theta &= \dfrac{P(Z=1|Y=1)/P(Z=2|Y=1)}{P(Z=1|Y=2)/P(Z=2|Y=2)}\\
&= \dfrac{\left(\dfrac{\pi_{11}}{\pi_{1+}}\right)/\left(\dfrac{\pi_{12}}{\pi_{1+}}\right)}{\left(\dfrac{\pi_{21}}{\pi_{2+}}\right)/\left(\dfrac{\pi_{22}}{\pi_{2+}}\right)}\\
&= \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}
\end{align}

Clearly, \(\theta\) is a function of the parameters of \(P(Z | Y )\), so inferences about it should be the same under Poisson, multinomial, or product-multinomial (\(n_{i+}\)s fixed) sampling. But if we interchange the roles of \(Y\) and \(Z\), we still get

\(\theta=\dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}\)

so \(\theta\) can also be regarded as a function of the parameters of \(P(Y| Z )\). Therefore, the likelihood inferences will be the same if we regard the \(n_{+j}\)s as fixed.
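The symmetry argument can be verified with exact arithmetic. The sketch below assumes a made-up table of joint probabilities `pi` (not from the course data) and computes \(\theta\) three ways: from the row-conditional odds, from the column-conditional odds, and from the cross-product ratio. The function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities pi[i][j] = P(Y = i+1, Z = j+1); they sum to 1.
pi = [[F(3, 20), F(7, 20)],
      [F(1, 20), F(9, 20)]]


def theta_given_rows(pi):
    """Odds ratio built from the row-conditional distributions P(Z | Y)."""
    odds = []
    for row in pi:
        total = sum(row)  # row marginal pi_{i+}
        odds.append((row[0] / total) / (row[1] / total))
    return odds[0] / odds[1]


def theta_given_cols(pi):
    """Odds ratio built from the column-conditional distributions P(Y | Z)."""
    transposed = [list(col) for col in zip(*pi)]  # swap the roles of Y and Z
    return theta_given_rows(transposed)


def cross_product_ratio(pi):
    """pi_11 * pi_22 / (pi_12 * pi_21)."""
    return (pi[0][0] * pi[1][1]) / (pi[0][1] * pi[1][0])


print(theta_given_rows(pi), theta_given_cols(pi), cross_product_ratio(pi))
```

All three values coincide because the marginal totals cancel in each conditional odds.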

Point estimate, CI and hypothesis test

The natural estimate of \(\theta\) is the sample cross-product ratio,

\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)

The properties of \(\hat{\theta}\) are easily established under multinomial sampling, but the same properties will hold under Poisson or product-multinomial sampling with either the row totals or column totals (but not both) regarded as fixed.

As with the relative risk, the log-odds ratio \(\log\hat{\theta}\) has a better normal approximation than \(\hat{\theta}\) does. Therefore, we usually obtain a confidence interval on the log scale; note again that log throughout this course is the natural log, that is, log base \(e\). The estimated variance of \(\log\hat{\theta}\) is easy to remember,

\(\hat{V}(\log\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)

and we get a 95% confidence interval for \(\theta\) by exponentiating the endpoints of

\(\log\hat{\theta} \pm 1.96\sqrt{\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}}\)

Note! Testing \(H_0 \colon \theta= 1\) is equivalent to testing independence in \(2 \times 2\) tables.

For the Vitamin C example, the odds of "success" (i.e., getting a cold), given that a skier took vitamin C, are \(0.12/0.88 = 0.14\). The odds of "success" (i.e., getting a cold), given that a skier took a placebo pill, are \(0.22/0.78 = 0.28\).

The odds ratio is \(0.14/0.28 = 0.49\) (computed from the unrounded odds), and the 95% CI for \(\log\theta\) is

\(\log(0.490)\pm 1.96 \sqrt{1/17+1/109+1/122+1/31}=(-1.359,-0.068)\)

Finally, exponentiating the limits gives the 95% CI for \(\theta\): (0.257, 0.934). Notice that we could also have computed the odds ratio the other way around, \(0.28/0.14=2.04=31(122)/(109(17))\), which is the reciprocal of the value computed above: \(1/0.49=2.04\). For our example, \(\hat{\theta}=0.49\) means that

the odds of getting a cold given vitamin C are 0.49 times the odds of getting a cold given a placebo
the odds of getting a cold given a placebo are \(1/0.49 = 2.04\) times the odds of getting a cold given vitamin C
getting a cold is less likely given vitamin C than given a placebo.
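The point estimate and interval for this example can be reproduced in a few lines. This is a minimal Python sketch rather than the course's own SAS or R code. The function name `odds_ratio_ci` is hypothetical, and the cell layout (vitamin C: 17 colds and 122 without; placebo: 31 colds and 109 without) is an assumption inferred from the counts in the formulas above.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio with a Wald interval computed on the log scale.

    Returns (theta_hat, CI for log(theta), CI for theta).
    """
    theta_hat = (n11 * n22) / (n12 * n21)                    # cross-product ratio
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta_hat)                          # natural log
    log_ci = (log_theta - z * se_log, log_theta + z * se_log)
    ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))          # back-transform
    return theta_hat, log_ci, ci


# Assumed layout: row 1 = vitamin C (cold, no cold), row 2 = placebo (cold, no cold).
theta_hat, log_ci, ci = odds_ratio_ci(17, 122, 31, 109)
print(round(theta_hat, 3))
print([round(x, 3) for x in log_ci])
print([round(x, 4) for x in ci])
```

Under these assumptions the interval for \(\theta\) agrees with the asymptotic confidence limits in the SAS output for this example.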

For computation in SAS, compare the above calculations for the Vitamin C example to the relevant SAS output under the heading "Statistics for Table of treatment and response: Odds Ratio":


Odds Ratio
Odds Ratio	0.4900

Asymptotic Conf Limits
95% Lower Conf Limit	0.2569
95% Upper Conf Limit	0.9343

Exact Conf Limits
95% Lower Conf Limit	0.2407
95% Upper Conf Limit	0.9740

A convenient way to get all of the association measures is to use the MEASURES option in PROC FREQ. Working with the VitaminC.sas file, we can replace the following line

tables treatment*response/ chisq relrisk riskdiff expected;

with

tables treatment*response/ chisq measures expected;

The computation in R is available with the VitaminC.R file.

For more on the interpretation of odds and odds ratios and their properties, see below.

Properties of Odds Ratios

If \(\theta = 3\), the odds of "success" in row 1 are 3 times the odds of "success" in row 2; individuals in row 1 are more likely to have a "success" than those in row 2. If \(\theta = 0.3\), the odds of "success" in row 1 are 0.3 times the odds in row 2; equivalently, the odds of "success" in row 2 are \(1/0.3 = 3.33\) times the odds in row 1.

The relationship between odds and probabilities can be expressed as

\(odds_1=\dfrac{\pi_{1|1}}{1-\pi_{1|1}}\iff\pi_{1|1}=\dfrac{odds_1}{1+odds_1}\)

If the variables are independent, then \(\pi_{1|1} = \pi_{1|2}\), \(odds_1 = odds_2\), and

\(\theta=\frac{odds_1}{odds_2}=1\)

If the variables are not independent such that \(\pi_{1|1} > \pi_{1|2}\), then \(odds_1 > odds_2\), and

\( 1<\theta\)

If the variables are not independent such that \(\pi_{1|1} < \pi_{1|2}\), then \(odds_1< odds_2\), and

\( 0 < \theta < 1\)
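The conversion between odds and probabilities and the three cases for \(\theta\) can be illustrated with a short Python sketch. The helper names `odds`, `prob`, and `theta`, and the probability pairs used, are made up for illustration.

```python
def odds(p):
    """Odds of "success" for a probability p with 0 <= p < 1."""
    return p / (1 - p)


def prob(o):
    """Inverse of odds(): recover the probability from the odds."""
    return o / (1 + o)


def theta(p1, p2):
    """Odds ratio for "success" probabilities p1 (row 1) and p2 (row 2)."""
    return odds(p1) / odds(p2)


# Equal probabilities give theta = 1; p1 > p2 gives theta > 1; p1 < p2 gives theta < 1.
for p1, p2 in [(0.5, 0.5), (0.6, 0.2), (0.2, 0.6)]:
    print(p1, p2, round(theta(p1, p2), 3))

# Round trip: converting to odds and back recovers the probability.
print(round(prob(odds(0.3)), 3))
```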

If both \(\pi_{1|1}\) and \(\pi_{1|2}\) are small in the population, then the odds ratio and relative risk will be close: the relative risk equals \(\theta\cdot\frac{1-\pi_{1|1}}{1-\pi_{1|2}}\), and the factor \(\frac{1-\pi_{1|1}}{1-\pi_{1|2}}\) will be close to 1.
The odds ratio \(\theta\) does NOT depend on the marginal distribution of either variable.
If the categories of both variables are interchanged, the value of \(\theta\) does not change.
If the categories of only one variable are switched, the odds ratio in the rearranged table will equal \(1/\theta\).
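These properties can be checked with exact arithmetic on a small table. The sketch below uses made-up counts with a rare "success" outcome; the function names and the table are assumptions for illustration only.

```python
from fractions import Fraction as F


def odds_ratio(t):
    """Cross-product ratio n11 * n22 / (n12 * n21) of a 2 x 2 table."""
    (a, b), (c, d) = t
    return F(a * d, b * c)


def relative_risk(t):
    """Ratio of the row-wise "success" proportions."""
    (a, b), (c, d) = t
    return F(a, a + b) / F(c, c + d)


table = [[2, 998], [1, 999]]            # hypothetical counts, rare "success"
theta = odds_ratio(table)

rows_swapped = [table[1], table[0]]                      # switch one variable
both_swapped = [row[::-1] for row in rows_swapped]       # switch both variables
row1_scaled = [[10 * n for n in table[0]], table[1]]     # change the row margin

print(float(theta), float(relative_risk(table)))         # close when success is rare
print(odds_ratio(rows_swapped) == 1 / theta)             # switching one variable inverts theta
print(odds_ratio(both_swapped) == theta)                 # switching both leaves theta unchanged
print(odds_ratio(row1_scaled) == theta)                  # theta ignores the row margin
```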

Finally, note that the sample odds ratio will equal zero or \(\infty\) if any \(n_{ij}=0\). Some authors suggest adding \(1/2\) to each cell count and then recalculating the sample odds ratio and its standard error to avoid this issue.
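A minimal Python sketch of this adjustment follows. The function name `adjusted_odds_ratio` and the table with one empty cell are made up for illustration, and the standard error uses the same formula as above applied to the adjusted counts, which is an assumption about how the recalculation is done.

```python
import math


def adjusted_odds_ratio(n11, n12, n21, n22, add=0.5):
    """Odds ratio and SE of its log after adding `add` to every cell count."""
    a, b, c, d = (n + add for n in (n11, n12, n21, n22))
    theta = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return theta, se_log


# Hypothetical table with n12 = 0: the unadjusted odds ratio would be infinite.
theta_adj, se_adj = adjusted_odds_ratio(10, 0, 5, 15)
print(round(theta_adj, 2), round(se_adj, 3))
```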