# 5.3 - Marginal and Conditional Odds Ratios

### Marginal Odds Ratios

Marginal odds ratios are odds ratios between two variables in the marginal table, and can be used to test for marginal independence between two variables while ignoring the third. For example, for the *AC* margin with expected counts \(\mu_{i+k}\), the "marginal odds ratio" is:

\(\theta_{AC}=\dfrac{\mu_{1+1}\mu_{2+2}}{\mu_{1+2}\mu_{2+1}}\)

The *sample (observed) marginal odds ratio* for our running death penalty example is:

\(\hat{\theta}_{AC}=\dfrac{19\times 149}{141\times 17}=1.18\)

The odds of death penalty for a white defendant are 1.18 times as high as they are for a black defendant. But is this value statistically significant?
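As a quick check of the arithmetic, the marginal table can be obtained by summing the three-way counts over victim's race. The sketch below uses plain Python rather than the course's SAS or R files; the dictionary layout and the helper names `marginal_ac` and `odds_ratio` are illustrative assumptions, and the counts are the ones used throughout this section.

```python
# Death penalty counts keyed as (defendant, victim, penalty).
# The layout and helper names are illustrative, not part of death.sas or death.R.
counts = {
    ("white", "white", "yes"): 19, ("white", "white", "no"): 132,
    ("black", "white", "yes"): 11, ("black", "white", "no"): 52,
    ("white", "black", "yes"): 0,  ("white", "black", "no"): 9,
    ("black", "black", "yes"): 6,  ("black", "black", "no"): 97,
}


def marginal_ac(counts):
    """Collapse over victim's race (B) to get the defendant-by-penalty table."""
    table = {}
    for (defendant, _victim, penalty), n in counts.items():
        key = (defendant, penalty)
        table[key] = table.get(key, 0) + n
    return table


def odds_ratio(n11, n12, n21, n22):
    """Cross-product ratio of a 2 x 2 table given row by row."""
    return (n11 * n22) / (n12 * n21)


ac = marginal_ac(counts)
theta_ac = odds_ratio(ac[("white", "yes")], ac[("white", "no")],
                      ac[("black", "yes")], ac[("black", "no")])
print(ac)
print(round(theta_ac, 2))
```

The collapsed table has rows (19, 141) and (17, 149), which are the counts that appear in the formula above.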

### Conditional Odds Ratios

Conditional odds ratios are odds ratios between two variables for fixed levels of the third variable, and can be used to test for conditional independence of two variables given the third. For example, for a fixed level of *B*, the conditional *AC* association **given** the *j*th level of *B* is

\(\theta_{AC(j)}=\dfrac {\mu_{1j1}\mu_{2j2}}{\mu_{1j2}\mu_{2j1}}\)

These are computed using the partial tables, and are sometimes referred to as "partial associations". For example, the *sample (observed) conditional (partial) odds ratios* in our running death penalty example are:

\(\hat{\theta}_{AC(B=white)}=\dfrac{19\times 52}{11\times 132}=0.68\)

\(\hat{\theta}_{AC(B=black)}=\dfrac{0\times 97}{9\times 6}=0\)

Recall that when we have sampling zeros (i.e., zero counts), an ad hoc method to get an estimate of the odds ratio other than zero is to add 0.5 to each cell value. In this example, the estimated odds ratio of *A* and *C* for *B* = black would be (0.5 × 97.5)/(9.5 × 6.5) = 0.79.
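The partial odds ratios and the 0.5 adjustment can be sketched in a few lines of plain Python. The function name `odds_ratio` and its `add` argument are assumptions made for illustration; they are not taken from the course code.

```python
def odds_ratio(n11, n12, n21, n22, add=0.0):
    """Cross-product ratio of a 2 x 2 table, optionally adding a constant to every cell."""
    a, b, c, d = (x + add for x in (n11, n12, n21, n22))
    return (a * d) / (b * c)


# Partial tables by victim's race: (white yes, white no, black yes, black no).
partial = {
    "white": (19, 132, 11, 52),
    "black": (0, 9, 6, 97),
}

raw = {victim: odds_ratio(*cells) for victim, cells in partial.items()}
adjusted_black = odds_ratio(*partial["black"], add=0.5)

print({victim: round(value, 2) for victim, value in raw.items()})
print(round(adjusted_black, 2))
```

The unadjusted value for black victims is exactly zero because of the empty cell, which is why the adjusted value 0.79 is the one used in the discussion that follows.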

How would you interpret these odds ratios?

We are calculating the odds ratios for the various partial tables of the larger table, and can use them to test the conditional independence of *A* and *C* given *B*. If \(\theta_{AC(j)} \neq 1\) for at least one level of *B* (at least one *j*), we can say that variables *A* and *C* are *conditionally associated*. We will learn more about this, but for now let's utilize our knowledge of two-way tables to do some preliminary analysis.

**Note:** Marginal association can be very different from conditional association! That is, marginal and conditional odds ratios do NOT need to be equal. In fact, sometimes they may lead to quite the opposite conclusions!

Apply your knowledge of two-way tables. Compare the marginal and conditional odds ratios for our example. Do they lead to the same or different inference? How about tests of independence for each 2 × 2 table? How about different measures of association for these sub-tables? What do they tell us about the relationships between these variables? You can use the SAS or R code/outputs below.

Let's look at using the SAS program file death.sas (output: death.lst).

Use the INSPECT link to see what the options in the above SAS code mean and to view the SAS output. For example,

    tables defendant*penalty / chisq all nocol nopct;

will create a marginal table of *defendant* and *penalty*, and compute all the relevant statistics for this 2 × 2 table (see below). To get the partial tables of *defendant* and *penalty* for each level of *victim*, and get all the analysis for these 2 × 2 sub-tables, you can run the following line:

    tables victim*defendant*penalty / chisq cmh nocol nopct;

We will discuss the CMH option later. In PROC FREQ, the partial tables will be created given the levels of the first variable you specify when creating a three-way table. You can also run this program in SAS to get the output or look at the death.lst.

**Statistical Inference**

Marginal Independence. Let us first look at the marginal table of defendant's race and death penalty, ignoring the victim's race. The point estimate of the odds ratio is 1.18 and its 95% CI is (0.5902, 2.3634), based on the *Case-Control (Odds Ratio)* row of the SAS output. The odds of the death penalty are 1.18 times as high for white defendants as they are for black defendants. Recall that the null hypothesis that the odds ratio equals 1 means that the variables are independent. Based on these data, we cannot reject the null hypothesis that defendant's race is independent of the death penalty. Furthermore, we can be 95% confident that the true odds ratio lies between 0.59 and 2.36; because this interval contains 1, the data are consistent with independence. However, keep in mind that we ignored the victim's race here. The more precise statement is that defendant's race and death penalty appear to be **marginally independent**.
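The interval quoted above agrees with the usual Wald interval for a log odds ratio, which can be checked by hand. The sketch below is plain Python using only the standard library; `wald_ci` is a hypothetical helper name, and whether PROC FREQ uses exactly this formula for that row is assumed from the agreement of the numbers rather than taken from the SAS output itself.

```python
import math


def wald_ci(n11, n12, n21, n22, z=1.959964):
    """Wald confidence interval for the odds ratio of a 2 x 2 table.

    The standard error of the log odds ratio is the square root of the
    sum of reciprocal cell counts; z = 1.959964 gives a 95% interval.
    """
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Marginal defendant-by-penalty table: white (19, 141), black (17, 149).
lower, upper = wald_ci(19, 141, 17, 149)
print(round(lower, 2), round(upper, 2))
```

Because the interval straddles 1, the same conclusion follows as in the text: the marginal association is not statistically significant.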

Conditional Independence. Now consider the point estimates of the odds ratios when we control for the victim's race, i.e., the conditional odds ratios (see Sec. 5.2, or look through the SAS output and try to identify the partial tables and their relevant statistics). Given that the victim is white, the odds of the death penalty are 0.68 times as high for white defendants as they are for black defendants, but the 95% CI (0.32, 1.50) contains 1, which indicates that there is no significant difference; see the row labeled *Case-Control (Odds Ratio)*. The chi-square test of independence of defendant's race and death penalty when the victim is white confirms this finding: with *X*^{2} = 0.88, df = 1, p-value = 0.35, we fail to reject the null hypothesis.
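The Pearson statistic for the white-victim partial table can be reproduced with the shortcut formula for a 2 × 2 table. This is a minimal standard-library sketch; `pearson_chisq_2x2` is an illustrative name, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math


def pearson_chisq_2x2(n11, n12, n21, n22):
    """Pearson X^2 (no continuity correction) and its df = 1 p-value."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    x2 = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of chi-square(1): P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


# Victim = white: white defendants (19, 132), black defendants (11, 52).
x2, p_value = pearson_chisq_2x2(19, 132, 11, 52)
print(round(x2, 2), round(p_value, 2))
```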

Given that the victim is black, the sample odds ratio is 0 because of the sampling zero; adding 0.5 to each count gives OR = (0.5 × 97.5)/(9.5 × 6.5) = 0.79. Whether we examine the confidence intervals for these odds ratios or perform the test of independence on each of the 2 × 2 sub-tables, the null hypothesis of independence cannot be rejected. More specifically, we say that defendant's race and death penalty appear to be **conditionally independent given victim's race**.

However, based on the point estimates, the marginal and conditional associations point in opposite directions (1.18 versus 0.68 and 0.79), even though none of the results is significant. This is an example of the phenomenon known as **Simpson's paradox**, discussed below.

R users should open the death.R file and its corresponding output file death.out.

There are many ways to do this. For example, after we enter the data into a three-dimensional array (see death.R), R will display the partial tables by the levels of the last variable in the array (e.g., see *deathp* in the code or the output).

    > deathp <- c(19,132, 11,52,0,9, 6,97)
    > deathp
    [1]  19 132  11  52   0   9   6  97
    > #### we can represent this table also in 3 dimensions
    > deathp <- array(deathp, dim=c(2,2,2))
    > dimnames(deathp) <- list(DeathPen=c("yes","no"),
    +                          Defendant=c("white","black"),
    +                          Victim=c("white","black"))
    > deathp
    , , Victim = white

            Defendant
    DeathPen white black
         yes    19    11
         no    132    52

    , , Victim = black

            Defendant
    DeathPen white black
         yes     0     6
         no      9    97

We can also use the ftable() function, e.g.,

    ftable(deathp, row.vars=c("Defendant","Victim"), col.vars="DeathPen")

to create flat tables. In this case the rows are the combinations of *A* and *B* and the columns are the levels of *C*, so we are looking at a 4 × 2 table representation of the original 2 × 2 × 2 table.

                     DeathPen yes  no
    Defendant Victim
    white     white            19 132
              black             0   9
    black     white            11  52
              black             6  97

To create a marginal table, you can use a function margin.table(), e.g.,

    margin.table(deathp, c(2,1))
             DeathPen
    Defendant yes  no
        white  19 141
        black  17 149

This function creates a marginal table of the second and the first variable from the original array, in this case, Defendant × Death Penalty. For more details run the code and explore more ways to manipulate the data, and post any questions you may have on the discussion board.

**Statistical Inference**

Marginal Independence. Let us first look at the marginal table of defendant's race and death penalty, ignoring the victim's race (see below). The point estimate of the odds ratio is 1.181 and its 95% CI is (0.595, 2.343), using the vcd package. The odds of the death penalty are 1.18 times as high for white defendants as they are for black defendants. Recall that the null hypothesis that the odds ratio equals 1 means that the variables are independent. Based on these data, we cannot reject the null hypothesis that defendant's race is independent of the death penalty. Furthermore, we can be 95% confident that the true odds ratio lies between 0.60 and 2.34; because this interval contains 1, the data are consistent with independence. However, keep in mind that we ignored the victim's race here. The more precise statement is that defendant's race and death penalty appear to be **marginally independent**.

    > ## assocstats() and oddsratio() below come from the vcd package (library(vcd))
    > AC <- margin.table(deathp, c(2,1))
    > chisq.test(AC)

            Pearson's Chi-squared test with Yates' continuity correction

    data:  AC
    X-squared = 0.0863, df = 1, p-value = 0.7689

    > assocstats(AC)
                         X^2 df P(> X^2)
    Likelihood Ratio 0.22145  1  0.63794
    Pearson          0.22145  1  0.63794
    > oddsratio(AC, log=FALSE)
    [1] 1.18106
    > exp(confint(oddsratio(AC)))
               lwr      upr
    [1,] 0.5953049 2.343172
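The interval printed above is slightly narrower than the Wald interval (0.5902, 2.3634) quoted for the SAS output. As a hedged observation about the arithmetic only (not a statement about how the vcd package documents its computation), the printed limits are reproduced when 0.5 is added to each cell before computing the standard error, while the usual standard error reproduces the SAS limits:

\(\sqrt{\dfrac{1}{19}+\dfrac{1}{141}+\dfrac{1}{17}+\dfrac{1}{149}}\approx 0.3539,\qquad \exp\left(\log 1.181 \pm 1.96\times 0.3539\right)\approx (0.590,\ 2.363)\)

\(\sqrt{\dfrac{1}{19.5}+\dfrac{1}{141.5}+\dfrac{1}{17.5}+\dfrac{1}{149.5}}\approx 0.3495,\qquad \exp\left(\log 1.181 \pm 1.96\times 0.3495\right)\approx (0.595,\ 2.343)\)

Either way the interval contains 1, so the conclusion about marginal independence is unchanged.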

Conditional Independence. Now consider the point estimates of the odds ratios when we control for the victim's race, i.e., the conditional odds ratios (see Sec. 5.2, or look through the R output and try to identify the partial tables and their relevant statistics). Given that the victim is white, the odds of the death penalty are 0.68 times as high for white defendants as they are for black defendants, but the 95% CI (0.31, 1.51) contains 1, which indicates that there is no significant difference; see the output below, and note that there are many different ways of doing this in R, a few of which are provided in the death.R code. The chi-square test of independence of defendant's race and death penalty when the victim is white confirms this finding: with *X*^{2} = 0.88, df = 1, p-value = 0.35, we fail to reject the null hypothesis. Notice that squaring the z-value from summary(lor) below gives $(-0.948)^2\approx 0.90$, close to the Pearson statistic, with two-sided p-value $1-pchisq(0.90,1)\approx 0.34$; the value 0.1715 printed in the Pr(>|z|) column is half of this, i.e., a one-sided p-value.

    > ### via odds-ratios
    > oddsratio(deathp, 3, log=FALSE)
        white     black
    0.6804408 0.7894737
    > ## log odds ratio for a 2x2 table given the levels of the 3rd variable
    > lor=oddsratio(deathp,3)
    > exp(confint(lor)) ## CI
                 lwr      upr
    white 0.30704729  1.50791
    black 0.04121472 15.12248
    > summary(lor)
          Log Odds Ratio Std. Error z value Pr(>|z|)
    white       -0.38501    0.40600 -0.9483   0.1715
    black       -0.23639    1.50644 -0.1569   0.4377

Given that the victim is black, the sample odds ratio is 0 because of the sampling zero; adding 0.5 to each count gives OR = (0.5 × 97.5)/(9.5 × 6.5) = 0.79. Whether we examine the confidence intervals for these odds ratios or perform the test of independence on each of the 2 × 2 sub-tables, the null hypothesis of independence cannot be rejected. More specifically, we say that defendant's race and death penalty appear to be **conditionally independent given victim's race**.

However, based on the point estimates, the marginal and conditional associations point in opposite directions (1.18 versus 0.68 and 0.79), even though none of the results is significant. This is an example of the phenomenon known as **Simpson's paradox**, discussed below.

### Simpson’s paradox

Simpson's paradox is the phenomenon in which a pair of variables has a marginal association and partial (conditional) associations in opposite directions. Another way to think about this is that the nature and direction of the association change depending on the presence or absence of a third (possibly confounding) variable.

In the simplest example, consider three binary variables, *A*, *B*, *C*. In the marginal table where we are ignoring the presence of C, let

\(P(B=1 \mid A=1) < P(B=1 \mid A=2)\)

In the partial table, after we account for the presence of variable *C*, let

\(P(B=1 \mid A=1, C=1) > P(B=1 \mid A=2, C=1)\) and

\(P(B=1 \mid A=1, C=2) > P(B=1 \mid A=2, C=2)\)

In terms of odds ratios, the marginal odds ratio is \(\theta_{AB} < 1\), while the partial odds ratios are \(\theta_{AB(C=1)} > 1\) and \(\theta_{AB(C=2)} > 1\).

In the death penalty example, the marginal odds ratio was greater than one, and the partial odds ratios were less than one.
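One descriptive way to see how the reversal can arise in the death penalty counts is to look at how victim's race relates to each of the other two variables. The sketch below is plain Python with illustrative helper names (`death_rate_by_victim`, `share_with_white_victim`); it only summarizes the table and is not a formal analysis.

```python
# (defendant, victim) -> (death penalty yes, death penalty no), from the partial tables.
cells = {
    ("white", "white"): (19, 132),
    ("black", "white"): (11, 52),
    ("white", "black"): (0, 9),
    ("black", "black"): (6, 97),
}


def death_rate_by_victim(victim):
    """Proportion of cases with the given victim's race that ended in a death sentence."""
    yes = sum(y for (d, v), (y, n) in cells.items() if v == victim)
    total = sum(y + n for (d, v), (y, n) in cells.items() if v == victim)
    return yes / total


def share_with_white_victim(defendant):
    """Proportion of a defendant group's cases in which the victim was white."""
    white_victim = sum(cells[(defendant, "white")])
    total = sum(sum(cells[(defendant, v)]) for v in ("white", "black"))
    return white_victim / total


print(round(death_rate_by_victim("white"), 3), round(death_rate_by_victim("black"), 3))
print(round(share_with_white_victim("white"), 3), round(share_with_white_victim("black"), 3))
```

In these counts, death sentences are more common when the victim is white, and most cases with white defendants involve white victims. Collapsing over victim's race therefore mixes the two strata in different proportions for the two defendant groups, which is how the marginal odds ratio can exceed one while both partial odds ratios fall below one.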


In addition, for those of you who would like to delve a little deeper into this, see "Algebraic geometry of 2 × 2 contingency tables" by Slavkovic and Fienberg; page 17 of that document has a diagram of this paradox.

These associations can also be captured in terms of models. Next, we explore more on different independence and association concepts that capture relationships between three categorical variables.