5.3 - Marginal and Conditional Odds Ratios

Marginal Odds Ratios

Marginal odds ratios are odds ratios between two variables in the marginal table and can be used to test for marginal independence between two variables while ignoring the third. For example, for the AC margin with expected counts \(\mu_{i+k}\), where \(\mu\) denotes expected counts and the "+" indicates summation over B, the "marginal odds ratio" is:

\(\theta_{AC}=\dfrac{\mu_{1+1}\mu_{2+2}}{\mu_{1+2}\mu_{2+1}}\)

The sample (observed) marginal odds ratio for our running death-penalty example is:

\(\hat{\theta}_{AC}=\dfrac{19\times 149}{141\times 17}=1.18\)

The odds of death penalty for a white defendant are 1.18 times as high as they are for a black defendant. But is this value statistically significant?

Conditional Odds Ratios

Conditional odds ratios are odds ratios between two variables for fixed levels of the third variable and can be used to test for conditional independence of two variables given the third. For example, the conditional AC association given the jth level of B is

\(\theta_{AC(j)}=\dfrac {\mu_{1j1}\mu_{2j2}}{\mu_{1j2}\mu_{2j1}}\)

These are computed from the partial tables and are sometimes referred to as "partial associations". For example, the sample (observed) conditional (partial) odds ratios in our running death-penalty example are:

\(\hat{\theta}_{AC(B=white)}=\dfrac{19\times 52}{11\times 132}=0.68\)

\(\hat{\theta}_{AC(B=black)}=\dfrac{0\times 97}{9\times 6}=0\)

Recall that when we have sampling zeros (i.e., zero counts), an ad hoc way to obtain a nonzero estimate of the odds ratio is to add 0.5 to each cell count. In this example, the estimated odds ratio of A and C for B = black would be (0.5 × 97.5)/(9.5 × 6.5) = 0.79.
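As a quick check of this arithmetic, here is a minimal R sketch (the matrix tab below is built here just for illustration from the counts of the B = black partial table; it is not part of the course code files):

tab <- matrix(c(0, 9, 6, 97), nrow = 2,
              dimnames = list(DeathPen = c("yes", "no"),
                              Defendant = c("white", "black")))
tab.adj <- tab + 0.5    ## add 0.5 to every cell to handle the sampling zero
## odds ratio from the adjusted counts: (0.5 x 97.5)/(6.5 x 9.5), about 0.79
(tab.adj[1, 1] * tab.adj[2, 2]) / (tab.adj[1, 2] * tab.adj[2, 1])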

Discuss: How would you interpret these odds ratios?

We are calculating the odds ratios for the various partial tables of the larger table and can use them to test the conditional independence of A and C given B. If \(\theta_{AC(j)} \neq 1\) for at least one level of B (i.e., at least one j), the variables A and C are conditionally associated. We will learn more about this later, but for now let's use our knowledge of two-way tables to do some preliminary analysis.

Note: Marginal association can be very different from conditional association! That is, marginal and conditional odds ratios do NOT need to be equal. In fact, sometimes they may lead to quite the opposite conclusions!

Discuss: Apply your knowledge of two-way tables. Compare the marginal and conditional odds ratios for our example. Do they lead to the same or different inference? How about tests of independence for each 2 × 2 table? How about different measures of association for these sub-tables? What do they tell us about the relationships between these variables? You can use the SAS or R code/outputs below.

Let's look at the SAS program file death.sas (output: death.lst).

SAS program death.sas

Inspect the SAS code and its output to see what the different options do. For example,

tables defendant*penalty/chisq all nocol nopct;

will create a marginal table of defendant and penalty and compute all the relevant statistics for this 2 × 2 table (see below). To get the partial tables of defendant and penalty for each level of victim, along with the analyses for these 2 × 2 sub-tables, you can run the following line:

tables victim*defendant*penalty /chisq cmh nocol nopct;

We will discuss the CMH option later. In PROC FREQ, the partial tables are created for each level of the first variable you specify when requesting a three-way table. You can also run this program in SAS to get the output, or look at death.lst.

Statistical Inference 

Marginal Independence. Let us first look at the marginal table of Defendant's Race and Death Penalty, ignoring the Victim's race (see below). The point estimate of the odds ratio is 1.18 and its 95% CI is (0.5902, 2.3634), based on the Case-Control (Odds Ratio) row below. The odds of death penalty are 1.18 times as high for white defendants as they are for black defendants. Recall that under the null hypothesis of independence the odds ratio equals 1. Based on these data, we cannot reject the null hypothesis that defendant's race is independent of the death penalty; the 95% CI, roughly (0.6, 2.4), contains 1.00, so the data are consistent with independence. However, keep in mind that we ignored the Victim's race here. The more precise statement would be to say that Defendant's race and Death penalty appear to be marginally independent.

SAS output death.sas

Conditional Independence. Now consider the point estimates of the odds ratios when we control for the Victim's Race, i.e., the conditional odds ratios (see Section 5.2, or the relevant parts of the SAS output, and try to identify the partial tables and their statistics). Given the victim is white, the odds of death penalty are 0.68 times as high for white defendants as they are for black defendants, but the 95% CI (0.32, 1.50) indicates that there is no significant difference; see the row labeled Case-Control (Odds Ratio). The chi-square test for independence of defendant's race and death penalty when the victim is white confirms the same finding: with X2 = 0.88, df = 1, p-value = 0.35, we fail to reject the null hypothesis.

SAS output: partial tables of Defendant × Penalty by Victim's race (death.sas)

Given the victim is black, the odds ratio is 0.79 once we adjust for the sampling zero by adding 0.5 to each count: OR = (0.5 × 97.5)/(9.5 × 6.5) = 0.79. Whether we consider the confidence intervals for these odds ratios or perform the test of independence for each of these 2 × 2 sub-tables, the null hypothesis of independence cannot be rejected. More specifically, we say that Defendant's Race and Death Penalty are conditionally independent given the Victim's race.

However, based on the point estimates, the marginal and conditional associations point in opposite directions, e.g., 1.18 vs. 0.68 and 0.79, even though the results are not significant. This is an example of the phenomenon known as Simpson's paradox, discussed below.

 

R users should open the death.R file and its corresponding output file, death.out.

There are many ways to do this. For example, after we enter the data into a 3-dimensional array (see death.R), R will display the partial tables by the levels of the last variable in the array (e.g., see deathp in the code or the output below).

> deathp <- c(19,132, 11,52,0,9, 6,97)
> deathp
[1]  19 132  11  52   0   9   6  97
> #### we can represent this table also in 3 dimensions
> deathp <- array(deathp, dim=c(2,2,2))
> dimnames(deathp) <- list(DeathPen=c("yes","no"),
+                      Defendant=c("white","black"),
+                      Victim=c("white","black"))
> deathp
, , Victim = white

        Defendant
DeathPen white black
     yes    19    11
     no    132    52

, , Victim = black

        Defendant
DeathPen white black
     yes     0     6
     no      9    97

We can also use the ftable() function, e.g.,

ftable(deathp, row.vars=c("Defendant","Victim"),col.vars="DeathPen")

to create flat tables; in this case, the rows are the combinations of Defendant and Victim (AB) and the columns are DeathPen (C), so we are looking at a 4 × 2 representation of the original 2 × 2 × 2 table.


                 DeathPen yes  no
Defendant Victim                 
white     white           19 132
          black            0   9
black     white           11  52
          black            6  97
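ftable() can arrange the same array in other ways; for example, a small variation (again assuming the deathp array defined above) puts Victim alone in the rows and crosses Defendant with DeathPen in the columns:

ftable(deathp, row.vars = "Victim", col.vars = c("Defendant", "DeathPen"))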

To create a marginal table, you can use the margin.table() function, e.g.,

margin.table(deathp, c(2,1))

         DeathPen
Defendant yes  no
    white  19 141
    black  17 149

This function creates a marginal table of the second and the first variables from the original array, in this case, Defendant × Death Penalty. For more details, run the code, explore more ways to manipulate the data, and post any questions you may have on the discussion board.
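Here are a couple of further illustrations (a minimal sketch, assuming the deathp array above) of how the margin argument controls which marginal table you get:

margin.table(deathp, c(3, 2))               ## Victim x Defendant (the BC margin)
margin.table(deathp, 1)                     ## one-way margin: overall DeathPen counts
addmargins(margin.table(deathp, c(2, 1)))   ## AC margin with row and column totals added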

Statistical Inference 

Marginal Independence. Let us first look at the marginal table of Defendant's Race and Death Penalty, ignoring the Victim's race (see below). The point estimate of the odds ratio is 1.181 and its 95% CI is (0.595, 2.343), using the vcd package. The odds of death penalty are 1.18 times as high for white defendants as they are for black defendants. Recall that under the null hypothesis of independence the odds ratio equals 1. Based on these data, we cannot reject the null hypothesis that defendant's race is independent of the death penalty; the 95% CI, roughly (0.6, 2.3), contains 1.00, so the data are consistent with independence. However, keep in mind that we ignored the Victim's race here. The more precise statement would be to say that Defendant's race and Death penalty appear to be marginally independent.

> AC<-margin.table(deathp, c(2,1))
> chisq.test(AC)

    Pearson's Chi-squared test with Yates' continuity correction

data:  AC
X-squared = 0.0863, df = 1, p-value = 0.7689
> assocstats(AC)
                     X^2 df P(> X^2)
Likelihood Ratio 0.22145  1  0.63794
Pearson          0.22145  1  0.63794


> oddsratio(AC, log=FALSE)
[1] 1.18106
> exp(confint(oddsratio(AC)))
           lwr      upr
[1,] 0.5953049 2.343172

Conditional Independence. Now consider the point estimates of the odds ratios when we control for the Victim's Race, i.e., the conditional odds ratios (see Section 5.2, or the relevant parts of the R output, and try to identify the partial tables and their statistics). Given the victim is white, the odds of death penalty are 0.68 times as high for white defendants as they are for black defendants, but the 95% CI (0.31, 1.51) indicates that there is no significant difference; see the output below, and note that there are many different ways of doing this in R -- a few are provided in the death.R code. The chi-square test for independence of defendant's race and death penalty when the victim is white confirms the same finding: with X2 = 0.88, df = 1, p-value = 0.35, we fail to reject the null hypothesis. Notice that squaring the z-value from summary(lor) below gives \((-0.948)^2 \approx 0.90\), close to the Pearson X2, whose two-sided p-value is \(1-pchisq(0.88,1)\approx 0.35\).

> ### via odds-ratios
> oddsratio(deathp, 3, log=FALSE)
    white     black
0.6804408 0.7894737
> ##log odds ratio for a 2x2 table given the levels of the 3rd variable
> lor=oddsratio(deathp,3)
> exp(confint(lor)) ## CI
             lwr      upr
white 0.30704729  1.50791
black 0.04121472 15.12248

 > summary(lor)
      Log Odds Ratio Std. Error z value Pr(>|z|)
white       -0.38501    0.40600 -0.9483   0.1715
black       -0.23639    1.50644 -0.1569   0.4377
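As a small illustrative check of the relationship noted above (these lines are just base R arithmetic and are not necessarily part of death.R):

(-0.38501 / 0.40600)^2                             ## squared Wald z, about 0.90
chisq.test(deathp[, , "white"], correct = FALSE)   ## Pearson X^2, about 0.88
1 - pchisq(0.88, 1)                                ## two-sided p-value, about 0.35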

Given the victim is black, the odds ratio is 0.79 once we adjust for the sampling zero by adding 0.5 to each count: OR = (0.5 × 97.5)/(9.5 × 6.5) = 0.79. Whether we consider the confidence intervals for these odds ratios or perform the test of independence for each of these 2 × 2 sub-tables, the null hypothesis of independence cannot be rejected. More specifically, we say that Defendant's Race and Death Penalty are conditionally independent given the Victim's race.

However, based on the point estimates, the marginal and conditional associations point in opposite directions, e.g., 1.18 vs. 0.68 and 0.79, even though the results are not significant. This is an example of the phenomenon known as Simpson's paradox, discussed below.

Simpson’s paradox

Simpson's paradox is the phenomenon that a pair of variables can have a marginal association and partial (conditional) associations in opposite directions. Another way to think about this is that the nature and direction of the association change due to the presence or absence of a third (possibly confounding) variable.

In the simplest example, consider three binary variables, A, B, C. In the marginal table where we are ignoring the presence of C, let

P(B = 1|A = 1) < P(B = 1|A = 2)

In the partial tables, after we account for the presence of variable C, let

P(B = 1|A = 1,C = 1) > P(B = 1|A = 2, C = 1) and
P(B = 1|A = 1,C = 2) > P(B = 1|A = 2, C = 2)

In terms of odds ratios, the marginal odds ratio \(\theta_{AB} < 1\), while the partial odds ratios \(\theta_{AB(C=1)} > 1\) and \(\theta_{AB(C=2)} > 1\).

In the Death Penalty example, we had the marginal odds ratio greater than one and the partial odds ratios less than one.
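To see this reversal numerically in the death-penalty data, here is a short sketch using the deathp array from the R section above (prop.table() simply converts the counts to conditional proportions):

## marginal: P(death penalty | defendant's race), ignoring victim's race
prop.table(margin.table(deathp, c(2, 1)), 1)
## conditional: P(death penalty | defendant's race), within each level of victim's race
prop.table(deathp[, , "white"], 2)
prop.table(deathp[, , "black"], 2)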

Here is Dr. Morton with a quick video explanation of what this paradox involves.

In addition, for those of you who would like to delve a little deeper into this, here is a link to "Algebraic geometry of 2 × 2 contingency tables" by Slavkovic and Fienberg. On page 17 of that document is a diagram of this paradox as well.

These associations can also be captured in terms of models. Next, we explore different independence and association concepts that capture relationships among three categorical variables.