5.2 - Marginal and Conditional Odds Ratios

Marginal Odds Ratios

Marginal odds ratios are odds ratios between two variables in the marginal table and can be used to test for marginal independence between two variables while ignoring the third.

For example, for the \(XY\) margin, where \( \mu_{ij+}\) denotes the expected count of individuals with \(X=i\) and \(Y=j\) in the marginal table obtained by summing over \(Z\), the marginal odds ratio is

\(\theta_{XY}=\dfrac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}\)

The estimate of this from the admission data is

\(\hat{\theta}_{XY}=\dfrac{1198\cdot1278}{1493\cdot557}=1.84\)

Thus, if we aggregate values over all departments, the odds that a male is admitted are an estimated 1.84 times as high as the odds that a female is admitted. If we were to calculate a suitable confidence interval (e.g., 95%), it would not include 1, indicating that the odds of admission are significantly higher for males.
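The confidence interval statement can be checked with a few lines of R. The sketch below uses the marginal counts from the estimate above and assumes a Wald interval computed on the log scale, which is one common choice rather than the only suitable one; the variable names are illustrative.

```r
# Marginal counts (summed over departments), as used in the estimate above
male_admit   <- 1198; male_reject   <- 1493
female_admit <- 557;  female_reject <- 1278

# Sample odds ratio: odds of admission for males relative to females
theta_hat <- (male_admit * female_reject) / (male_reject * female_admit)

# Wald standard error of the log odds ratio
se_log <- sqrt(1/male_admit + 1/male_reject + 1/female_admit + 1/female_reject)

# 95% interval on the log scale, then back-transformed to the odds-ratio scale
ci <- exp(log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log)

round(theta_hat, 2)
round(ci, 2)
```

If the lower limit is above 1, the interval supports the conclusion that the marginal odds of admission are higher for males.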

Conditional Odds Ratios

Conditional odds ratios are odds ratios between two variables for fixed levels of the third variable and allow us to test for conditional independence of two variables, given the third.

For example, for the fixed level \(Z=k\), the conditional odds ratio between \(X\) and \(Y\) is

\(\theta_{XY(k)}=\dfrac {\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}\)

There are as many such conditional odds ratios as there are levels of the conditional variable, and each can be estimated from the corresponding conditional or partial table between \(X\) and \(Y\), given \(Z=k\). For the first two departments of the admission data, the estimated conditional odds ratios between sex and admission status would be

\(\hat{\theta}_{XY(Z=1)}=\dfrac{512\cdot19}{89\cdot313}=0.35\)

and

\(\hat{\theta}_{XY(Z=2)}=\dfrac{353\cdot8}{17\cdot207}=0.80\)

That is, if we restrict our attention to Department A only, the odds that a male is admitted are an estimated 0.35 times as high as the odds that a female is admitted. Or, equivalently, the odds that a female is admitted are an estimated \(1/0.35=2.86\) times that for males. Likewise, the odds of being admitted are higher for females if we restrict our attention to Department B.
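The same calculation can be repeated for every department at once. The following is a minimal sketch in base R that assumes the built-in UCBAdmissions array (dimensions Admit, Gender, Dept); the helper name odds_ratio is illustrative.

```r
# UCBAdmissions ships with R; its dimensions are Admit x Gender x Dept
admit <- UCBAdmissions

# Cross-product ratio of a 2x2 table
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Apply it to each partial (Admit x Gender) table, one per department
cond_or <- apply(admit, 3, odds_ratio)
round(cond_or, 2)
```

Each element of cond_or is the estimated odds of admission for males relative to females within one department, so the first two entries correspond to the Department A and Department B estimates above.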

We are calculating the odds ratios for the various partial tables of the larger table and can use them to test the conditional independence of \(X\) and \(Y\), given \(Z\). If \(\theta_{XY(k)} \ne 1\) for at least one level of \(Z\) (at least one \(k\)), it follows that \(X\) and \(Y\) are conditionally associated. We will learn more about this, but for now, let's utilize our knowledge of two-way tables to do some preliminary analysis.

NOTE: Marginal association can be very different from conditional association. That is, marginal and conditional odds ratios do NOT need to be equal. In fact, sometimes they may lead to quite the opposite conclusions!

Using what we know about \(2\times2\) tables and tests for association, we can compare the marginal and conditional odds ratios for our example and measure evidence for their significance. What do they tell us about the relationships among these variables?

Let's look at this using the SAS program file berkeley.sas (full output: berkeley SAS Output).


/* Analysis of a 3-way table Berkeley Admissions data using PROC FREQ */
options nocenter nodate nonumber linesize=72;
data berkeley;
   input D $ S $ A $ count;
   cards;
DeptA  Male    Reject  313
DeptA  Male    Accept  512
DeptA  Female  Reject   19
DeptA  Female  Accept   89
DeptB  Male    Reject  207
DeptB  Male    Accept  353
DeptB  Female  Reject    8
DeptB  Female  Accept   17
DeptC  Male    Reject  205
DeptC  Male    Accept  120
DeptC  Female  Reject  391
DeptC  Female  Accept  202
DeptD  Male    Reject  279
DeptD  Male    Accept  138
DeptD  Female  Reject  244
DeptD  Female  Accept  131
DeptE  Male    Reject  138
DeptE  Male    Accept   53
DeptE  Female  Reject  299
DeptE  Female  Accept   94
DeptF  Male    Reject  351
DeptF  Male    Accept   22
DeptF  Female  Reject  317
DeptF  Female  Accept   24
;
/* analysis of the three-way table including CMH test */
proc freq data=berkeley order=data;
   weight count;
   tables D*S*A / cmh chisq relrisk expected nocol norow;
   tables S*A / chisq relrisk;
run;

The TABLES statement is where we specify which variables to tabulate; variables that are omitted are summed over (marginalized). For example,

tables S*A/chisq all nocol nopct;

will create a marginal table of sex and admission status, and compute all the relevant statistics for this \(2\times2\) table (see below). To get the partial tables and analyses of sex and admissions status for each department, we can run the following line:

tables D*S*A /chisq cmh nocol nopct;

We will discuss the CMH option later. In PROC FREQ, the partial tables will be created given the levels of the first variable you specify when creating a three-way table. We can see the full output of this program in berkeley SAS Output.

Statistical Inference

Marginal Independence

Let's first look at the marginal table of sex and admission status, ignoring departments. As we calculated earlier, the point estimate of the odds ratio is 1.84. That is, the odds of admission for males are an estimated 1.84 times as high as those for females, and this is statistically significant based on the 95% confidence interval or a chi-square test of independence. Because the columns in the SAS output below are ordered Reject, Accept, SAS reports the reciprocal, 0.5423 (\(1/0.5423=1.84\)), with a 95% confidence interval of (0.4785, 0.6147) that excludes 1. However, keep in mind that we ignored the department information here. A more precise statement is that sex and admission status are marginally associated.

Table of S by A

S        Statistic    Reject    Accept     Total
Male     Frequency      1492      1199      2691
         Percent       32.97     26.49     59.46
         Row Pct       55.44     44.56
         Col Pct       53.86     68.28
Female   Frequency      1278       557      1835
         Percent       28.24     12.31     40.54
         Row Pct       69.65     30.35
         Col Pct       46.14     31.72
Total    Frequency      2770      1756      4526
         Percent       61.20     38.80    100.00

Statistics for Table of S by A

Statistic                      DF      Value      Prob
Chi-Square                      1    92.6704    <.0001
Likelihood Ratio Chi-Square     1    93.9232    <.0001
Continuity Adj. Chi-Square      1    92.0733    <.0001
Mantel-Haenszel Chi-Square      1    92.6499    <.0001
Phi Coefficient                      -0.1431
Contingency Coefficient               0.1416
Cramer's V                           -0.1431

Odds Ratio and Relative Risks

Statistic                    Value    95% Confidence Limits
Odds Ratio                  0.5423    0.4785    0.6147
Relative Risk (Column 1)    0.7961    0.7608    0.8330
Relative Risk (Column 2)    1.4679    1.3535    1.5919


Conditional Independence

Now consider the point estimates of the odds ratios when we control for department, that is, the conditional odds ratios defined earlier in this section. Given that the individuals are applying to Department A, the odds of male admission are an estimated 0.35 times as high as the odds of female admission, and this is also significant at the 0.05 level: the 95% CI (0.2087, 0.5843) does not include 1. Because the columns in the SAS output below are ordered Reject, Accept, SAS reports the reciprocal, 2.8636 with 95% CI (1.7112, 4.7921), which leads to the same conclusion. The other conditional odds ratios (Departments B, C, etc.) are not significant, but it's interesting to note, nevertheless, that these conditional relationships do not have to be in the same direction as the marginal one. When they disagree, we have an example of Simpson's paradox.

Table 1 of S by A
Controlling for D=DeptA

S        Statistic    Reject    Accept     Total
Male     Frequency       313       512       825
         Expected     293.57    531.43
         Percent       33.55     54.88     88.42
Female   Frequency        19        89       108
         Expected     38.431    69.569
         Percent        2.04      9.54     11.58
Total    Frequency       332       601       933
         Percent       35.58     64.42    100.00

Statistics for Table 1 of S by A
Controlling for D=DeptA

Statistic                      DF      Value      Prob
Chi-Square                      1    17.2480    <.0001
Likelihood Ratio Chi-Square     1    19.0540    <.0001
Continuity Adj. Chi-Square      1    16.3718    <.0001
Mantel-Haenszel Chi-Square      1    17.2295    <.0001
Phi Coefficient                       0.1360
Contingency Coefficient               0.1347
Cramer's V                            0.1360

Odds Ratio and Relative Risks

Statistic                    Value    95% Confidence Limits
Odds Ratio                  2.8636    1.7112    4.7921
Relative Risk (Column 1)    2.1566    1.4206    3.2737
Relative Risk (Column 2)    0.7531    0.6799    0.8341

R users should open the berkeley.R file and its corresponding output file berkeley.out. R will calculate the partial tables by the levels of the last variable in the array.

#### Berkeley Admissions Example: a 2x2x6 table
#### let X=sex, Y=admission status, Z=department
library(survival)
library(vcd)   # provides oddsratio()
#### data available as an array in R
admit <- UCBAdmissions
dimnames(admit) <- list(
  Admit=c("Admitted","Rejected"),
  Sex=c("Male","Female"),
  Dept=c("A","B","C","D","E","F"))
admit
#### create a flat contingency table
ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")
### conditional odds ratios
XY.Z <- oddsratio(admit) # log scale
exp(XY.Z$coef)
exp(confint(XY.Z))
plot(XY.Z)
### Cochran-Mantel-Haenszel test of conditional independence
mantelhaen.test(admit)
mantelhaen.test(admit, correct=FALSE)
### marginal tables and odds ratios
XY <- margin.table(admit, c(2,1))
ZY <- margin.table(admit, c(3,1))
XZ <- margin.table(admit, c(2,3))
oddsratio(XY, log=FALSE)
exp(confint(oddsratio(XY)))
chisq.test(XY, correct=FALSE)
### Breslow-Day test: breslowday.test() is defined in the separate file
### breslowday.test_.R, so source that file before calling it
#source(file.choose()) # choose breslowday.test_.R
if (exists("breslowday.test")) breslowday.test(admit)

We can also use the ftable() function, e.g.,

ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")

to create flat tables. In this case, R essentially combines the departments and sexes into 12 row categories, resulting in a \(12\times2\) representation of the original \(2\times2\times6\) table. To create a marginal table, we can use margin.table(), e.g.,

margin.table(admit, c(2,1))

This function creates a marginal table of the second (sex) and the first (admission status) variables from the original array, which in this case puts the sexes as the rows and admission status groups as the columns. 
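Both functions can be tried on the built-in array without any extra packages. The sketch below assumes the original UCBAdmissions dimension names (Admit, Gender, Dept) rather than the renamed ones used in berkeley.R, so the dimensions are referred to by position; XY_by_hand is an illustrative name.

```r
admit <- UCBAdmissions  # dimensions: 1 = Admit, 2 = Gender, 3 = Dept

# Flat table: Dept and Gender index the rows, Admit the columns
flat <- ftable(admit, row.vars = c(3, 2), col.vars = 1)
dim(flat)

# Marginal table of sex (rows) by admission status (columns)
XY <- margin.table(admit, c(2, 1))

# The same counts obtained by summing the array over departments
XY_by_hand <- apply(admit, c(2, 1), sum)
XY
```

Comparing XY with XY_by_hand shows that marginalizing is nothing more than summing the cell counts over the omitted variable.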

Statistical Inference

Marginal Independence

Let's first look at the marginal table of sex and admission status, ignoring departments. As we calculated earlier, the point estimate of the odds ratio is 1.84. That is, the odds of admission for males are an estimated 1.84 times as high as those for females, and this is statistically significant based on the 95% confidence interval or a chi-square test of independence. The chi-square test below gives a statistic of 92.205 on 1 degree of freedom, with a p-value below 2.2e-16. However, keep in mind that we ignored the department information here. A more precise statement is that sex and admission status are marginally associated.

XY <- margin.table(admit, c(2,1))
chisq.test(XY, correct=FALSE)

Pearson's Chi-squared test

data:  XY
X-squared = 92.205, df = 1, p-value < 2.2e-16
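To see where the statistic comes from, the Pearson chi-square can be computed by hand from the marginal table. This is a minimal sketch in base R using the built-in UCBAdmissions array; the names expected, X2, and p_value are illustrative.

```r
XY <- margin.table(UCBAdmissions, c(2, 1))  # sex by admission status

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(XY), colSums(XY)) / sum(XY)

# Pearson chi-square statistic, without continuity correction
X2 <- sum((XY - expected)^2 / expected)
p_value <- pchisq(X2, df = 1, lower.tail = FALSE)

round(X2, 3)
```

The value of X2 should match the X-squared reported by chisq.test(XY, correct=FALSE).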

Conditional Independence

Now consider the point estimates of the odds ratios when we control for department, that is, the conditional odds ratios defined earlier in this section. Given that the individuals are applying to Department A, the odds of male admission are an estimated 0.35 times as high as the odds of female admission, and this is also significant at the 0.05 level: the 95% CI (0.2087, 0.5843) does not include 1. The other conditional odds ratios (Departments B, C, etc.) are not significant, since each of their intervals in the output below contains 1. It's interesting to note, nevertheless, that these conditional relationships do not have to be in the same direction as the marginal one. When they disagree, we have an example of Simpson's paradox.

XY.Z <- oddsratio(admit) # log scale
exp(XY.Z$coef)
        A         B         C         D         E         F
0.3492120 0.8025007 1.1330596 0.9212838 1.2216312 0.8278727
exp(confint(XY.Z))
      2.5 %    97.5 %
A 0.2086756 0.5843954
B 0.3403815 1.8920166
C 0.8545328 1.5023696
D 0.6863345 1.2366620
E 0.8250748 1.8087848
F 0.4552059 1.5056335
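The significance statements can be read off these intervals programmatically. The sketch below does not use vcd; it assumes a Wald interval on the log scale for each partial table, computed in base R from the built-in UCBAdmissions array, and the helper wald_ci is an illustrative name.

```r
admit <- UCBAdmissions  # Admit x Gender x Dept

# Wald 95% interval for the odds ratio of one 2x2 partial table
wald_ci <- function(tab) {
  log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
  se     <- sqrt(sum(1 / tab))
  exp(log_or + c(lower = -1, upper = 1) * qnorm(0.975) * se)
}

ci <- t(apply(admit, 3, wald_ci))   # one row per department

# Departments whose interval lies entirely above or entirely below 1
excludes_one <- ci[, "lower"] > 1 | ci[, "upper"] < 1
names(which(excludes_one))
```

A department is flagged only when its whole interval lies on one side of 1, which is the criterion used in the paragraph above.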

Simpson's Paradox

Simpson's paradox is the phenomenon that a pair of variables can have marginal and partial (conditional) associations in opposite directions. Another way to think about this is that the nature and direction of the association change depending on the presence or absence of a third (possibly confounding) variable.

In the simplest example, consider three binary variables, \(X\), \(Y\), \(Z\). In the marginal table where we are ignoring the presence of \(Z,\) let

\(P(Y = 1|X = 1) < P(Y = 1|X = 2)\)

In the partial table, after we account for the presence of variable \(Z,\) let

\(P(Y = 1|X = 1,Z = 1) > P(Y = 1|X = 2, Z = 1)\) and
\(P(Y = 1|X = 1,Z = 2) > P(Y = 1|X = 2, Z = 2)\)

In terms of odds ratios, the marginal odds ratio satisfies \(\theta_{XY} < 1\), while the partial odds ratios satisfy \(\theta_{XY(Z=1)} > 1\) and \(\theta_{XY(Z=2)} > 1\).
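A small numerical example makes the reversal concrete. The counts below are hypothetical, invented only to satisfy the inequalities above; they are not taken from the admission data, and the names tab and odds_ratio are illustrative.

```r
# Hypothetical counts, invented for illustration; dimensions are X x Y x Z
tab <- array(
  c(90, 800, 10, 200,    # Z = 1: Y = 1 counts for X = 1, 2, then Y = 2 counts
    300, 20, 700, 80),   # Z = 2: same layout
  dim = c(2, 2, 2),
  dimnames = list(X = c("1", "2"), Y = c("1", "2"), Z = c("1", "2"))
)

odds_ratio <- function(t2) (t2[1, 1] * t2[2, 2]) / (t2[1, 2] * t2[2, 1])

# Partial (conditional) odds ratios, one per level of Z
partial <- apply(tab, 3, odds_ratio)

# Marginal odds ratio, after summing over Z
marginal <- odds_ratio(apply(tab, c(1, 2), sum))

partial    # both greater than 1
marginal   # less than 1
```

In this made-up table, \(X=1\) has the higher success proportion within each level of \(Z\) but the lower one overall, because most \(X=1\) observations fall in the level of \(Z\) where success is rare.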

These associations can also be captured in terms of models. Next, we explore more on different independence and association concepts that capture relationships between three categorical variables.

Additional Resources (optional)

Here is Dr. Jason Morton with a quick video explanation of what this paradox involves.

In addition, for those of you who would like to delve a little deeper into this, here is a link to "Algebraic geometry of 2 × 2 contingency tables" by Slavkovic and Fienberg. A diagram of this paradox appears on page 17 of that document.