Marginal Odds Ratios
Marginal odds ratios are odds ratios between two variables in the marginal table and can be used to test for marginal independence between two variables while ignoring the third.
For example, for the \(XY\) margin, where \( \mu_{ij+}\) denotes the expected count of individuals with \(X=i\) and \(Y=j\) in the marginal table obtained by summing over \(Z\), the marginal odds ratio is
\(\theta_{XY}=\dfrac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}\)
The corresponding estimate from the admissions data, with \(X\) = sex, \(Y\) = admission status, and \(Z\) = department, is
\(\hat{\theta}_{XY}=\dfrac{1198\cdot1278}{1493\cdot557}=1.84\)
Thus, if we aggregate over all departments, the odds that a male is admitted are an estimated 1.84 times as high as the odds that a female is admitted. If we were to calculate a suitable confidence interval (e.g., 95%), it would not include 1, indicating that the odds for males are significantly higher.
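The arithmetic above can be checked in R, the language of the berkeley.R script used later in this section. The sketch below hard-codes the four marginal counts and adds a Wald 95% interval, \(\exp(\log\hat{\theta}_{XY} \pm 1.96\,\mathrm{SE})\) with \(\mathrm{SE}=\sqrt{1/n_{11}+1/n_{12}+1/n_{21}+1/n_{22}}\). The Wald interval is an assumed choice of "suitable" interval, and the variable names are illustrative.

```r
# Marginal sex-by-admission counts from the admissions data (summed over departments)
male_admit    <- 1198
male_reject   <- 1493
female_admit  <- 557
female_reject <- 1278

# Sample odds ratio: odds of admission for males divided by odds for females
theta_hat <- (male_admit * female_reject) / (male_reject * female_admit)

# Wald 95% interval: build it on the log scale, then exponentiate
se_log <- sqrt(1 / male_admit + 1 / male_reject + 1 / female_admit + 1 / female_reject)
ci <- exp(log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log)

round(theta_hat, 2)  # 1.84
round(ci, 2)         # lower limit is above 1
```

The interval comes out at roughly (1.62, 2.09), which excludes 1.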
Conditional Odds Ratios
Conditional odds ratios are odds ratios between two variables for fixed levels of the third variable and allow us to test for conditional independence of two variables, given the third.
For example, for the fixed level \(Z=k\), the conditional odds ratio between \(X\) and \(Y\) is
\(\theta_{XY(k)}=\dfrac {\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}\)
There are as many such conditional odds ratios as there are levels of the conditioning variable \(Z\), and each can be estimated from the corresponding conditional (partial) table between \(X\) and \(Y\), given \(Z=k\). For the first two departments of the admissions data, the estimated conditional odds ratios between sex and admission status are
\(\hat{\theta}_{XY(Z=1)}=\dfrac{512\cdot19}{89\cdot313}=0.35\)
and
\(\hat{\theta}_{XY(Z=2)}=\dfrac{353\cdot8}{17\cdot207}=0.80\)
That is, if we restrict our attention to Department A only, the odds that a male is admitted are an estimated 0.35 times as high as the odds that a female is admitted. Or, equivalently, the odds that a female is admitted are an estimated \(1/0.35=2.86\) times those for males. Likewise, if we restrict our attention to Department B, the estimated odds of admission are higher for females (\(\hat{\theta}_{XY(Z=2)}=0.80<1\)).
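All six partial odds ratios can be computed at once from the UCBAdmissions array that ships with base R (dimensions Admit x Gender x Dept). The helper cross_ratio is a hypothetical name introduced here; it simply applies the cross-product formula for \(\theta_{XY(k)}\) to each department's 2x2 slice.

```r
# UCBAdmissions ships with base R: dims are Admit x Gender x Dept
data(UCBAdmissions)

# Cross-product ratio of a 2x2 table: (n11 * n22) / (n12 * n21)
cross_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# apply() over the third margin passes each 2x2 Admit x Gender slice to cross_ratio
theta_k <- apply(UCBAdmissions, 3, cross_ratio)
round(theta_k, 2)  # A 0.35, B 0.80, C 1.13, D 0.92, E 1.22, F 0.83
```

Because the cross-product ratio is unchanged by transposing a 2x2 table, it does not matter here that admission status indexes the rows and sex the columns.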
We are calculating the odds ratios for the various partial tables of the larger table and can use them to test the conditional independence of \(X\) and \(Y\), given \(Z\). If \(\theta_{XY(k)} \ne 1\) for at least one level of \(Z\) (at least one \(k\)), it follows that \(X\) and \(Y\) are conditionally associated. We will learn more about this, but for now, let's utilize our knowledge of two-way tables to do some preliminary analysis.
Using what we know about \(2\times2\) tables and tests for association, we can compare the marginal and conditional odds ratios for our example and measure evidence for their significance. What do they tell us about the relationships among these variables?
Let's look at the SAS program file berkeley.sas (full output: berkeley SAS Output).
/* Analysis of a 3-way table Berkeley Admissions data using PROC FREQ */
options nocenter nodate nonumber linesize=72;
data berkeley;
input D $ S $ A $ count;
cards;
DeptA Male Reject 313
DeptA Male Accept 512
DeptA Female Reject 19
DeptA Female Accept 89
DeptB Male Reject 207
DeptB Male Accept 353
DeptB Female Reject 8
DeptB Female Accept 17
DeptC Male Reject 205
DeptC Male Accept 120
DeptC Female Reject 391
DeptC Female Accept 202
DeptD Male Reject 279
DeptD Male Accept 138
DeptD Female Reject 244
DeptD Female Accept 131
DeptE Male Reject 138
DeptE Male Accept 53
DeptE Female Reject 299
DeptE Female Accept 94
DeptF Male Reject 351
DeptF Male Accept 22
DeptF Female Reject 317
DeptF Female Accept 24
;
/*analysis of the three-way table including CMH test*/
proc freq data=berkeley order=data;
weight count;
tables D*S*A/ cmh chisq relrisk expected nocol norow;
tables S*A/chisq relrisk;
run;
The TABLES statement is where we specify which variables to tabulate; variables that are omitted are summed over (marginalized). For example,
tables S*A/chisq all nocol nopct;
will create a marginal table of sex and admission status and compute all the relevant statistics for this \(2\times2\) table (see below). To get the partial tables and analyses of sex and admission status for each department, we can run the following line:
tables D*S*A /chisq cmh nocol nopct;
We will discuss the CMH option later. In PROC FREQ, the partial tables will be created given the levels of the first variable you specify when creating a three-way table. We can see the full output of this program in berkeley SAS Output.
Statistical Inference
Marginal Independence
Let's first look at the marginal table of sex and admission status, while ignoring departments. As we calculated earlier, the point estimate of the odds ratio is 1.84. That is, the odds of admission for males are an estimated 1.84 times as high as those for females, and this association is statistically significant based on either the 95% confidence interval or a chi-square test of independence. Note that SAS reports an odds ratio of 0.5423 below: with order=data, Reject is the first column, so SAS estimates the odds ratio for rejection, and its reciprocal, \(1/0.5423\approx1.84\), is the odds ratio for admission. However, keep in mind that we ignored the department information here. A more precise statement would be to say that sex and admission status are marginally associated.

Statistics for Table of S by A

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 92.6704 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 93.9232 | <.0001 |
Continuity Adj. Chi-Square | 1 | 92.0733 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 92.6499 | <.0001 |
Phi Coefficient | | -0.1431 | |
Contingency Coefficient | | 0.1416 | |
Cramer's V | | -0.1431 | |

Odds Ratio and Relative Risks

Statistic | Value | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|
Odds Ratio | 0.5423 | 0.4785 | 0.6147 |
Relative Risk (Column 1) | 0.7961 | 0.7608 | 0.8330 |
Relative Risk (Column 2) | 1.4679 | 1.3535 | 1.5919 |

Conditional Independence
Now consider the point estimates of odds ratios when we control for department, which uses conditional odds ratios (see Sec. 5.2). Given that the individuals are applying to Department A, the odds of male admission are an estimated 0.35 times as high as the odds of female admission, and this is also significant at the 0.05 level (its 95% CI, (0.2087, 0.5844), does not include 1). SAS reports the reciprocal, 2.8636 with CI (1.7112, 4.7921), because Reject is its first column. The other conditional odds ratios (Departments B, C, etc.) are not significant, but it's interesting to note, nevertheless, that these conditional relationships do not have to be in the same direction as the marginal one. When they disagree, we have an example of Simpson's Paradox.

Statistics for Table 1 of S by A
Controlling for D=DeptA

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 17.2480 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 19.0540 | <.0001 |
Continuity Adj. Chi-Square | 1 | 16.3718 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 17.2295 | <.0001 |
Phi Coefficient | | 0.1360 | |
Contingency Coefficient | | 0.1347 | |
Cramer's V | | 0.1360 | |

Odds Ratio and Relative Risks

Statistic | Value | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|
Odds Ratio | 2.8636 | 1.7112 | 4.7921 |
Relative Risk (Column 1) | 2.1566 | 1.4206 | 3.2737 |
Relative Risk (Column 2) | 0.7531 | 0.6799 | 0.8341 |

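Why does SAS print 2.8636 rather than 0.35 for Department A? A quick sketch, assuming the row and column order that order=data produces from the data step above (Male before Female, Reject before Accept): the cross-product ratio of that table is the odds ratio for rejection, and swapping the columns gives its reciprocal, the odds ratio for admission. The object names are illustrative.

```r
# Department A counts, arranged as SAS reads them with order=data:
# rows Male, Female; columns Reject, Accept (Reject appears first in the data)
dept_a_sas <- matrix(c(313, 512,
                        19,  89),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(S = c("Male", "Female"),
                                     A = c("Reject", "Accept")))

# Cross-product ratio of a 2x2 table: (n11 * n22) / (n12 * n21)
cross_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

or_reject <- cross_ratio(dept_a_sas)         # odds of rejection, male vs female
or_admit  <- cross_ratio(dept_a_sas[, 2:1])  # columns swapped: odds of admission

round(c(or_reject, or_admit, 1 / or_reject), 4)  # 2.8636 0.3492 0.3492
```

The same reciprocal relationship holds for the confidence limits: \(1/4.7921\approx0.2087\) and \(1/1.7112\approx0.5844\).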
R users should open the berkeley.R file and its corresponding output file berkeley.out. R will calculate the partial tables by the levels of the last variable in the array.
#### Berkeley Admissions Example: a 2x2x6 table
#### let X=sex, Y=admission status, Z=department
library(survival)
library(vcd)
#### data available as an array in R
admit <- UCBAdmissions
dimnames(admit) <- list(
Admit=c("Admitted","Rejected"),
Sex=c("Male","Female"),
Dept=c("A","B","C","D","E","F"))
admit
#### create a flat contingency table
ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")
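### conditional odds ratios (vcd's oddsratio() returns log odds ratios by default)
XY.Z <- oddsratio(admit) # log scale
exp(coef(XY.Z))          # conditional odds ratios, one per department
exp(confint(XY.Z))       # 95% confidence limits on the odds-ratio scale
plot(XY.Z)
### Cochran-Mantel-Haenszel test of conditional independence
mantelhaen.test(admit)
mantelhaen.test(admit, correct=FALSE)
### marginal tables and odds ratios
XY <- margin.table(admit, c(2,1))  # Sex x Admit
ZY <- margin.table(admit, c(3,1))  # Dept x Admit
XZ <- margin.table(admit, c(2,3))  # Sex x Dept
oddsratio(XY, log=FALSE)
exp(confint(oddsratio(XY)))
chisq.test(XY, correct=FALSE)
### Breslow-Day test: breslowday.test() is not in base R or vcd;
### it is defined in the course file breslowday.test_.R, which must be sourced first
source(file.choose()) # choose breslowday.test_.R
breslowday.test(admit)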
We can also use the ftable() function, e.g.,
ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")
to create flat tables. In this case, R essentially combines the departments and sexes into 12 row categories, resulting in a \(12\times2\) representation of the original \(2\times2\times6\) table. To create a marginal table, we can use margin.table(), e.g.,
margin.table(admit, c(2,1))
This function creates a marginal table of the second (sex) and the first (admission status) variables from the original array, which in this case puts the sexes as the rows and admission status groups as the columns.
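That margin.table() simply sums over the omitted dimension can be checked directly. This sketch works on the unmodified base R UCBAdmissions array, whose second dimension is named Gender rather than the Sex label assigned in berkeley.R; admit_arr, xy_margin and xy_apply are illustrative names.

```r
# Base R's UCBAdmissions array: dims Admit x Gender x Dept (Gender plays the role of Sex)
admit_arr <- UCBAdmissions

# Sex-by-admission margin: keep dims 2 and 1, sum over the omitted one (Dept)
xy_margin <- margin.table(admit_arr, c(2, 1))
xy_apply  <- apply(admit_arr, c(2, 1), sum)

all(xy_margin == xy_apply)  # TRUE
xy_margin                   # rows Male/Female, columns Admitted/Rejected
```

The entries are the same four marginal counts used earlier to compute \(\hat{\theta}_{XY}=1.84\).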
Statistical Inference
Marginal Independence
XY <- margin.table(admit, c(2,1))
chisq.test(XY, correct=FALSE)
Pearson's Chi-squared test
data:  XY
X-squared = 92.205, df = 1, p-value < 2.2e-16
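The Pearson statistic reported by chisq.test() can be reproduced from its definition, \(X^2=\sum_{i,j}(n_{ij}-\hat{\mu}_{ij})^2/\hat{\mu}_{ij}\), where \(\hat{\mu}_{ij}=n_{i+}n_{+j}/n\) is the expected count under independence. A minimal sketch on the base R UCBAdmissions array (object names are illustrative):

```r
# Marginal Sex x Admit table (Gender is the base R name for Sex)
xy <- margin.table(UCBAdmissions, c(2, 1))

# Expected counts under independence: mu_hat_ij = n_i+ * n_+j / n
n <- sum(xy)
expected <- outer(rowSums(xy), colSums(xy)) / n

# Pearson chi-square statistic, no continuity correction
x2 <- sum((xy - expected)^2 / expected)

round(x2, 3)  # 92.205
all.equal(x2, unname(chisq.test(xy, correct = FALSE)$statistic))  # TRUE
```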
Conditional Independence
XY.Z <- oddsratio(admit) # log scale
exp(coef(XY.Z))
        A         B         C         D         E         F 
0.3492120 0.8025007 1.1330596 0.9212838 1.2216312 0.8278727 
exp(confint(XY.Z))
      2.5 %    97.5 %
A 0.2086756 0.5843954
B 0.3403815 1.8920166
C 0.8545328 1.5023696
D 0.6863345 1.2366620
E 0.8250748 1.8087848
F 0.4552059 1.5056335
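The confidence limits above can be approximated without vcd. This sketch assumes Wald intervals on the log odds-ratio scale with no continuity correction (no cell is zero here); for Department A it gives the same limits, (0.2087, 0.5844), to four decimal places, and it flags which departments have intervals that exclude 1. The names log_or, se and ci are illustrative.

```r
counts <- UCBAdmissions  # dims Admit x Gender x Dept

# Log conditional odds ratio and its large-sample standard error, per department
log_or <- apply(counts, 3, function(tab) log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])))
se     <- apply(counts, 3, function(tab) sqrt(sum(1 / tab)))  # sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
z      <- qnorm(0.975)

# Wald 95% limits, exponentiated back to the odds-ratio scale
ci <- exp(cbind(lower = log_or - z * se, upper = log_or + z * se))

# Which departments have an interval that excludes 1?
excludes_one <- ci[, "lower"] > 1 | ci[, "upper"] < 1
round(ci, 4)
names(which(excludes_one))  # "A"
```

Only Department A's interval excludes 1, matching the statement that the other conditional odds ratios are not significant.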
Simpson's Paradox
Simpson's paradox is the phenomenon in which a pair of variables has a marginal association and partial (conditional) associations in opposite directions. Another way to think about this is that the nature and direction of an association can change depending on whether a third (possibly confounding) variable is taken into account.
In the simplest example, consider three binary variables, \(X\), \(Y\), \(Z\). In the marginal table where we are ignoring the presence of \(Z,\) let
\(P(Y = 1|X = 1) < P(Y = 1|X = 2)\)
In the partial tables, after we account for the presence of variable \(Z,\) let
\(P(Y = 1|X = 1,Z = 1) > P(Y = 1|X = 2, Z = 1)\) and
\(P(Y = 1|X = 1,Z = 2) > P(Y = 1|X = 2, Z = 2)\)
In terms of odds ratios, the marginal odds ratio \(\theta_{XY} < 1\), while the partial odds ratios \(\theta_{XY(Z=1)} > 1\) and \(\theta_{XY(Z=2)} > 1\).
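The reversal described above can be made concrete with a small hypothetical 2x2x2 table; the counts below are invented for illustration and are not from the admissions data. Within each level of \(Z\), \(X=1\) has the higher proportion with \(Y=1\), yet in the marginal table the direction flips, because most \(X=1\) observations sit at \(Z=2\), where \(Y=1\) is rare. The names tab, cross_ratio, partial and marginal are illustrative.

```r
# Hypothetical counts (illustrative only), dims X (1, 2) x Y (1, 2) x Z (1, 2).
# array() fills column-major: X varies fastest, then Y, then Z.
tab <- array(c( 9, 80,  1, 20,   # Z = 1: X=1 has 9 of 10 with Y=1; X=2 has 80 of 100
               30,  2, 70,  8),  # Z = 2: X=1 has 30 of 100 with Y=1; X=2 has 2 of 10
             dim = c(2, 2, 2),
             dimnames = list(X = c("1", "2"), Y = c("1", "2"), Z = c("1", "2")))

# Cross-product ratio of a 2x2 table: (n11 * n22) / (n12 * n21)
cross_ratio <- function(t2) (t2[1, 1] * t2[2, 2]) / (t2[1, 2] * t2[2, 1])

partial  <- apply(tab, 3, cross_ratio)              # theta_XY(Z=1), theta_XY(Z=2)
marginal <- cross_ratio(margin.table(tab, c(1, 2))) # theta_XY, summing over Z

# P(Y = 1 | X) in the marginal table
p_marg <- prop.table(margin.table(tab, c(1, 2)), 1)[, "1"]

round(partial, 3)   # 2.250 and 1.714, both > 1
round(marginal, 3)  # 0.188, < 1
round(p_marg, 3)    # 0.355 for X=1 versus 0.745 for X=2
```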
These associations can also be captured in terms of models. Next, we explore in more detail the different independence and association concepts that capture relationships among three categorical variables.
Additional Resources (optional)
Here is Dr. Jason Morton with a quick video explanation of what this paradox involves.
In addition, for those of you who would like to delve a little deeper into this, here is a link to "Algebraic geometry of 2 × 2 contingency tables" by Slavkovic and Fienberg. Page 17 of this document also contains a diagram of this paradox.