5.2 - Marginal and Conditional Odds Ratios

Marginal Odds Ratios

Marginal odds ratios are odds ratios between two variables in the marginal table and can be used to test for marginal independence between two variables while ignoring the third.

For example, for the $$XY$$ margin, where $$\mu_{ij+}$$ denotes the expected count of individuals with $$X=i$$ and $$Y=j$$ in the marginal table obtained by summing over $$Z$$, the marginal odds ratio is

$$\theta_{XY}=\dfrac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$$

And the estimate of this from the admission data would be

$$\hat{\theta}_{XY}=\dfrac{1198\cdot1278}{1493\cdot557}=1.84$$

Thus, if we aggregate over all departments, the estimated odds that a male is admitted are 1.84 times the odds that a female is admitted. A suitable confidence interval for this odds ratio (e.g., 95%) does not include 1, indicating that the odds of admission are significantly higher for males.
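The estimate and interval can be reproduced with a few lines of R. This is a minimal sketch, assuming that R's built-in `UCBAdmissions` array (dimensions Admit, Gender, Dept) holds the same admission counts. The Wald interval on the log scale is one common choice of "suitable" interval and is an assumption here, and the names `XY`, `theta_hat`, `se_log`, and `ci` are purely illustrative.

```r
# Marginal Gender x Admit table, obtained by summing over Dept
XY <- margin.table(UCBAdmissions, c(2, 1))

# Sample odds ratio:
# (male admitted * female rejected) / (male rejected * female admitted)
theta_hat <- (XY["Male", "Admitted"] * XY["Female", "Rejected"]) /
             (XY["Male", "Rejected"] * XY["Female", "Admitted"])

# Wald 95% confidence interval computed on the log scale (an assumed choice)
se_log <- sqrt(sum(1 / XY))
ci <- exp(log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log)

round(theta_hat, 2)  # 1.84
ci                   # both limits lie above 1
```

Working on the log scale is the usual design choice because the sampling distribution of the log odds ratio is closer to normal than that of the odds ratio itself.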

Conditional Odds Ratios

Conditional odds ratios are odds ratios between two variables for fixed levels of the third variable and allow us to test for conditional independence of two variables, given the third.

For example, for the fixed level $$Z=k$$, the conditional odds ratio between $$X$$ and $$Y$$ is

$$\theta_{XY(k)}=\dfrac {\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}$$

There are as many such conditional odds ratios as there are levels of the conditional variable, and each can be estimated from the corresponding conditional or partial table between $$X$$ and $$Y$$, given $$Z=k$$. For the first two departments of the admission data, the estimated conditional odds ratios between sex and admission status would be

$$\hat{\theta}_{XY(Z=1)}=\dfrac{512\cdot19}{89\cdot313}=0.35$$

and

$$\hat{\theta}_{XY(Z=2)}=\dfrac{353\cdot8}{17\cdot207}=0.80$$

That is, if we restrict our attention to Department A only, the odds that a male is admitted are an estimated 0.35 times as high as the odds that a female is admitted. Or, equivalently, the odds that a female is admitted are an estimated $$1/0.35=2.86$$ times that for males. Likewise, the odds of being admitted are higher for females if we restrict our attention to Department B.
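The same calculation can be repeated for every department at once. The sketch below again assumes the built-in `UCBAdmissions` array matches the admission data; `odds_ratio` and `theta_k` are hypothetical helper names, not part of the course files.

```r
# Sample odds ratio for a 2 x 2 table with rows Admit (Admitted, Rejected)
# and columns Gender (Male, Female): odds of admission, male vs. female
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# One partial table per department (the third dimension of the array)
theta_k <- apply(UCBAdmissions, 3, odds_ratio)

round(theta_k, 2)      # Departments A and B give 0.35 and 0.80
round(1 / theta_k, 2)  # the same associations expressed as female vs. male odds
```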

We are calculating the odds ratios for the various partial tables of the larger table and can use them to test the conditional independence of $$X$$ and $$Y$$, given $$Z$$. If $$\theta_{XY(k)} \ne 1$$ for at least one level of $$Z$$ (at least one $$k$$), it follows that $$X$$ and $$Y$$ are conditionally associated. We will learn more about this, but for now, let's utilize our knowledge of two-way tables to do some preliminary analysis.
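As a preliminary analysis of this kind, each partial table can be tested separately with the Pearson chi-square test for a two-way table. This is only a sketch under the assumption that the built-in `UCBAdmissions` array is the data of interest; running six separate tests is not a single test of conditional independence, and no adjustment for multiple testing is made. The name `p_values` is illustrative.

```r
# Pearson chi-square test (no continuity correction) within each department
p_values <- apply(UCBAdmissions, 3,
                  function(tab) chisq.test(tab, correct = FALSE)$p.value)

round(p_values, 4)

# Departments whose partial table shows evidence of association at the 0.05 level
names(p_values)[p_values < 0.05]
```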

NOTE: Marginal association can be very different from conditional association. That is, marginal and conditional odds ratios do NOT need to be equal. In fact, sometimes they may lead to quite the opposite conclusions!

Using what we know about $$2\times2$$ tables and tests for association, we can compare the marginal and conditional odds ratios for our example and measure evidence for their significance. What do they tell us about the relationships among these variables?

Let's look at this using the SAS program file berkeley.sas (full output: berkeley SAS Output).


    /* Analysis of a 3-way table: Berkeley admissions data using PROC FREQ */

    options nocenter nodate nonumber linesize=72;

    data berkeley;
      input D $ S $ A $ count;
      cards;
    DeptA Male Reject 313
    DeptA Male Accept 512
    DeptA Female Reject 19
    DeptA Female Accept 89
    DeptB Male Reject 207
    DeptB Male Accept 353
    DeptB Female Reject 8
    DeptB Female Accept 17
    DeptC Male Reject 205
    DeptC Male Accept 120
    DeptC Female Reject 391
    DeptC Female Accept 202
    DeptD Male Reject 279
    DeptD Male Accept 138
    DeptD Female Reject 244
    DeptD Female Accept 131
    DeptE Male Reject 138
    DeptE Male Accept 53
    DeptE Female Reject 299
    DeptE Female Accept 94
    DeptF Male Reject 351
    DeptF Male Accept 22
    DeptF Female Reject 317
    DeptF Female Accept 24
    ;

    /* analysis of the three-way table including the CMH test */
    proc freq data=berkeley order=data;
      weight count;
      tables D*S*A / cmh chisq relrisk expected nocol norow;
      tables S*A / chisq relrisk;
    run;

The tables statement is where we specify which variables to tabulate; those that are omitted are summed over (marginalized). For example,

    tables S*A / chisq all nocol nopct;

will create a marginal table of sex and admission status and compute all the relevant statistics for this $$2\times2$$ table (see below). To get the partial tables and analyses of sex and admission status for each department, we can run the following line:

    tables D*S*A / chisq cmh nocol nopct;

We will discuss the CMH option later. In PROC FREQ, the partial tables are created for the levels of the first variable specified in a three-way table request. We can see the full output of this program in berkeley SAS Output.

Statistical Inference

Marginal Independence

Let's first look at the marginal table of sex and admission status, while ignoring departments. As we calculated earlier, the point estimate of the odds ratio is 1.84. That is, the odds of admission for males are an estimated 1.84 times as high as those for females, and this can be shown to be statistically significant based on the 95% confidence interval or a chi-square test of independence. However, keep in mind that we ignored the department information here.
A more precise statement would be to say that sex and admission status are marginally associated. Note that SAS lists Reject as the first column, so the odds ratio it reports, 0.5423, is the reciprocal of the admission odds ratio: $$1/0.5423=1.84$$.

Table of S by A (each cell: Frequency / Percent / Row Pct / Col Pct)

    S        A = Reject                      A = Accept                      Total
    Male     1492 / 32.97 / 55.44 / 53.86    1199 / 26.49 / 44.56 / 68.28    2691 / 59.46
    Female   1278 / 28.24 / 69.65 / 46.14     557 / 12.31 / 30.35 / 31.72    1835 / 40.54
    Total    2770 / 61.20                    1756 / 38.80                    4526 / 100.00

Statistics for Table of S by A

    Statistic                      DF    Value      Prob
    Chi-Square                      1    92.6704    <.0001
    Likelihood Ratio Chi-Square     1    93.9232    <.0001
    Continuity Adj. Chi-Square      1    92.0733    <.0001
    Mantel-Haenszel Chi-Square      1    92.6499    <.0001
    Phi Coefficient                      -0.1431
    Contingency Coefficient               0.1416
    Cramer's V                           -0.1431

Odds Ratio and Relative Risks

    Statistic                    Value     95% Confidence Limits
    Odds Ratio                   0.5423    0.4785    0.6147
    Relative Risk (Column 1)     0.7961    0.7608    0.8330
    Relative Risk (Column 2)     1.4679    1.3535    1.5919

Conditional Independence

Now consider the point estimates of the odds ratios when we control for the department, which uses conditional odds ratios (see Sec. 5.2). Given that the individuals are applying to Department A, the odds of male admission are 0.35 times as high as the odds of female admission, and this is also significant at the 0.05 level (its 95% CI (0.2087, 0.5843) does not include the value 1). SAS again reports the reciprocal orientation, 2.8636 with limits (1.7112, 4.7921). Other conditional odds ratios (Departments B, C, etc.) are not significant, but it's interesting to note, nevertheless, that these conditional relationships do not have to be in the same direction as the marginal one. When they disagree, we have an example of Simpson's Paradox.

Table 1 of S by A, Controlling for D=DeptA (each cell: Frequency / Expected / Percent)

    S        A = Reject              A = Accept              Total
    Male     313 / 293.57 / 33.55    512 / 531.43 / 54.88    825 / 88.42
    Female    19 / 38.431 / 2.04      89 / 69.569 / 9.54     108 / 11.58
    Total    332 / 35.58             601 / 64.42             933 / 100.00

Statistics for Table 1 of S by A, Controlling for D=DeptA

    Statistic                      DF    Value      Prob
    Chi-Square                      1    17.2480    <.0001
    Likelihood Ratio Chi-Square     1    19.0540    <.0001
    Continuity Adj. Chi-Square      1    16.3718    <.0001
    Mantel-Haenszel Chi-Square      1    17.2295    <.0001
    Phi Coefficient                       0.1360
    Contingency Coefficient               0.1347
    Cramer's V                            0.1360

Odds Ratio and Relative Risks

    Statistic                    Value     95% Confidence Limits
    Odds Ratio                   2.8636    1.7112    4.7921
    Relative Risk (Column 1)     2.1566    1.4206    3.2737
    Relative Risk (Column 2)     0.7531    0.6799    0.8341

R users should open the berkeley.R file and its corresponding output file berkeley.out. R will calculate the partial tables by the levels of the last variable in the array.

    #############################
    ### Berkeley admissions data
    ### Lessons 4 & 5
    ### Uses dataset already in R
    ### See also berkeley1.R in Lesson 4 for a different code
    ### See also related berkeleyLoglin.R in Lesson 5
    #############################

    ### The dataset already exists in R as UCBAdmissions

    ### To test the odds ratios in the marginal table and each of the subtables
    library(vcd)

    ## Marginal table Admit x Gender
    admit.gender=margin.table(UCBAdmissions, c(1,2))
    admit.gender
    admit.gender/4526
    exp(oddsratio(admit.gender))
    chisq.test(admit.gender)

    ## Tests for partial tables Admit x Gender for each level of Dept
    chisq.test(UCBAdmissions[,,1])
    exp(oddsratio(UCBAdmissions[,,1]))
    chisq.test(UCBAdmissions[,,2])
    exp(oddsratio(UCBAdmissions[,,2]))
    chisq.test(UCBAdmissions[,,3])
    exp(oddsratio(UCBAdmissions[,,3]))
    chisq.test(UCBAdmissions[,,4])
    exp(oddsratio(UCBAdmissions[,,4]))
    chisq.test(UCBAdmissions[,,5])
    exp(oddsratio(UCBAdmissions[,,5]))
    chisq.test(UCBAdmissions[,,6])
    exp(oddsratio(UCBAdmissions[,,6]))

    ### To visualize these associations graphically, explore the fourfold() function in the vcd package!

    ### CMH test
    mantelhaen.test(UCBAdmissions)

We can also use the ftable() function, e.g.,

    ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")

to create flat tables. In this case, R essentially combines the departments and sexes into 12 row categories, resulting in a $$12\times2$$ representation of the original $$2\times2\times6$$ table.
To create a marginal table, we can use margin.table(), e.g.,

    margin.table(admit, c(2,1))

This function creates a marginal table of the second (sex) and the first (admission status) variables from the original array, which in this case puts the sexes as the rows and the admission status groups as the columns.

The statistical inference in R parallels the SAS analysis. For marginal independence, we ignore departments and test the marginal table of sex and admission status, whose estimated odds ratio is 1.84:

    XY <- margin.table(admit, c(2,1))
    chisq.test(XY, correct=FALSE)

            Pearson's Chi-squared test

    data:  XY
    X-squared = 92.205, df = 1, p-value < 2.2e-16

For conditional independence, we control for the department and estimate the conditional odds ratio between sex and admission status within each one:

    XY.Z <- oddsratio(admit)  # log scale
    exp(XY.Z$coef)

            A         B         C         D         E         F
    0.3492120 0.8025007 1.1330596 0.9212838 1.2216312 0.8278727

The corresponding 95% confidence limits are

          2.5 %    97.5 %
    A 0.2086756 0.5843954
    B 0.3403815 1.8920166
    C 0.8545328 1.5023696
    D 0.6863345 1.2366620
    E 0.8250748 1.8087848
    F 0.4552059 1.5056335

Only the interval for Department A excludes 1; the conditional odds ratios for the other departments are not significant, and they need not point in the same direction as the marginal one.

Simpson's paradox is the phenomenon in which a pair of variables has a marginal association and partial (conditional) associations in opposite directions. Another way to think about this is that the nature and direction of an association can change due to the presence or absence of a third (possibly confounding) variable.

In the simplest example, consider three binary variables, $$X$$, $$Y$$, $$Z$$. In the marginal table where we are ignoring the presence of $$Z,$$ let

$$P(Y = 1|X = 1) < P(Y = 1|X = 2)$$

In the partial table, after we account for the presence of variable $$Z,$$ let

$$P(Y = 1|X = 1,Z = 1) > P(Y = 1|X = 2, Z = 1)$$ and
$$P(Y = 1|X = 1,Z = 2) > P(Y = 1|X = 2, Z = 2)$$

In terms of odds ratios, the marginal odds ratio satisfies $$\theta_{XY} < 1$$, while the partial odds ratios satisfy $$\theta_{XY(Z=1)} > 1$$ and $$\theta_{XY(Z=2)} > 1$$.
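A small numerical illustration may help make these inequalities concrete. The counts below are hypothetical, invented only so that the reversal occurs; they are not from the admission data, and the names `tab`, `odds_ratio`, `marginal`, `theta_marginal`, and `theta_conditional` are illustrative.

```r
# Hypothetical 2 x 2 x 2 counts (invented for illustration), dimensions X, Y, Z.
# Values fill the array with X varying fastest, then Y, then Z.
tab <- array(
  c(18, 80,  2, 20,   # Z = 1: (X=1,Y=1), (X=2,Y=1), (X=1,Y=2), (X=2,Y=2)
    30,  4, 70, 16),  # Z = 2: same order
  dim = c(2, 2, 2),
  dimnames = list(X = c("1", "2"), Y = c("1", "2"), Z = c("1", "2"))
)

# Sample odds ratio of a 2 x 2 table
odds_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

# Marginal XY table (summing over Z) and its odds ratio
marginal <- apply(tab, c(1, 2), sum)
theta_marginal <- odds_ratio(marginal)

# Conditional odds ratios, one per level of Z
theta_conditional <- apply(tab, 3, odds_ratio)

prop.table(marginal, 1)[, "1"]  # P(Y = 1 | X): 0.4 for X = 1, 0.7 for X = 2
theta_marginal                  # below 1
theta_conditional               # both above 1
```

In this made-up table, $$X=1$$ has the higher probability of $$Y=1$$ within each level of $$Z$$ (0.9 vs. 0.8 and 0.3 vs. 0.2), yet the lower probability once $$Z$$ is ignored, because most $$X=1$$ observations fall in the level of $$Z$$ where $$Y=1$$ is rare.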

These associations can also be captured in terms of models. Next, we explore more on different independence and association concepts that capture relationships between three categorical variables.