# 5.4.4 - Conditional Independence


The concept of conditional independence is very important and it is the basis for many statistical models (e.g., latent class models, factor analysis, item response models, graphical models, etc.).

There are three possible conditional independence models with three random variables: (AB, AC), (AB, BC), and (AC, BC). Consider the model (AB, AC), which means that B and C are conditionally independent given A. In mathematical terms, the model (AB, AC) means that the conditional probability of B and C given A equals the product of the conditional probabilities of B given A and C given A:

$P(B=j,C=k|A=i)=P(B=j|A=i) \times P(C=k|A=i)$

In terms of odds ratios, this model implies that if we look at the partial tables, that is, the B × C tables at each level of A = 1, ..., I, the odds ratios in these tables should not be significantly different from 1. Tying this back to two-way tables, we can test whether independence holds in each of the partial B × C tables:

$H_0: \theta_{BC(A=i)} = 1$ for all $i$
vs.
$H_A$: at least one $\theta_{BC(A=i)} \neq 1$

It is straightforward to show that the models (AB, C), (AC, B), and (A, B, C) are special cases of this model. Therefore, if any of these simpler models fit, then (AB, AC) will also fit. Can you see this?
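As a sketch of the argument for the simplest case, suppose mutual independence (A, B, C) holds, so that the joint probability factors into the three marginal probabilities. Then

\begin{align}
P(B=j,C=k|A=i) &= \dfrac{P(A=i)P(B=j)P(C=k)}{P(A=i)}\\
&= P(B=j)P(C=k)\\
&= P(B=j|A=i)P(C=k|A=i)
\end{align}

where the last step uses the fact that, under mutual independence, B and C are each independent of A. This is exactly the defining equation of (AB, AC). The cases (AB, C) and (AC, B) can be handled with the same kind of factorization.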


Intuitively, (AB, AC) means that any relationship that may exist between B and C can be explained by A. In other words, B and C may appear to be related if A is not considered (e.g., if we only look at the marginal B × C table), but if one could control for A by holding it constant (i.e., by looking at subsets of the data having identical values of A, that is, at the partial B × C tables for each level of A), then any apparent relationship between B and C would disappear. Recall Simpson's paradox: marginal and conditional associations can be different!

Under the conditional independence model, the cell probabilities can be written as

\begin{align}
\pi_{ijk} &= P(A=i) P(B=j,C=k|A=i)\\
&= P(A=i)P(B=j|A=i)P(C=k|A=i)\\
&= \pi_{i++}\pi_{j|i}\pi_{k|i}\\
\end{align}

where $\sum_i \pi_{i++} = 1$, $\sum_j \pi_{j|i} = 1$ for each $i$, and $\sum_k \pi_{k|i} = 1$ for each $i$. The number of free parameters is $(I-1) + I(J-1) + I(K-1)$.
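As a quick check on this count, assume the usual rule that the residual degrees of freedom equal the number of free parameters in the saturated model, $IJK - 1$, minus the number of free parameters in the fitted model:

\begin{align}
df &= (IJK-1) - [(I-1) + I(J-1) + I(K-1)]\\
&= IJK - IJ - IK + I\\
&= I(J-1)(K-1)
\end{align}

This is the same number obtained by adding up $(J-1)(K-1)$ degrees of freedom from an independence test in each of the $I$ partial tables.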

The ML estimates of these parameters are

$\hat{\pi}_{i++}=n_{i++}/n$
$\hat{\pi}_{j|i}=n_{ij+}/n_{i++}$
$\hat{\pi}_{k|i}=n_{i+k}/n_{i++}$

and the estimated expected frequencies are

$\hat{E}_{ijk}=\dfrac{n_{ij+}n_{i+k}}{n_{i++}}.$

Notice again the similarity to the formula for independence in a two-way table.
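The formula for the estimated expected frequencies can be illustrated with a minimal sketch. Python is used here purely for illustration (the course files use SAS and R), and the 2 × 2 × 2 counts and the function name `expected_counts` are hypothetical, not taken from the lesson.

```python
# Hypothetical 2 x 2 x 2 counts n[i][j][k], indexed as A = i, B = j, C = k.
# These numbers are made up for illustration only.
n = [
    [[10, 20], [30, 40]],
    [[25, 15], [5, 55]],
]


def expected_counts(n):
    """Fitted counts under (AB, AC): E_ijk = n_ij+ * n_i+k / n_i++."""
    fitted = []
    for layer in n:  # one B x C partial table per level of A
        row_tot = [sum(row) for row in layer]        # n_ij+
        col_tot = [sum(col) for col in zip(*layer)]  # n_i+k
        total = sum(row_tot)                         # n_i++
        fitted.append([[r * c / total for c in col_tot] for r in row_tot])
    return fitted


E = expected_counts(n)
for i, layer in enumerate(E):
    print(f"A = {i + 1}:", [[round(e, 2) for e in row] for row in layer])
```

Within each level of A this is the ordinary two-way independence calculation, so the fitted counts reproduce the observed AB and AC margins.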

The test for conditional independence of B and C given A is equivalent to separating the table by levels of A = 1, ..., I and testing for independence within each level.

There are two ways we can test for conditional independence:

1. The overall $X^2$ or $G^2$ statistic can be found by summing the individual test statistics for BC independence across the levels of A. The total degrees of freedom for this test are I(J − 1)(K − 1). See the example below; we will see more on this when we do log-linear models. Note that if we can reject independence in one of the partial tables, then we can reject conditional independence and do not need to run the full analysis.
2. The Cochran-Mantel-Haenszel test (the CMH option of the TABLES statement in PROC FREQ in SAS, and mantelhaen.test in R). This test produces the Mantel-Haenszel statistic, also known as the "average partial association" statistic.

### Example - Boy Scouts and Juvenile Delinquency

Let us return to the table that classifies n = 800 boys by boy scout status B, juvenile delinquency D, and socioeconomic status S. We already found that the models of mutual independence (D, B, S) and joint independence (D, BS) did not fit. Thus we know that either B or S (or both) are related to D. Let us temporarily ignore S and see whether B and D are related (that is, test marginal independence). Ignoring S means that we classify individuals only by the variables B and D; in other words, we form a two-way table for B × D, the same table that we would get by collapsing (i.e., adding) over the levels of S.

| Boy scout | Delinquent = Yes | Delinquent = No |
|---|---|---|
| Yes | 33 | 343 |
| No | 64 | 360 |

The $X^2$ test for this marginal independence demonstrates that a relationship between B and D does exist. Expected counts are shown in parentheses after the observed counts:

| | Delinquent = Yes | Delinquent = No | Total |
|---|---|---|---|
| Boy Scout = Yes | 33 (45.59) | 343 (330.41) | 376 |
| Boy Scout = No | 64 (51.41) | 360 (372.59) | 424 |
| Total | 97 | 703 | 800 |

$X^2 = 3.477 + 0.480 + 3.083 + 0.425 = 7.465$, where each term is the contribution of one cell (its squared Pearson residual) to the overall Pearson $X^2$ statistic. With df = 1, the p-value is 0.006, computed as 1 - PROBCHI(7.465,1) in SAS or 1 - pchisq(7.465,1) in R, so we reject the marginal independence of B and D. Alternatively, simply run the chi-squared test of independence on this 2 × 2 table.

The odds ratio of (33 · 360)/(64 · 343) = 0.54 indicates a strong negative relationship between boy scout status and delinquency; it appears that boy scouts are 46% less likely (on the odds scale) to be delinquent than non-boy scouts.
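The arithmetic in this marginal analysis can be reproduced with a short sketch. Python is used here only for illustration (the course files use SAS and R), and the variable names are hypothetical. The p-value relies on the closed form of the chi-square upper tail for 1 degree of freedom rather than on a statistics library.

```python
import math

# Observed B x D table from the text: rows = boy scout (yes, no),
# columns = delinquent (yes, no).
observed = [[33, 343], [64, 360]]

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)

# Expected counts under independence, then the Pearson statistic.
expected = [[r * c / n for c in col_tot] for r in row_tot]
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For df = 1 the upper tail is P(chi2_1 >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(x2 / 2))

# Sample odds ratio (cross-product ratio) of the 2 x 2 table.
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

print(f"X2 = {x2:.3f}, p = {p_value:.3f}, odds ratio = {odds_ratio:.2f}")
```

The printed values should agree with the statistic, p-value, and odds ratio quoted in the text.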

To a proponent of scouting, this result might suggest that being a boy scout has substantial benefits in reducing the rates of juvenile delinquency. But boy scouts tend to differ from non-scouts on a wide variety of characteristics. Could one of these characteristics—say, socioeconomic status—explain the apparent relationship between B and D?

Let’s now test the hypothesis that B and D are conditionally independent given S. To do this, we enter the data for each 2 × 2 table of B × D corresponding to the levels S = 1, S = 2, and S = 3, perform an independence test on each of these tables, and add up the $X^2$ statistics (or run the CMH test, as in the next section).

To do this in SAS, you can run the following statement from boys.sas:

```
tables SES*scouts*delinquent / chisq;
```

Notice that the order is important: SAS creates a partial table for each level of the first variable; see boys.lst.

The individual chi-square statistics from the output for each partial table are 0.053, 0.006, and 0.101. To test the conditional independence model (BS, DS), we add these up to get the overall chi-squared statistic:

0.053 + 0.006 + 0.101 = 0.160.

Each of the individual tests has 1 degree of freedom, so the total number of degrees of freedom is 3. The p-value is $P(\chi^2_3 \geq 0.160)=0.984$, indicating that the conditional independence model fits extremely well, so we do not reject it. However, the p-value is so high that it should make you wonder what is going on here.

The apparent relationship between B and D can be explained by S; after the systematic differences in social class among scouts and non-scouts are accounted for, there is no additional evidence that scout membership has any effect on delinquency. The fact that the p-value is so close to 1 suggests that the model fit is too good to be true; it suggests that the data may have been fabricated. (It’s true; Dr. Schafer made up some of the data in order to illustrate this point!)
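The summed test can also be sketched outside SAS and R. The Python below is only an illustration with hypothetical names. The medium and high partial tables are the ones printed in the R output in this section; the low table is not printed there, so it is an assumption, obtained by subtracting the other two from the marginal B × D table.

```python
import math

# B x D partial tables by socioeconomic status: rows = scout (yes, no),
# columns = delinquent (yes, no). The "low" table is derived (assumed) by
# subtracting the medium and high tables from the marginal B x D table.
partial_tables = {
    "low": [[11, 43], [42, 169]],
    "medium": [[14, 104], [20, 132]],
    "high": [[8, 196], [2, 59]],
}


def pearson_x2(table):
    """Pearson X^2 for independence in a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for j, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_tot[j] * col_tot[k] / n
            x2 += (observed - expected) ** 2 / expected
    return x2


def chi2_upper_tail_df3(x):
    """P(chi2_3 >= x), using the closed form available for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stats = {level: pearson_x2(table) for level, table in partial_tables.items()}
total_x2 = sum(stats.values())
df = len(partial_tables) * (2 - 1) * (2 - 1)  # I (J - 1)(K - 1) = 3
p_value = chi2_upper_tail_df3(total_x2)

for level, value in stats.items():
    print(f"{level}: X2 = {value:.4f}")
print(f"total X2 = {total_x2:.3f}, df = {df}, p = {p_value:.3f}")
```

The three per-table statistics, their sum, and the p-value should agree with the values reported in this section up to rounding.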

[Note: In the next section we will see how to use the CMH option in SAS - see boys.sas]

In R (see boys.R), for example, `temp[,,1]` gives the B × D partial table for the first level of S, and similarly for levels 2 and 3, where temp is the name of our three-way table in this code; see boys.out.

The individual chi-square statistics from the output after each partial table are given below.

```
> chisq.test(temp[,,1], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 1]
X-squared = 0.0058, df = 1, p-value = 0.9392

> temp[,,2]
         scout
deliquent  no yes
      no  132 104
      yes  20  14
> chisq.test(temp[,,2], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 2]
X-squared = 0.101, df = 1, p-value = 0.7507

> temp[,,3]
         scout
deliquent no yes
      no  59 196
      yes  2   8
> chisq.test(temp[,,3], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 3]
X-squared = 0.0534, df = 1, p-value = 0.8172
```


[Note: In the next section we will see how to use mantelhaen.test in R; see boys.R]

#### Spurious Relationship

To see how the spurious relationship between B and D could have been induced, it is worthwhile to examine the B × S and D × S marginal tables.

The B × S marginal table is shown below.

| Socioeconomic status | Boy scout = Yes | Boy scout = No |
|---|---|---|
| Low | 54 | 211 |
| Medium | 118 | 152 |
| High | 204 | 61 |

The test of independence for this table yields $X^2 = 172.2$ with 2 degrees of freedom, which gives a p-value of essentially zero. There is a highly significant relationship between B and S. To see what the relationship is, we can estimate the conditional probabilities of B = 1 for S = 1, S = 2, and S = 3:

P(B=1|S=1) = 54/(54 + 211) = .204
P(B=1|S=2) = 118/(118 + 152) = .437
P(B=1|S=3) = 204/(204 + 61) = .770

The probability of being a boy scout rises dramatically as socioeconomic status goes up.

Now let’s examine the D × S marginal table.

| Socioeconomic status | Delinquent = Yes | Delinquent = No |
|---|---|---|
| Low | 53 | 212 |
| Medium | 34 | 236 |
| High | 10 | 255 |

The test for independence here yields $X^2 = 32.8$ with 2 degrees of freedom, p-value ≈ 0. The estimated conditional probabilities of D = 1 for S = 1, S = 2, and S = 3 are shown below.

P(D=1|S=1) = 53/(53 + 212) = .200
P(D=1|S=2) = 34/(34 + 236) = .126
P(D=1|S=3) = 10/(10 + 255) = .038

The rate of delinquency drops as socioeconomic status goes up. Now we see how S induces a spurious relationship between B and D. Boy scouts tend to be of higher social class than non-scouts, and boys in higher social class have a smaller chance of being delinquent. The apparent effect of scouting is really an effect of social class.
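The two marginal analyses above follow the same recipe, which the sketch below applies to both tables. It is written in Python for illustration only, with hypothetical function names. The p-value uses the fact that the chi-square upper tail with 2 degrees of freedom is exactly exp(−x/2).

```python
import math

# Marginal tables from the text. Rows are socioeconomic status
# (low, medium, high); columns are (yes, no).
scout_by_ses = [[54, 211], [118, 152], [204, 61]]
delinquent_by_ses = [[53, 212], [34, 236], [10, 255]]


def pearson_x2(table):
    """Pearson X^2 for independence in a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for j, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_tot[j] * col_tot[k] / n
            x2 += (observed - expected) ** 2 / expected
    return x2


def summarize(table):
    """Return X^2, its p-value on 2 df, and P(yes | S = s) for each row."""
    x2 = pearson_x2(table)
    p_value = math.exp(-x2 / 2)  # exact upper tail for 2 df
    p_yes = [row[0] / sum(row) for row in table]
    return x2, p_value, p_yes


for name, table in [("B x S", scout_by_ses), ("D x S", delinquent_by_ses)]:
    x2, p_value, p_yes = summarize(table)
    print(f"{name}: X2 = {x2:.1f}, p = {p_value:.2g},",
          "P(yes | S) =", [round(p, 3) for p in p_yes])
```

The estimated probability of being a scout rises with socioeconomic status while the estimated probability of delinquency falls, which is the pattern that induces the spurious marginal association between B and D.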

In the next section, we study how to test for conditional independence via the CMH statistic.

EXERCISE: Recall the results from death.sas (output: death.lst) and death.R (output: death.out) earlier in the lesson, where we tested for independence, for example via odds ratios, within the partial tables of Defendant's race vs. Death penalty, A × C, for each level of Victim's race, B; see the Notation section of this lesson if you don't recall marginal and partial tables.

The question was: given the Victim's race, are the Defendant's race and Death penalty independent? In this case, the null hypothesis is that the conditional independence model (AB, BC) fits. What is the graphical representation here?

This can be stated in terms of the partial odds ratios:

$H_0: \theta_{AC(B=white)} = \theta_{AC(B=black)} = 1$
vs.
$H_A$: at least one $\theta_{AC(B=j)} \neq 1$

Based on the estimates of the partial odds ratios, their confidence intervals, and the chi-squared and deviance statistics for the test of independence in each of these partial tables, at the alpha level of 0.05 we do not have sufficient evidence to reject the null hypothesis, and thus the model of conditional independence describes the data well.