5.3.4 - Conditional Independence

The concept of conditional independence is very important and it is the basis for many statistical models (e.g., latent class models, factor analysis, item response models, graphical models, etc.).

There are three possible conditional independence models with three random variables: (\(XY\), \(XZ\)), (\(XY\), \(YZ\)), and (\(XZ\), \(YZ\)). Consider the model (\(XY\), \(XZ\)),

[Figure: graph with edges \(X - Y\) and \(X - Z\), and no edge between \(Y\) and \(Z\)]

which means that \(Y\) and \(Z\) are conditionally independent given \(X\). In mathematical terms, the model (\(XY\), \(XZ\)) means that the conditional probability of \(Y\) and \(Z\) given \(X\) equals the product of the conditional probability of \(Y\) given \(X\) and that of \(Z\) given \(X\):

\(P(Y=j,Z=k|X=i)=P(Y=j|X=i) \times P(Z=k|X=i)\)

In terms of odds ratios, this model implies that if we look at the partial tables, that is, the \(Y\times Z\) tables at each level of \(X = 1, \ldots , I\), the odds ratios in these tables should not significantly differ from 1. Tying this back to two-way tables, we can test each of the partial \(Y\times Z\) tables at each level of \(X\) to see whether independence holds:

\(H_0\colon \theta_{YZ\left(X=i\right)} = 1\) for all \(i\)

vs.

\(H_A\colon\) at least one \(\theta_{YZ\left(X=i\right)} \ne 1\)
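
These conditional odds ratios are easy to compute directly from the data. Here is a minimal R sketch, assuming a hypothetical \(2 \times 2 \times I\) array tab with dimensions ordered \(Y\), \(Z\), \(X\) (tab is a placeholder, not an object defined in the course files):

# Sample odds ratio theta_{YZ(X=i)} in each partial Y x Z table
apply(tab, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))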

Stop and Think!
It is straightforward to show that the models \((XY, Z)\), \((XZ, Y)\), and \((X, Y, Z)\) are special cases of this model. Therefore, if any of these simpler models fits, then \((XY, XZ)\) will also fit. Can you see this? A solution for the case when \((X, Y, Z)\) holds is outlined below.
  1. Recall, \((XY, XZ)\) implies that \(P(Y,Z| X)=P(Y|X)P(Z|X)\).
  2. If \((X,Y,Z)\) holds then we know that \(P(X, Y, Z)= P(X) P(Y) P(Z)\), and we also know that the mutual independence implies that \(P(Y|X)=P(Y)\) and \(P(Z|X)=P(Z)\).
  3. Thus, from the probability properties, \(P(Y,Z|X)=P(X,Y,Z)/P(X)\), and from (2) above, \(P(Y,Z|X)=P(X,Y,Z)/P(X) = P(X)P(Y)P(Z)/P(X)= P(Y)P(Z)=P(Y|X)P(Z|X)\), which equals the expression in (1).

Intuitively, \((XY, XZ)\) means that any relationship that may exist between \(Y\) and \(Z\) can be explained by \(X\). In other words, \(Y\) and \(Z\) may appear to be related if \(X\) is not considered (e.g., if we only look at the marginal table \(Y\times Z\)). But if one could control for \(X\) by holding it constant, that is, by looking at subsets of the data having identical values of \(X\) (the partial tables \(Y\times Z\) for each level of \(X\)), then any apparent relationship between \(Y\) and \(Z\) would disappear (remember Simpson's paradox?). Marginal and conditional associations can be different!

Under the conditional independence model, the cell probabilities can be written as

\begin{align} \pi_{ijk} &= P(X=i) P(Y=j,Z=k|X=i)\\ &= P(X=i)P(Y=j|X=i)P(Z=k|X=i)\\ &= \pi_{i++}\pi_{j|i}\pi_{k|i}\\ \end{align}

where \(\sum_i \pi_{i++} = 1\), \(\sum_j \pi_{j | i} = 1\) for each \(i\), and \(\sum_k \pi_{k | i} = 1\) for each \(i\). The number of free parameters is \((I − 1) + I (J − 1) + I (K − 1)\).
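
Comparing this count with the \(IJK - 1\) free cell probabilities of the saturated model gives the residual degrees of freedom for testing this model, which we will use below:

\begin{align} df &= (IJK - 1) - \left[(I-1) + I(J-1) + I(K-1)\right]\\ &= IJK - IJ - IK + I\\ &= I(J-1)(K-1) \end{align}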

The ML estimates of these parameters are

\(\hat{\pi}_{i++}=n_{i++}/n\)

\(\hat{\pi}_{j|i}=n_{ij+}/n_{i++}\)

\(\hat{\pi}_{k|i}=n_{i+k}/n_{i++}\)

and the estimated expected frequencies are

\(\hat{E}_{ijk}=\dfrac{n_{ij+}n_{i+k}}{n_{i++}}.\)

Notice again the similarity to the formula for independence in a two-way table.
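
To make the computation concrete, here is a minimal R sketch of these estimates, assuming a generic \(J \times K \times I\) array tab with dimensions ordered \(Y\), \(Z\), \(X\) (tab is a hypothetical placeholder):

# Marginal counts needed for the conditional independence fit
n_ij <- apply(tab, c(1, 3), sum)   # n_{ij+}: collapse over Z (a J x I matrix)
n_ik <- apply(tab, c(2, 3), sum)   # n_{i+k}: collapse over Y (a K x I matrix)
n_i  <- apply(tab, 3, sum)         # n_{i++}: total count at each level of X

# Fitted values E_ijk = n_{ij+} * n_{i+k} / n_{i++}
E_hat <- array(NA, dim = dim(tab))
for (i in seq_len(dim(tab)[3])) {
  E_hat[, , i] <- outer(n_ij[, i], n_ik[, i]) / n_i[i]
}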

The test for conditional independence of \(Y\) and \(Z\) given \(X\) is equivalent to separating the table by levels of \(X = 1, \ldots , I\) and testing for independence within each level.

There are two ways we can test for conditional independence:

  1. The overall \(X^2\) or \(G^2\) statistics can be found by summing the individual test statistics for \(YZ\) independence across the levels of \(X\). The total degrees of freedom for this test are \(I (J − 1)(K − 1)\). See the example below; we'll see more on this when we look at log-linear models. Note that if we can reject independence in one of the partial tables, then we can already reject conditional independence and don't need to run the full analysis.
  2. The Cochran-Mantel-Haenszel test (using the CMH option in the TABLES statement of PROC FREQ in SAS, and mantelhaen.test in R). This test produces the Mantel-Haenszel statistic, also known as the "average partial association" statistic. A sketch of both approaches in R follows this list.
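
As a rough illustration of both approaches in R, here is a minimal sketch assuming a generic \(J \times K \times I\) array tab with dimensions ordered \(Y\), \(Z\), \(X\) (again, tab is a hypothetical placeholder, not one of the course data objects):

# 1. Sum the partial-table X^2 statistics across the levels of X
partial_X2 <- apply(tab, 3, function(m) chisq.test(m, correct = FALSE)$statistic)
X2_total <- sum(partial_X2)
df_total <- dim(tab)[3] * (dim(tab)[1] - 1) * (dim(tab)[2] - 1)
pchisq(X2_total, df = df_total, lower.tail = FALSE)   # overall p-value

# 2. Cochran-Mantel-Haenszel test, stratified by the third dimension
mantelhaen.test(tab)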

Example: Boy Scouts and Juvenile Delinquency

Let us return to the table that classifies \(n = 800\) boys by scout status B, juvenile delinquency D, and socioeconomic status S. We already found that the models of mutual independence (D, B, S) and joint independence (D, BS) did not fit. Thus we know that either B or S (or both) are related to D. Let us temporarily ignore S and see whether B and D are related (marginal independence). Ignoring S means that we classify individuals only by the variables B and D; in other words, we form a two-way table for B \(\times\) D, the same table that we would get by collapsing (i.e. adding) over the levels of S.

              Delinquent
Boy scout    Yes     No
Yes           33    343
No            64    360

The \(X^2\) test for this marginal independence demonstrates that a relationship between B and D does exist. Expected counts are shown in parentheses next to the observed counts:

                 Delinquent=Yes   Delinquent=No   Total
Boy Scout=Yes     33 (45.59)       343 (330.41)    376
Boy Scout=No      64 (51.41)       360 (372.59)    424
Total             97               703             800

\(X^2 = 3.477 + 0.480 + 3.083 + 0.425 = 7.465\), where each term in the sum is the contribution (squared Pearson residual) of one cell to the overall Pearson \(X^2\) statistic. With df = 1, the p-value is 1-PROBCHI(7.465,1)=0.006 in SAS, or 1-pchisq(7.465,1)=0.006 in R, so we reject the marginal independence of B and D. This agrees with the usual chi-square test of independence in the \(2\times2\) table.

The odds ratio of \((33 \cdot 360)/(64 \cdot 343) = 0.54\) indicates a strong negative relationship between boy scout status and delinquency; on the odds scale, boy scouts appear to be about 46% less likely to be delinquent than non-scouts.
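
For reference, this marginal analysis is easy to reproduce in R; here is a small sketch using the counts above:

# Marginal B x D table, collapsed over S
BD <- matrix(c(33, 343, 64, 360), nrow = 2, byrow = TRUE,
             dimnames = list(scout = c("yes", "no"),
                             delinquent = c("yes", "no")))
chisq.test(BD, correct = FALSE)   # X-squared = 7.465, df = 1, p-value ~ 0.006
(33 * 360) / (64 * 343)           # sample odds ratio ~ 0.54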

To a proponent of scouting, this result might suggest that being a boy scout has substantial benefits in reducing the rates of juvenile delinquency. But boy scouts tend to differ from non-scouts on a wide variety of characteristics. Could one of these characteristics—say, socioeconomic status—explain the apparent relationship between B and D?

Let’s now test the hypothesis that B and D are conditionally independent given S. To do this, we enter the data for each \(2 \times 2\) table of B \(\times\) D corresponding to the levels S = 1, S = 2, and S = 3, respectively, then perform independence tests on these tables and add up the \(X^2\) statistics (or run the CMH test, as in the next section).

To do this in SAS you can run the following command in boys.sas:

tables SES*scouts*delinquent / chisq;

Notice that the order is important; SAS will create partial tables for each level of the first variable. See boys.lst.

The individual chi-square statistics from the output for each partial table are given below. To test the conditional independence model (BS, DS), we add these up to get the overall chi-squared statistic:

0.053+0.006 + 0.101 = 0.160.

Each of the individual tests has 1 degree of freedom, so the total number of degrees of freedom is 3. The p-value is \(P(\chi^2_3 \geq 0.160)=0.984\), indicating that the conditional independence model fits extremely well. As a result, we will not reject this model here. However, the p-value is so high that it should make you wonder what is going on here.

The apparent relationship between B and D can be explained by S; after the systematic differences in social class among scouts and non-scouts are accounted for, there is no additional evidence that scout membership has any effect on delinquency. The fact that the p-value is so close to 1 suggests that the model fit is too good to be true and that the data may have been fabricated. (It’s true; some of these data were created in order to illustrate this point!)

In the next section, we will see how to use the CMH option in SAS.

In R (see boys.R), for example,

temp[,,1]

will give us the B \(\times\) D partial table for the first level of S, and similarly for levels 2 and 3, where temp is the name of our three-way table in the code; see boys.out.

The individual chi-square statistics from the output for each partial table are given below:


> chisq.test(temp[,,1], correct=FALSE)
Pearson's Chi-squared test
data:  temp[, , 1] 
X-squared = 0.0058, df = 1, p-value = 0.9392
> temp[,,2]
scout
delinquent  no yes
no  132 104
yes  20  14
> chisq.test(temp[,,2], correct=FALSE)
Pearson's Chi-squared test
data:  temp[, , 2] 
X-squared = 0.101, df = 1, p-value = 0.7507
> temp[,,3]
scout
delinquent no yes
no  59 196
yes  2   8
> chisq.test(temp[,,3], correct=FALSE)
Pearson's Chi-squared test
data:  temp[, , 3] 
X-squared = 0.0534, df = 1, p-value = 0.8172

Adding these up gives the same overall chi-squared statistic as before, \(0.006 + 0.101 + 0.053 = 0.160\), with 3 total degrees of freedom and p-value \(P(\chi^2_3 \geq 0.160)=0.984\). The conclusions are exactly as described above: the conditional independence model (BS, DS) fits suspiciously well.

In the next section, we will see how to use mantelhaen.test in R.

Spurious Relationship

To see how the spurious relationship between B and D could have been induced, it is worthwhile to examine the B \(\times\) S and D \(\times\) S marginal tables.

The B \(\times\) S marginal table is shown below.

                       Boy scout
Socioeconomic status   Yes    No
Low                     54   211
Medium                 118   152
High                   204    61

The test of independence for this table yields \(X^2 = 172.2\) with 2 degrees of freedom, which gives a p-value of essentially zero. There is a highly significant relationship between B and S. To see what the relationship is, we can estimate the conditional probabilities of B = 1 for S = 1, S = 2, and S = 3:

\(P(B=1|S=1)=54/(54 + 211) = .204\)

\(P(B=1|S=2)=118/(118 + 152) = .437\)

\(P(B=1|S=3)=204/(204 + 61) = .770\)

The probability of being a boy scout rises dramatically as socioeconomic status goes up.
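
This test and these proportions are easy to verify in R; here is a small sketch using the counts above:

# B x S marginal table (rows: SES; columns: scout status)
BS <- matrix(c(54, 211, 118, 152, 204, 61), nrow = 3, byrow = TRUE,
             dimnames = list(SES = c("low", "medium", "high"),
                             scout = c("yes", "no")))
chisq.test(BS)               # X-squared ~ 172.2, df = 2
prop.table(BS, margin = 1)   # row proportions: column "yes" gives .204, .437, .770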

Now let’s examine the D \(\times\) S marginal table.

                       Delinquent
Socioeconomic status   Yes    No
Low                     53   212
Medium                  34   236
High                    10   255

The test for independence here yields \(X^2 = 32.8\) with 2 degrees of freedom, p-value \(\approx\) 0. The estimated conditional probabilities of D = 1 for S = 1, S = 2, and S = 3 are shown below.

\(P(D=1|S=1)=53/(53 + 212) = .200\)

\(P(D=1|S=2)=34/(34 + 236) = .126\)

\(P(D=1|S=3)=10/(10 + 255) = .038\)
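
The same check works for the D \(\times\) S margin, again using the counts above:

# D x S marginal table (rows: SES; columns: delinquency status)
DS <- matrix(c(53, 212, 34, 236, 10, 255), nrow = 3, byrow = TRUE,
             dimnames = list(SES = c("low", "medium", "high"),
                             delinquent = c("yes", "no")))
chisq.test(DS)               # X-squared ~ 32.8, df = 2
prop.table(DS, margin = 1)   # column "yes" gives .200, .126, .038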

The rate of delinquency drops as socioeconomic status goes up. Now we see how S induces a spurious relationship between B and D. Boy scouts tend to be of higher social class than non-scouts, and boys in higher social classes have a smaller chance of being delinquent. The apparent effect of scouting is really an effect of social class.

In the next section, we study how to test for conditional independence via the CMH statistic.