5.3.5 - Cochran-Mantel-Haenszel Test
This is another way to test for conditional independence, by exploring associations in the partial tables of a \(2 \times 2 \times K\) table. Recall that the null hypothesis of conditional independence is equivalent to the statement that all conditional odds ratios given the levels \(k\) are equal to 1, i.e.,
\(H_0 : \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(K)} = 1\)
The Cochran-Mantel-Haenszel (CMH) test statistic is
\(M^2=\dfrac{[\sum_k(n_{11k}-\mu_{11k})]^2}{\sum_k Var(n_{11k})}\)
where \(\mu_{11k}=E(n_{11k})=\frac{n_{1+k}n_{+1k}}{n_{++k}}\) is the expected frequency of the first cell in the \(k\)th partial table assuming conditional independence holds, and the variance of cell (1, 1) is
\(Var(n_{11k})=\dfrac{n_{1+k}n_{2+k}n_{+1k}n_{+2k}}{n^2_{++k}(n_{++k}-1)}\).
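The formulas above translate almost line by line into R. The following is a minimal sketch rather than the course's own code: the helper name `cmh_stat` is hypothetical, and the counts are assumed to be those of the boy scout example analyzed later in this section (rows: scout yes/no; columns: delinquent yes/no; strata: SES low, medium, high).

```r
# Hand computation of the CMH statistic for a 2 x 2 x K array
# (dimension 1 = rows, 2 = columns, 3 = strata).
# cmh_stat is a hypothetical helper name, not part of any package.
cmh_stat <- function(tab) {
  n    <- apply(tab, 3, sum)                      # n_{++k}
  row1 <- apply(tab, 3, function(t) sum(t[1, ]))  # n_{1+k}
  col1 <- apply(tab, 3, function(t) sum(t[, 1]))  # n_{+1k}
  mu   <- row1 * col1 / n                         # E(n_{11k}) under H0
  # Var(n_{11k}) under H0
  v    <- row1 * (n - row1) * col1 * (n - col1) / (n^2 * (n - 1))
  M2   <- sum(tab[1, 1, ] - mu)^2 / sum(v)
  c(M2 = M2, df = 1, p.value = pchisq(M2, df = 1, lower.tail = FALSE))
}

# Assumed counts, filled column by column within each stratum:
# n11, n21, n12, n22 for SES low, medium, high.
boys <- array(c(11, 42,  43, 169,
                14, 20, 104, 132,
                 8,  2, 196,  59),
              dim = c(2, 2, 3))

cmh_stat(boys)
```

If the assumed counts are right, the statistic should agree with the value of about 0.0080 reported by SAS and R in the example below.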
Properties of the CMH statistic
- For large samples, when \(H_0\) is true, the CMH statistic has a chi-squared distribution with df = 1.
- If all \(\theta_{XY(k)} = 1\), then the CMH statistic is close to zero
- If some or all \(\theta_{XY(k)} > 1\), then the CMH statistic is large
- If some or all \(\theta_{XY(k)} < 1\), then the CMH statistic is large
- If some \(\theta_{XY(k)} < 1\) and others \(\theta_{XY(k)} > 1\), then the deviations \(n_{11k}-\mu_{11k}\) have opposite signs and partially cancel in the numerator, so the CMH statistic is not as effective; that is, the test works better if the conditional odds ratios are in the same direction and comparable in size.
- The CMH test can be generalized to \(I \times J \times K\) tables, but this generalization varies depending on the nature of the variables:
  - the general association statistic treats both variables as nominal and thus has df \(= (I-1)\times(J-1)\).
  - the row mean scores differ statistic treats the row variable as nominal and column variable as ordinal, and has df \(= I-1\).
  - the nonzero correlation statistic treats both variables as ordinal, and df = 1.
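One way to see why odds ratios in opposite directions hurt the test is to try it on a small table. The sketch below uses counts invented purely for illustration (the object names `opp`, `cond_or`, and `res` are hypothetical): each stratum shows a strong association, yet the deviations \(n_{11k}-\mu_{11k}\) are +5 and -5 and cancel in the numerator.

```r
# Hypothetical 2 x 2 x 2 table whose conditional odds ratios point in
# opposite directions (counts invented for illustration only).
opp <- array(c(20, 10, 10, 20,    # stratum 1: n11, n21, n12, n22
               10, 20, 20, 10),   # stratum 2: n11, n21, n12, n22
             dim = c(2, 2, 2))

# Conditional odds ratio in each stratum: 4 and 0.25
cond_or <- apply(opp, 3, function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1]))
cond_or

# CMH test without continuity correction; the expected (1,1) count is 15
# in both strata, so the summed deviation is zero
res <- mantelhaen.test(opp, correct = FALSE)
res$statistic
```

Here the CMH statistic is zero even though neither stratum is close to independence, which is why it is worth checking the conditional odds ratios before relying on the test.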
Common odds-ratio estimate
As we have seen before, it's always informative to have a summary estimate of strength of association (rather than just a hypothesis test). If the associations are similar across the partial tables, we can summarize them with a single value: an estimate of the common odds ratio for a \(2 \times 2 \times K\) table is
\(\hat{\theta}_{MH}=\dfrac{\sum_k(n_{11k}n_{22k})/n_{++k}}{\sum_k(n_{12k}n_{21k})/n_{++k}}\)
This is a useful summary statistic especially if the model of homogeneous associations holds, as we will see in the next section.
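The estimator can be computed directly from the cell counts. This is a minimal sketch under the same assumptions as before: `mh_or` is a hypothetical helper name, and the counts are assumed to be those of the boy scout example in the next subsection (rows: scout yes/no; columns: delinquent yes/no; strata: SES low, medium, high).

```r
# Mantel-Haenszel common odds ratio for a 2 x 2 x K array.
# mh_or is a hypothetical helper name, not part of any package.
mh_or <- function(tab) {
  n <- apply(tab, 3, sum)                  # n_{++k}
  sum(tab[1, 1, ] * tab[2, 2, ] / n) /     # numerator:   sum_k n11k n22k / n++k
    sum(tab[1, 2, ] * tab[2, 1, ] / n)     # denominator: sum_k n12k n21k / n++k
}

# Assumed counts, filled column by column within each stratum:
# n11, n21, n12, n22 for SES low, medium, high.
boys <- array(c(11, 42,  43, 169,
                14, 20, 104, 132,
                 8,  2, 196,  59),
              dim = c(2, 2, 3))

mh_or(boys)

# Conditional odds ratios, for comparison with the pooled estimate
apply(boys, 3, function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1]))
```

Comparing the pooled value with the stratum-specific odds ratios is a quick informal check of whether a single summary is reasonable.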
Example - Boy Scouts and Juvenile Delinquency
For the boy scout data, the first method (individual chi-squared tests in each conditional table) led us to conclude that B and D are independent given S. Here we repeat the analysis using the CMH test.
In the SAS program file boys.sas, the cmh option (e.g., tables SES*scouts*delinquent / chisq cmh) produces the following summary statistics output, which includes the CMH statistics:
Summary Statistics for scout by delinquent
Controlling for SES
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic | Alternative Hypothesis | DF | Value | Prob |
---|---|---|---|---|
1 | Nonzero Correlation | 1 | 0.0080 | 0.9287 |
2 | Row Mean Scores Differ | 1 | 0.0080 | 0.9287 |
3 | General Association | 1 | 0.0080 | 0.9287 |
The general association statistic, CMH = 0.0080, is very close to zero (p-value = 0.9287), so we cannot reject the null hypothesis; the conditional independence model is a good fit for these data.
Since the hypothesis of conditional independence is tenable, \(\theta_{BD(\text{high})} = \theta_{BD(\text{mid})} = \theta_{BD(\text{low})} = 1\) is also tenable. Below, we can see that the association can be summarized with the common odds ratio value of 0.978, with a 95% CI (0.597, 1.601).
Common Odds Ratio and Relative Risks

Statistic | Method | Value | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|---|
Odds Ratio | Mantel-Haenszel | 0.9777 | 0.5970 | 1.6010 |
Odds Ratio | Logit | 0.9770 | 0.5959 | 1.6020 |
Relative Risk (Column 1) | Mantel-Haenszel | 0.9974 | 0.9426 | 1.0553 |
Relative Risk (Column 1) | Logit | 1.0015 | 0.9581 | 1.0468 |
Relative Risk (Column 2) | Mantel-Haenszel | 1.0193 | 0.6706 | 1.5495 |
Relative Risk (Column 2) | Logit | 1.0195 | 0.6712 | 1.5484 |
Since \(\theta_{BD(\text{high})} \approx \theta_{BD(\text{mid})} \approx \theta_{BD(\text{low})}\), the CMH is typically a more powerful statistic than the Pearson chi-squared statistic we calculated in the previous section, \(X^2 = 0.160\).
The corresponding function in R is mantelhaen.test(), which is used in the file boys.R as shown below:
#### Cochran-Mantel-Haenszel test
mantelhaen.test(temp)                  # continuity correction applied (the default)
mantelhaen.test(temp, correct=FALSE)   # no continuity correction
Here is the output of the second call (without continuity correction):
Mantel-Haenszel chi-squared test without continuity correction
data: temp
Mantel-Haenszel X-squared = 0.0080042, df = 1, p-value = 0.9287
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.5970214 1.6009845
sample estimates:
common odds ratio
0.9776615
The R output gives the same value as SAS (Mantel-Haenszel \(X^2= 0.008\), df = 1, p-value = 0.9287). Note that mantelhaen.test() computes only the general association version of the CMH statistic, which treats both variables as nominal. The statistic is very close to zero, so we cannot reject the null hypothesis; the conditional independence model is a good fit for these data.
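For readers who want to reuse the numbers rather than read them off the printout, the result of mantelhaen.test() can be stored and its components extracted. The sketch below is an illustration under assumptions: the object name `boys_tab` and its dimension names are hypothetical and need not match those in boys.R, and the counts are assumed to be the boy scout data. The stratifying variable must be the last dimension of the array.

```r
# Assumed boy scout counts; object and dimension names are hypothetical.
# Filled column by column within each stratum: n11, n21, n12, n22.
boys_tab <- array(c(11, 42,  43, 169,
                    14, 20, 104, 132,
                     8,  2, 196,  59),
                  dim = c(2, 2, 3),
                  dimnames = list(scout      = c("yes", "no"),
                                  delinquent = c("yes", "no"),
                                  SES        = c("low", "medium", "high")))

# CMH test without continuity correction; strata are the third dimension
fit <- mantelhaen.test(boys_tab, correct = FALSE)

fit$statistic   # Mantel-Haenszel X-squared
fit$p.value     # p-value on 1 df
fit$estimate    # Mantel-Haenszel common odds ratio
fit$conf.int    # 95% confidence interval for the common odds ratio
```

If the assumed counts match the data in boys.R, these components should reproduce the printed output above.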