3.9 - Diagnostic Measures

Residuals

Recall that residuals tell us how far apart the observed and expected values are for each cell under the assumed model, and thus which cells drive the lack of fit. We can examine Pearson and standardized residuals calculated under the null model, just as we did for one-way tables.

Pearson Residual

The Pearson residual for a cell in a two-way table is

$r_{ij}=\dfrac{O_{ij}-E_{ij}}{\sqrt{E_{ij}}}$

where the chi-squared statistic then is: $X^2=\sum_j\sum_i r^2_{ij}$

The $r_{ij}$ are approximately normally distributed with mean 0, but their variances are not all equal. Their asymptotic variances are typically less than 1, and the average variance equals $(I-1)(J-1)/(\mbox{number of cells})$.
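The definitions above can be checked numerically. The following is a minimal sketch in Python (an assumption; the lesson's own tools are SAS and R) that uses the observed counts from the CHD-by-serum table shown later in this section. The variable names are illustrative only.

```python
from math import sqrt

# Observed counts from the CHD-by-serum table (rows: chd, nochd).
observed = [
    [12, 8, 31, 41],
    [307, 246, 439, 245],
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected counts under independence: E_ij = (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residuals r_ij = (O_ij - E_ij) / sqrt(E_ij).
pearson = [
    [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# X^2 is the sum of the squared Pearson residuals.
x2 = sum(r ** 2 for row in pearson for r in row)

for row in pearson:
    print([round(r, 3) for r in row])
print("X^2 =", round(x2, 2))
```

The squares of these residuals should match the Cell Chi-Square entries in the SAS output (for example, about 4.60 for the first cell), and their sum is the overall $X^2$.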

Standardized (adjusted) Pearson Residual

The standardized (adjusted) Pearson residual for a cell in a two-way table is

$\dfrac{O_{ij}-E_{ij}}{\sqrt{[E_{ij}(1-p_{i+})(1-p_{+j})]}}$

Here $p_{i+}$ and $p_{+j}$ are the sample row and column proportions. A standardized Pearson residual has an approximate $N(0,1)$ distribution. A value that exceeds 2 or 3 in absolute value, therefore, suggests a lack of fit. For the heart disease example data, the standardized residual in the $(1,2)$ cell is

$\dfrac{8-17.583}{\sqrt{17.583(1-\frac{92}{1329})(1-\frac{254}{1329})}}=-2.63$

and would suggest some lack of fit of the independence model. It's also important to keep in mind, however, that the more cells involved, the more likely we are to observe an extreme residual by chance, even if the independence model holds.
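The calculation for a single cell extends to the whole table. Below is a small sketch in Python (an assumption, since the lesson itself uses SAS and R) that applies the standardized residual formula to every cell of the heart disease table; `standardized_residual` is a hypothetical helper name.

```python
from math import sqrt

# Observed counts from the CHD-by-serum table.
observed = [
    [12, 8, 31, 41],       # chd
    [307, 246, 439, 245],  # nochd
]

n = sum(sum(row) for row in observed)
row_props = [sum(row) / n for row in observed]        # p_{i+}
col_props = [sum(col) / n for col in zip(*observed)]  # p_{+j}


def standardized_residual(i, j):
    """Standardized (adjusted) Pearson residual for cell (i, j), 0-based."""
    expected = n * row_props[i] * col_props[j]
    denom = sqrt(expected * (1 - row_props[i]) * (1 - col_props[j]))
    return (observed[i][j] - expected) / denom


std_res = [
    [standardized_residual(i, j) for j in range(len(col_props))]
    for i in range(len(row_props))
]

for row in std_res:
    print([round(r, 2) for r in row])
```

The entry `std_res[0][1]` corresponds to the $(1,2)$ cell worked out above. Because this table has only two rows, the two standardized residuals in each column are equal in magnitude and opposite in sign.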

In SAS PROC FREQ, the DEVIATION option gives the raw residuals (i.e., the differences between the observed and expected values), and the CELLCHI2 option gives the squared Pearson residuals. Keep in mind that Pearson residuals are less variable than a standard normal variate. However, if the product $(1-p_{i+})(1-p_{+j})$ in the denominator of the adjusted residual is approximately equal to 1, the adjusted and regular Pearson residuals are approximately equal.

The squared standardized Pearson residuals have an approximate chi-squared distribution with df = 1. Thus, at the $\alpha = 0.05$ level, a squared standardized Pearson residual greater than about 4 (more precisely, $\chi^2(1, 0.05) = 3.84$) is considered significant; this can serve as a very crude cut-off for the squared Pearson residuals too. For other options, explore the SAS documentation on PROC FREQ. For our example, see the HeartDisease SAS output (part of the output is below).
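The 3.84 cut-off is the square of the familiar 1.96, because a chi-squared variable with df = 1 is the square of a standard normal variable. A quick check, sketched in Python with only the standard library (Python is an assumption here, not a tool the lesson uses):

```python
from statistics import NormalDist

alpha = 0.05

# Two-sided standard normal cut-off at level alpha.
z = NormalDist().inv_cdf(1 - alpha / 2)

# Squaring it gives the chi-squared(df = 1) upper-tail cut-off at the same alpha.
chi2_crit = z ** 2

print(round(z, 2), round(chi2_crit, 2))  # prints: 1.96 3.84
```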

Here are the results from the Coronary Heart Disease example:

The SAS System

The FREQ Procedure

Cell contents: Frequency, Expected, Deviation, Cell Chi-Square, Percent, Row Pct, Col Pct

Table of CHD by serum
CHD	Statistic	0-199	200-199	220-259	260+	Total
chd	Frequency	12	8	31	41	92
	Expected	22.083	17.583	32.536	19.798	
	Deviation	-10.08	-9.583	-1.536	21.202	
	Cell Chi-Square	4.6037	5.223	0.0725	22.704	
	Percent	0.90	0.60	2.33	3.09	6.92
	Row Pct	13.04	8.70	33.70	44.57	
	Col Pct	3.76	3.15	6.60	14.34	
nochd	Frequency	307	246	439	245	1237
	Expected	296.92	236.42	437.46	266.2	
	Deviation	10.083	9.5831	1.5357	-21.2	
	Cell Chi-Square	0.3424	0.3885	0.0054	1.6886	
	Percent	23.10	18.51	33.03	18.43	93.08
	Row Pct	24.82	19.89	35.49	19.81	
	Col Pct	96.24	96.85	93.40	85.66	
Total	Frequency	319	254	470	286	1329
	Percent	24.00	19.11	35.36	21.52	100.00

Conclusion

Notice the Cell Chi-Square values in the first, second, and fourth cells of the chd row: 4.60, 5.22, and 22.70. These are squared Pearson residuals. They are larger than 3.84, and they seem to be driving the lack of independence. [As an exercise, compute the standardized Pearson residuals and see if your inference would change.]

Let's further investigate the dependence structures in this table.

In R, chisq.test(your data)\$residuals gives the Pearson residuals, and chisq.test(your data)\$stdres gives the standardized residuals. For our Heart Disease example, see result\$residuals and the corresponding output in HeartDisease.out.

Notice that if the product $(1-p_{i+})(1-p_{+j})$ in the denominator is approximately equal to 1, the adjusted and regular Pearson residuals are approximately equal. The squared standardized Pearson residuals have an approximate chi-squared distribution with df = 1; thus, at the $\alpha = 0.05$ level, a squared standardized residual greater than about 4 (more precisely, $\chi^2(1, 0.05) = 3.84$) is considered significant, and the same value can serve as a very crude cut-off for the squared Pearson residuals. Equivalently, an absolute value exceeding 2 or 3 is a very crude cut-off for the Pearson residuals themselves. Do keep in mind, however, that Pearson residuals are less variable than a standard normal variate.

Partitioned Tests

Besides looking at the residuals or the measures of association, another way to describe the effects is to form a sequence of smaller tables by combining or collapsing rows and/or columns in a meaningful way---in other words, by looking into specific smaller tables within the larger table.

Partitioning chi-squared uses the fact that the sum of independent chi-squared statistics is itself a chi-squared statistic, with degrees of freedom equal to the sum of the degrees of freedom of the individual statistics. This works because the multinomial distribution may be collapsed into multinomials and partitioned into product-multinomials. We start with a chi-squared statistic with df > 1 and break it down into parts such that each new statistic has df = 1. This partitioning helps to show that a significant association in the whole table is driven by differences among some subset of categories.

Typically (just as with odds ratios), an $I \times J$ table has $(I-1)(J-1)$ partitions. In our $3 \times 2$ example, we have $(3-1)(2-1) = 2$ parts. Let’s combine the first two rows:

	Student smokes	Student doesn’t
1–2 parents smoke	816	3203
Neither parent smokes	188	1168

This table has $X^2 = 27.7$, $G^2 = 29.1$, $p$-value $\approx 0$, and $\hat{\theta}=1.58$. We estimate that the odds of smoking are 58% higher for a student who has at least one smoking parent than for a student with no smoking parent.

We may now ask, "Among those students with at least one smoking parent, is there any difference between those with one smoking parent and those with two smoking parents?" Given that at least one parent smokes, is there any evidence that the other parent’s smoking affects the chances that the student will smoke?

To answer this, we discard the last row of the original table and look at the upper $2 \times 2$ sub-table.

	Student smokes	Student doesn’t
Both parents smoke	400	1380
One parent smokes	416	1823

This table has $X^2 = 9.3$, $G^2 = 9.2$, $p$-value $\approx .002$, and $\hat{\theta}=1.27$. Given that at least one parent smokes, the fact that the other parent smokes does indeed raise the student’s probability of smoking; the effect, however, is not as large ($\hat{\theta}=1.27$) as it was in going from neither parent smoking to at least one parent smoking ($\hat{\theta}=1.58$).
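The summaries quoted for the two $2 \times 2$ tables can be reproduced with a few lines of code. This is a minimal sketch in Python (an assumption; the lesson uses SAS and R). The helper `two_by_two_summary` is hypothetical, and it uses the shortcut form of Pearson's $X^2$ that is specific to $2 \times 2$ tables.

```python
from math import erfc, sqrt


def two_by_two_summary(a, b, c, d):
    """Return (odds ratio, Pearson X^2, p-value) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Shortcut form of Pearson's X^2 for a 2x2 table.
    x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of chi-squared with df = 1, via the normal tail.
    p_value = erfc(sqrt(x2 / 2))
    return odds_ratio, x2, p_value


# 1-2 parents smoke vs. neither parent smokes.
collapsed = two_by_two_summary(816, 3203, 188, 1168)
# Both parents smoke vs. one parent smokes.
upper = two_by_two_summary(400, 1380, 416, 1823)

for name, (theta, x2, p) in [("collapsed", collapsed), ("upper", upper)]:
    print(f"{name}: theta = {theta:.2f}, X^2 = {x2:.1f}, p = {p:.2g}")
```

The odds ratios and $X^2$ values should come out close to those quoted above for the two tables.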

Notice what happens if we add up the $G^2$ values from these two $2 \times 2$ tables:

$29.1 + 9.2 = 38.3$

The result is very close to 38.4, the value of $G^2$ that we got for the full $3 \times 2$ table. In fact, the two numbers should be exactly the same; the difference in the last decimal place is merely due to rounding. It can be shown theoretically that partitions of $G^2$ add up to the total $G^2$. However, that is not true for the Pearson $X^2$: the individual $X^2$ values do not add up exactly to the overall $X^2$, but they are pretty close:

$27.7 + 9.3 = 37.0 \approx 37.6$.
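The additivity of $G^2$, and the lack of exact additivity of $X^2$, can be verified without rounding. The sketch below is written in Python (an assumption; the lesson uses SAS and R). The full $3 \times 2$ table is assembled from the rows shown in the two sub-tables above, and `independence_stats` is a hypothetical helper.

```python
from math import log


def independence_stats(table):
    """Return (Pearson X^2, deviance G^2) for an I x J table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            x2 += (obs - exp) ** 2 / exp
            g2 += 2 * obs * log(obs / exp)  # assumes every count is positive
    return x2, g2


# Columns: student smokes, student doesn't.
full = [[400, 1380], [416, 1823], [188, 1168]]  # both, one, neither parent smokes
collapsed = [[816, 3203], [188, 1168]]          # 1-2 parents smoke vs. neither
upper = [[400, 1380], [416, 1823]]              # both vs. one parent smokes

x2_full, g2_full = independence_stats(full)
x2_coll, g2_coll = independence_stats(collapsed)
x2_upper, g2_upper = independence_stats(upper)

print("G^2:", round(g2_full, 2), "vs", round(g2_coll + g2_upper, 2))
print("X^2:", round(x2_full, 2), "vs", round(x2_coll + x2_upper, 2))
```

With unrounded values, the two $G^2$ components should sum to the full-table $G^2$ up to floating-point error, while the $X^2$ components fall slightly short of the full-table $X^2$.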

When we analyze a $3 \times 2$ table in this manner, by combining two rows into a single row and then un-combining them again, we have partitioned the 2 degree-of-freedom test for independence into two single degree-of-freedom tests. By breaking up the test for independence into a sequence of tests for smaller tables, we can often identify precisely how the categorical variables may or may not be related. Comparing the component statistics shows where the differences are coming from, which can be very helpful when, for instance, you are designing a future study and exploring which parameters are important to include.

In practice, there are often many different ways to break up an $I \times J$ table into a sequence of smaller tables. It is a good idea to do it in such a way that the independence test for each of the smaller tables pertains to a question that makes sense in the context of the individual problem, keeping in mind the partitioning structure described above (typically $(I-1)(J-1)$ single degree-of-freedom tests).
