3.9  Diagnostic Measures
Residuals
Recall that residuals tell how far apart the observed and expected values are for each cell under the assumed model, so they show which cells drive the lack of fit. As we did for one-way tables, we can check the Pearson and standardized residuals calculated under the null model.
 Pearson Residual

The Pearson residual for a cell in a two-way table is
\(r_{ij}=\dfrac{O_{ij}-E_{ij}}{\sqrt{E_{ij}}}\)
and the chi-squared statistic is then \(X^2=\sum_j\sum_i r^2_{ij}\).
The \(r_{ij}\)’s have an approximate normal distribution with mean 0, but their variances are not all equal! Typically their asymptotic variances are less than 1, and the average variance equals \((I-1)(J-1)/(\mbox{number of cells})\).
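Below is a minimal Python sketch of the Pearson residuals and of \(X^2=\sum_j\sum_i r^2_{ij}\). It assumes the heart disease counts shown in the code (rows: CHD yes/no; columns: four serum-cholesterol groups). These counts match the margins (92, 254, 1329) and the expected count 17.583 quoted in this section, but check them against the course data file. The helper names are hypothetical. The sketch also estimates each residual's asymptotic variance by \((1-p_{i+})(1-p_{+j})\). Because \(\sum_i(1-p_{i+})=I-1\) and \(\sum_j(1-p_{+j})=J-1\), these estimates average exactly \((I-1)(J-1)/(IJ)\).

```python
import math

# Assumed heart disease counts: rows = CHD (yes, no),
# columns = four serum-cholesterol groups (lowest to highest).
observed = [
    [12, 8, 31, 41],
    [307, 246, 439, 245],
]


def expected_counts(table):
    """Expected counts under independence: E_ij = n_{i+} * n_{+j} / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_residuals(table):
    """Pearson residuals r_ij = (O_ij - E_ij) / sqrt(E_ij)."""
    expected = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


residuals = pearson_residuals(observed)
x2 = sum(r ** 2 for row in residuals for r in row)   # X^2 = sum of r_ij^2

# Estimated asymptotic variances (1 - p_{i+})(1 - p_{+j}) of the r_ij.
n = sum(map(sum, observed))
p_row = [sum(row) / n for row in observed]
p_col = [sum(col) / n for col in zip(*observed)]
variances = [[(1 - pi) * (1 - pj) for pj in p_col] for pi in p_row]
avg_var = sum(map(sum, variances)) / (len(p_row) * len(p_col))

print(round(residuals[0][1], 2))   # cell (1,2): -2.29
print(round(x2, 2))                # 35.03
print(round(avg_var, 4))           # (2-1)(4-1)/8 = 0.375
```

Every estimated variance is below 1, which is why the plain Pearson residuals are less variable than a standard normal variate.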
 Standardized (adjusted) Pearson Residual

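The standardized (adjusted) Pearson residual for a cell in a two-way table is
\(\dfrac{O_{ij}-E_{ij}}{\sqrt{E_{ij}(1-p_{i+})(1-p_{+j})}}\)
where \(p_{i+}\) and \(p_{+j}\) are the sample marginal proportions for row \(i\) and column \(j\).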
A standardized Pearson residual has an approximate \(N(0,1)\) distribution, so a value that exceeds 2 or 3 in absolute value suggests lack of fit. For the heart disease example data, the standardized residual in the \((1,2)\) cell is
\(r_{12}=\dfrac{8-17.583}{\sqrt{17.583\left(1-\frac{92}{1329}\right)\left(1-\frac{254}{1329}\right)}}=-2.63\)
which suggests some lack of fit of the independence model. Keep in mind, however, that the more cells a table has, the more likely we are to observe an extreme residual by chance, even if the independence model holds.
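The following sketch computes the standardized (adjusted) residual for every cell and reproduces \(-2.63\) for the \((1,2)\) cell. It assumes heart disease counts (rows: CHD yes/no; columns: four cholesterol groups) that are consistent with the margins 92, 254 and 1329 used above. The function name is hypothetical. In R, `chisq.test()` reports the same adjusted residuals in its `stdres` component.

```python
import math

# Assumed heart disease counts: rows = CHD (yes, no),
# columns = four serum-cholesterol groups (lowest to highest).
observed = [
    [12, 8, 31, 41],
    [307, 246, 439, 245],
]


def standardized_residuals(table):
    """Adjusted residuals (O_ij - E_ij) / sqrt(E_ij (1 - p_{i+}) (1 - p_{+j}))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        p_i = row_totals[i] / n                      # sample p_{i+}
        out_row = []
        for j, o in enumerate(row):
            p_j = col_totals[j] / n                  # sample p_{+j}
            e = row_totals[i] * col_totals[j] / n    # E_ij under independence
            out_row.append((o - e) / math.sqrt(e * (1 - p_i) * (1 - p_j)))
        result.append(out_row)
    return result


adjusted = standardized_residuals(observed)
print(round(adjusted[0][1], 2))   # cell (1,2): -2.63
```

With only two rows, the two adjusted residuals in each column are equal in magnitude and opposite in sign. The CHD-absent cell in the same column is therefore \(+2.63\).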
In SAS PROC FREQ, the DEVIATION option gives the raw residuals (i.e., the difference between the observed and expected values), and the CELLCHI2 option gives the squared Pearson residuals. Keep in mind that Pearson residuals are less variable than a standard normal variate. Notice, however, that if the product \((1-p_{i+})(1-p_{+j})\) in the denominator of the adjusted residual is approximately equal to 1, the adjusted Pearson residuals and the regular Pearson residuals are approximately equal.
The squared standardized Pearson residuals have approximately a chi-squared distribution with df = 1. At \(\alpha = 0.05\), a squared standardized Pearson residual greater than about 4 (more precisely, \(\chi^2(1, 0.05) = 3.84\)) is therefore considered significant. This can also serve as a very crude cutoff for the squared Pearson residuals. For other options, see the SAS documentation on PROC FREQ. For our example, see the HeartDisease SAS output (part of the output is below).
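The 3.84 cutoff is the upper 5% point of \(\chi^2_1\). It equals the square of the two-sided normal cutoff 1.96, which is why "squared residual greater than about 4" and "residual exceeding 2 in absolute value" express the same rule of thumb. The short stdlib-only Python check below uses the identity \(P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \mathrm{erfc}(\sqrt{x/2})\) for standard normal \(Z\). The function name is hypothetical.

```python
import math


def chi2_df1_upper_tail(x):
    """P(X > x) for X ~ chi-squared with 1 df, via X = Z**2 with Z ~ N(0, 1)."""
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2))
    return math.erfc(math.sqrt(x / 2))


print(round(chi2_df1_upper_tail(3.84), 3))   # 0.05
print(round(math.sqrt(3.84), 2))             # 1.96
print(round(chi2_df1_upper_tail(4.0), 3))    # 0.046 (the |z| > 2 cutoff)
```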
Here are the results from the Coronary Heart Disease example:
[SAS output not reproduced: The FREQ Procedure]
Conclusion
Notice the values in the third line of the first, second, and fourth cells (4.60, 5.22, and 22.70). These are squared Pearson residuals. They are much larger than 3.84 and appear to drive the lack of independence. [As an exercise, compute the standardized Pearson residuals and see whether your inference would change.]
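The squared Pearson residuals are each cell's contribution \((O_{ij}-E_{ij})^2/E_{ij}\) to \(X^2\). The sketch below recomputes them along with the raw deviations \(O_{ij}-E_{ij}\) and flags cells above 3.84. It assumes heart disease counts (rows: CHD yes/no; columns: four cholesterol groups) consistent with the margins quoted in this section. Its printout only loosely mirrors the DEVIATION and CELLCHI2 statistics and is not SAS output.

```python
# Assumed heart disease counts: rows = CHD (yes, no),
# columns = four serum-cholesterol groups (lowest to highest).
observed = [
    [12, 8, 31, 41],
    [307, 246, 439, 245],
]

CUTOFF = 3.84   # chi-squared(1) critical value at alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

contributions = {}
flagged = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count
        deviation = o - e                       # raw residual O - E
        cell_chi2 = deviation ** 2 / e          # squared Pearson residual
        contributions[(i + 1, j + 1)] = cell_chi2
        print(f"cell ({i + 1},{j + 1}): O={o:4d}  E={e:8.3f}  "
              f"O-E={deviation:8.3f}  chi2={cell_chi2:6.2f}")
        if cell_chi2 > CUTOFF:
            flagged.append((i + 1, j + 1))

print("cells above cutoff:", flagged)   # [(1, 1), (1, 2), (1, 4)]
```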
Let's further investigate the dependence structures in this table.
In R, chisq.test(your data)\$residuals gives the Pearson residuals, and chisq.test(your data)\$stdres gives the standardized residuals. For our Heart Disease example, see result\$residuals and the corresponding output in HeartDisease.out.
Partitioned Tests
Besides looking at the residuals or the measures of association, another way to describe the effects is to form a sequence of smaller tables by combining or collapsing rows and/or columns in a meaningful way; in other words, by looking at specific smaller tables within the larger table.
Partitioning chi-squared uses the fact that the sum of independent chi-squared statistics is itself a chi-squared statistic, with degrees of freedom equal to the sum of the degrees of freedom of the individual statistics. This works because a multinomial distribution can be collapsed into smaller multinomials and partitioned into product-multinomials. We start with a chi-squared statistic with df > 1 and break it into parts such that each new statistic has df = 1. Partitioning helps show whether a significant association for the whole table is driven by differences among some subset of categories.
Typically (just as with odds ratios), an \(I \times J\) table yields \((I-1)(J-1)\) partitions. In our example, we have \((3-1)(2-1) = 2\) parts. Let’s combine the first two rows:
                        Student smokes    Student doesn’t
1–2 parents smoke       816               3203
Neither parent smokes   188               1168
This table has \(X^2 = 27.7\), \(G^2 = 29.1\), \(p\)-value \(\approx 0\), and \(\hat{\theta}=1.58\). We estimate that the odds of smoking are 58% higher for a student with at least one smoking parent than for a student whose parents do not smoke.
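Below is a minimal sketch of the three summaries reported for a \(2 \times 2\) table: \(X^2\), \(G^2 = 2\sum O\ln(O/E)\), and the sample odds ratio \(\hat{\theta} = n_{11}n_{22}/(n_{12}n_{21})\). The function name is hypothetical. The code assumes every cell count is positive, because both the logarithm and the odds ratio break down at zero.

```python
import math


def two_by_two_stats(a, b, c, d):
    """X^2, G^2 and sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    x2 = g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = table[i][j]
            e = rows[i] * cols[j] / n          # expected count under independence
            x2 += (o - e) ** 2 / e             # Pearson contribution
            g2 += 2 * o * math.log(o / e)      # likelihood-ratio contribution
    odds_ratio = (a * d) / (b * c)
    return x2, g2, odds_ratio


# Collapsed table: "1-2 parents smoke" vs "neither parent smokes"
x2, g2, theta = two_by_two_stats(816, 3203, 188, 1168)
print(f"X2 = {x2:.1f}, G2 = {g2:.1f}, odds ratio = {theta:.2f}")
# X2 = 27.7, G2 = 29.1, odds ratio = 1.58
```

Calling `two_by_two_stats(400, 1380, 416, 1823)` on the both-versus-one-parent subtable gives about 9.3, 9.2 and 1.27.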
We may now ask, "Among those students with at least one smoking parent, is there any difference between those with one smoking parent and those with two smoking parents?" Given that at least one parent smokes, is there any evidence that the other parent’s smoking affects the chances that the student will smoke?
To answer this, we discard the last row of the original table and look at the upper \(2 \times 2\) subtable.
                        Student smokes    Student doesn’t
Both parents smoke      400               1380
One parent smokes       416               1823
This table has \(X^2 = 9.3\), \(G^2 = 9.2\), \(p\)-value \(\approx 0.002\), and \(\hat{\theta}=1.27\). Given that at least one parent smokes, the fact that the other parent also smokes does raise the student’s probability of smoking. The effect, however, is not as large (\(\hat{\theta}=1.27\)) as it was in going from neither parent smoking to at least one parent smoking (\(\hat{\theta}=1.58\)).
Notice what happens if we add up the \(G^2\) values from these two \(2 \times 2\) tables:
\(29.1 + 9.2 = 38.3\)
The result is very close to 38.4, the value of \(G^2\) for the full \(3 \times 2\) table. In fact, these two numbers should be exactly the same, and the difference in the last decimal place is due to rounding. It can be shown theoretically that the \(G^2\) statistics of such a partition add up exactly to the total \(G^2\). That is not true for the Pearson \(X^2\): the individual \(X^2\) values do not add up exactly to the overall \(X^2\), but they are close:
\(27.7 + 9.3 = 37.0 \approx 37.6\).
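The additivity claim can be checked numerically. The sketch below computes \(G^2\) and \(X^2\) for the full \(3 \times 2\) smoking table and for the two component tables. It confirms that the \(G^2\) values add up to the full-table value to floating-point precision, while the \(X^2\) values fall short. The helper name is hypothetical, and all counts must be positive for the logarithm.

```python
import math


def g2_and_x2(table):
    """Return (G^2, X^2) for independence in an r x c table of positive counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g2 = x2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n
            g2 += 2 * o * math.log(o / e)
            x2 += (o - e) ** 2 / e
    return g2, x2


full = [[400, 1380], [416, 1823], [188, 1168]]       # both / one / neither parent
collapsed = [[400 + 416, 1380 + 1823], [188, 1168]]  # 1-2 parents vs neither
upper = [[400, 1380], [416, 1823]]                   # both vs one parent

g2_full, x2_full = g2_and_x2(full)
g2_a, x2_a = g2_and_x2(collapsed)
g2_b, x2_b = g2_and_x2(upper)

print(f"G2: {g2_a:.3f} + {g2_b:.3f} = {g2_a + g2_b:.3f} vs full {g2_full:.3f}")
print(f"X2: {x2_a:.3f} + {x2_b:.3f} = {x2_a + x2_b:.3f} vs full {x2_full:.3f}")
```

The exact identity holds for this kind of partition, in which one table collapses two rows and the other compares just those two rows.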
When we analyze a \(3 \times 2\) table in this manner (by combining two rows into a single row and then separating them again), we partition the 2-degree-of-freedom test for independence into two single-degree-of-freedom tests. Breaking up the test for independence into a sequence of tests for smaller tables often identifies precisely how the categorical variables are, or are not, related. Comparing these component statistics shows where the differences come from. This can be very helpful when, for instance, you are designing a future study and exploring which parameters are important to include.
In practice, there are often many ways to break up an \(I \times J\) table into a sequence of smaller tables. It is a good idea to do it so that the independence test for each smaller table answers a question that makes sense in the context of the problem. Keep the above rules in mind; in particular, the degrees of freedom of the component tests should add up to those of the full table.