# 2.5 - Residuals

How large is the discrepancy between the two proposed models? The previous analysis provides a summary of the overall difference between them, but if we want to know more specifically where these differences are coming from, cell-specific residuals can be inspected for relevant clues. There are two types of residuals we will consider: *Pearson residuals* and *deviance residuals*.

## Pearson Residuals

##### Pearson Goodness-of-fit Test Statistic

The Pearson goodness-of-fit statistic can be written as \(X^2=\sum\limits_{j=1}^k r^2_j\), where

\(r_j=\dfrac{X_j-n\pi_{0j}}{\sqrt{n\pi_{0j}}}\)

is called the **Pearson residual** for cell \(j\), and it compares the observed with the expected counts. The sign (positive or negative) indicates whether the observed frequency in cell \(j\) is higher or lower than the value implied under the null model, and the magnitude indicates the degree of departure. When data do not fit the null model, examination of the Pearson residuals often helps to diagnose where the model has failed.

How large should a "typical" value of \(r_j\) be? Recall that the expectation of a \(\chi^2\) random variable is its degrees of freedom. Thus, if a model is correct, \(E(X^2) \approx E(G^2) \approx k - 1\), and since \(X^2\) is a sum of \(k\) squared residuals, the typical size of a single \(r_{j}^{2}\) is \(\dfrac{k - 1}{k}\). Thus, if the absolute value \(|r_j|\) is much larger than \(\sqrt{(k-1)/k}\) (say, *2.0 or more*), then the model does not appear to fit well for cell \(j\).
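As a minimal sketch of these definitions, the Python snippet below computes the Pearson residuals, the statistic \(X^2\), and the rough benchmark \(\sqrt{(k-1)/k}\) for a fair-die null model. The counts are the 30 die rolls from the example in this section; the function name `pearson_residuals` and the variable names are illustrative choices, not part of the course materials.

```python
import math

def pearson_residuals(observed, null_probs):
    """Return r_j = (X_j - n*pi_0j) / sqrt(n*pi_0j) for every cell j."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    return [(x - e) / math.sqrt(e) for x, e in zip(observed, expected)]

# Observed counts from 30 rolls of a die, tested against the fair-die model.
observed = [3, 7, 5, 10, 2, 3]
null_probs = [1 / 6] * 6

r = pearson_residuals(observed, null_probs)
x2 = sum(rj ** 2 for rj in r)        # Pearson statistic X^2

k = len(observed)
typical = math.sqrt((k - 1) / k)     # rough size of |r_j| if the model is correct

# Cells whose residual reaches the rule-of-thumb cutoff of 2.0 in absolute value.
flagged = [j + 1 for j, rj in enumerate(r) if abs(rj) >= 2.0]

print([round(rj, 2) for rj in r])
print(round(x2, 2), round(typical, 2), flagged)
```

The cutoff of 2.0 is the rule of thumb stated above, not a formal test; the benchmark `typical` is only there to show how far a flagged residual sits from an ordinary one.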

## Deviance Residuals

Although it is not as intuitive as the \(X^2\) statistic, the deviance statistic \(G^2=\sum\limits_{j=1}^k d^2_j\) can be regarded as the sum of squared **deviance residuals**,

\(d_j=\sqrt{\left|2X_j\log\dfrac{X_j}{n\pi_{0j}}\right|}\times \text{sign}(X_j-n\pi_{0j})\)

The **sign** function can take three values:

- -1 if \((X_j - n\pi_{0j} ) < 0\),
- 0 if \((X_j- n\pi_{0j} ) = 0\), or
- 1 if \((X_j- n\pi_{0j}) > 0\).
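The definition above, including the three-valued sign convention, can be sketched in Python as follows. The helper names `sign` and `deviance_residuals` are hypothetical, and the sketch assumes every observed count and every null probability is positive. The counts are the die rolls from the example in this section.

```python
import math

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def deviance_residuals(observed, null_probs):
    """Return d_j = sqrt(|2*X_j*log(X_j/(n*pi_0j))|) * sign(X_j - n*pi_0j).

    Assumes all observed counts and null probabilities are positive.
    """
    n = sum(observed)
    residuals = []
    for x, p in zip(observed, null_probs):
        e = n * p                              # expected count for this cell
        term = 2 * x * math.log(x / e)         # signed contribution of the cell
        residuals.append(math.sqrt(abs(term)) * sign(x - e))
    return residuals

observed = [3, 7, 5, 10, 2, 3]
null_probs = [1 / 6] * 6
d = deviance_residuals(observed, null_probs)
print([round(dj, 2) for dj in d])
```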

When the expected counts \(n\pi_{0j}\) are all fairly large (much greater than 5) the deviance and Pearson residuals resemble each other quite closely.

## Example: Die Rolls continued

Below is a table of observed counts, expected counts, and residuals for the fair-die example; for calculations see dice_rolls.R. Unfortunately, the CELLCHI2 option in SAS that gives these residuals does NOT work for one-way tables; we will use it for higher-dimensional tables.

Cell \(j\) | \(O_j\) | \(E_j\) | \(r_j\) | \(d_j\) |
---|---|---|---|---|
1 | 3 | 5 | -0.89 | -1.75 |
2 | 7 | 5 | +0.89 | +2.17 |
3 | 5 | 5 | 0.00 | 0.00 |
4 | 10 | 5 | +2.24 | +3.72 |
5 | 2 | 5 | -1.34 | -1.91 |
6 | 3 | 5 | -0.89 | -1.75 |

The only cell that seems to deviate substantially from the fair-die model is \(j=4\), the one cell with \(|r_j|\) above 2. If the die is not fair, then it may be "loaded" in favor of the outcome 4. But recall that the \(p\)-value was about 0.10, so the evidence against fairness is not overwhelming.
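To connect the residuals back to the \(p\)-value quoted above, here is a small sketch that recomputes \(X^2\) from the table and evaluates the upper tail of a \(\chi^2\) distribution with \(k-1=5\) degrees of freedom. To stay within the Python standard library it uses the closed-form tail that exists for odd degrees of freedom, specialized to 5. The function name `chi2_upper_tail_df5` is a hypothetical helper; in practice one would more likely call a library routine such as `scipy.stats.chi2.sf`.

```python
import math

def chi2_upper_tail_df5(x):
    """P(chi-square with 5 df > x), using the closed form for odd degrees of freedom."""
    z = x / 2
    return (math.erfc(math.sqrt(z))
            + math.sqrt(2 * x / math.pi) * math.exp(-z) * (1 + x / 3))

observed = [3, 7, 5, 10, 2, 3]
expected = 5                          # 30 rolls * 1/6 under the fair-die model
x2 = sum((o - expected) ** 2 / expected for o in observed)
p_value = chi2_upper_tail_df5(x2)
print(round(x2, 2), round(p_value, 3))
```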

## Effects of Zero Cell Counts

If an \(X_j\) is zero and all \(\pi_{0j}\)s are positive, then the *Pearson* \(X^2\) can be calculated without any problems, but there is a problem in computing the *deviance*, \(G^2\): if \(X_j = 0\), then the deviance residual is undefined, and if we use the standard formula,

\(G^2=2\sum\limits_{j=1}^k X_j\log\dfrac{X_j}{n\pi_{0j}}\)

an error will result. But we can also write the deviance as

\(G^2=2\log\dfrac{L(\hat{\pi};X)}{L(\pi_0;X)}=2\log\prod\limits_{j=1}^k \left(\dfrac{X_j/n}{\pi_{0j}}\right)^{X_j}\)

In this form, a cell with \(X_j=0\) contributes a factor of 1 to the product and may be ignored. Thus, we may calculate the *deviance statistic* as

\(G^2=2\sum\limits_{j:X_j>0} X_j\log\dfrac{X_j}{n\pi_{0j}}\)

Alternatively, we can set the deviance residuals to zero for cells with \(X_j=0\) and take \(G^2= \sum_j d_j^2\) as before. But if we do that, \(d_j = 0\) should not be interpreted as "the model fits well in cell \(j\)". The fit could be quite poor, especially if \(E_j\) is large.
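The difference between the standard formula and the sum restricted to nonzero cells can be sketched in Python as follows. The function names are illustrative, and the counts are hypothetical (not course data): 30 rolls in which the face 1 never appears. The sketch assumes every null probability is positive.

```python
import math

def deviance_naive(observed, null_probs):
    """Standard formula; fails when some X_j is 0 because log(0) is undefined."""
    n = sum(observed)
    return 2 * sum(x * math.log(x / (n * p))
                   for x, p in zip(observed, null_probs))

def deviance_skip_zeros(observed, null_probs):
    """Sum only over cells with X_j > 0, as in the restricted formula."""
    n = sum(observed)
    return 2 * sum(x * math.log(x / (n * p))
                   for x, p in zip(observed, null_probs) if x > 0)

# Hypothetical counts: 30 rolls with a sampling zero in cell 1.
observed = [0, 8, 6, 9, 4, 3]
null_probs = [1 / 6] * 6

try:
    deviance_naive(observed, null_probs)
    naive_failed = False
except ValueError:
    naive_failed = True               # math.log(0.0) raises ValueError in Python

g2 = deviance_skip_zeros(observed, null_probs)
print(naive_failed, round(g2, 2))
```

When no cell is empty the two functions agree, so the restricted version can be used throughout.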

**If any element of the vector \(\pi_0\) is zero, then \(X^2\) and \(G^2\) both break down**. Simple ways to avoid such problems include putting a very small mass (e.g., \(0.05\)) in all the cells, including the zero cell(s), or removing or combining cells and redoing the tests.

#### Structural zeros and sampling zeros

Note that an observed zero in a cell does not necessarily mean that there **cannot** be any observation in that cell. A *sampling zero* is an observed zero count in a cell for which a positive value is possible. An example of this would have occurred in our die experiment if by chance the number 1 had not shown up in any of the 30 rolls. Alternatively, consider an example of categorizing male and female patients on whether they have carcinoma. Since only females can have ovarian cancer, and only males can have prostate cancer, the zero cells in the table below are said to be *structural zeros*, and the underlying cell probabilities are \(\pi_j=0\). To handle structural zeros, an adjustment to the degrees of freedom is required.

Patient | Ovarian Cancer | Prostate Cancer |
---|---|---|
Male | 0 | 7 |
Female | 10 | 0 |
Total | 10 | 7 |