2.3.7 - Chi-Square Approximation

Recall that if \(X\sim Bin(n,\pi)\) and \(n\) is large, the squared standardized count below has approximately a chi-square distribution with \(\nu=1\) degree of freedom:

\( \left(\dfrac{X-n\pi}{\sqrt{n\pi(1-\pi)}}\right)^2 \)

With a little algebraic manipulation, we can expand this into parts due to successes and failures:

\( \left(\dfrac{X-n\pi}{\sqrt{n\pi}}\right)^2 + \left(\dfrac{(n-X)-n(1-\pi)}{\sqrt{n(1-\pi)}}\right)^2\)
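
For readers who want the manipulation spelled out, one way to see it uses the identity \(\dfrac{1}{\pi(1-\pi)}=\dfrac{1}{\pi}+\dfrac{1}{1-\pi}\) together with the fact that \((n-X)-n(1-\pi)=-(X-n\pi)\), so the two squared deviations are equal:

\( \dfrac{(X-n\pi)^2}{n\pi(1-\pi)} = \dfrac{(X-n\pi)^2}{n}\left(\dfrac{1}{\pi}+\dfrac{1}{1-\pi}\right) = \dfrac{(X-n\pi)^2}{n\pi} + \dfrac{\left((n-X)-n(1-\pi)\right)^2}{n(1-\pi)} \)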

The benefit of writing it this way is to see how it can be generalized to the multinomial setting. That is, if \(X=(X_1,\ldots,X_k)\sim Mult(n,\pi)\), then

\(Q=\left(\dfrac{X_1-n\pi_1}{\sqrt{n\pi_1}}\right)^2 +\cdots+ \left(\dfrac{X_k-n\pi_k}{\sqrt{n\pi_k}}\right)^2\)

Then \(Q\) has an approximate chi-square distribution with \(\nu=k-1\) degrees of freedom, provided the sample size is large. The usual condition to check for the sample size requirement is that all sample counts \(n\hat{\pi}_j\) are at least 5, although this is not a strict rule.
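
As a rough illustration of the statistic above, the following sketch computes \(Q\) in plain Python and checks two things numerically: that the two-category case matches the squared standardized binomial count, and that the simulated average of \(Q\) is close to \(k-1\), the mean of a chi-square distribution with \(k-1\) degrees of freedom. The function names (`pearson_q`, `binomial_z_squared`, `simulate_mean_q`), the counts, and the probabilities are all hypothetical choices made for this example; they do not come from the notes.

```python
import random


def pearson_q(counts, probs):
    """Return Q = sum_j (X_j - n*pi_j)^2 / (n*pi_j) for observed counts X_j."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))


def binomial_z_squared(x, n, p):
    """Return the squared standardized binomial count ((X - n*pi) / sd)^2."""
    return (x - n * p) ** 2 / (n * p * (1 - p))


def simulate_mean_q(n, probs, reps=2000, seed=0):
    """Average Q over `reps` simulated multinomial samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    k = len(probs)
    total = 0.0
    for _ in range(reps):
        # Draw n category labels, then tabulate them into counts.
        draws = rng.choices(range(k), weights=probs, k=n)
        counts = [0] * k
        for d in draws:
            counts[d] += 1
        total += pearson_q(counts, probs)
    return total / reps


# Hypothetical binomial data: X = 12 successes in n = 30 trials, pi = 0.5.
# The two-term expansion and the squared z-score should agree.
print(pearson_q([12, 18], [0.5, 0.5]), binomial_z_squared(12, 30, 0.5))

# Hypothetical multinomial data with k = 3 categories and n = 100.
print(pearson_q([18, 55, 27], [0.2, 0.5, 0.3]))

# The simulated mean of Q should be near k - 1 = 2.
print(simulate_mean_q(100, [0.2, 0.5, 0.3]))
```

The simulation only compares means, so it is a sanity check rather than a demonstration that the whole distribution is chi-square; comparing simulated quantiles of \(Q\) against chi-square quantiles would be a stronger check.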