Continuous Random Variables


What if we are interested in using a chi-square goodness-of-fit test to see if our data follow some continuous distribution? That is, what if we want to test:

\[ H_0 : F(w) =F_0(w)\]

where \(F_0(w)\) is some known, specified distribution. In this situation, it is no longer obvious what constitutes each of the categories. The logical thing to do is to divide the interval of possible values into \(k\) "buckets" or "categories," called \(A_1, A_2, \ldots, A_k\), into which the observed data can fall. Letting \(Y_i\) denote the number of times the observed value of \(W\) belongs to bucket \(A_i\), \(i = 1, 2, \ldots, k\), the random variables \(Y_1, Y_2, \ldots, Y_k\) follow a multinomial distribution with parameters \(n, p_1, p_2, \ldots, p_{k-1}\), where \(p_i\) is the probability that \(W\) falls in \(A_i\) and \(p_k = 1 - p_1 - \cdots - p_{k-1}\). Writing \(p_{i0}\) for the probability that \(F_0\) assigns to bucket \(A_i\), the hypothesis that we actually test is a modification of the null hypothesis above, namely:

\[H_{0}^{'} : p_i = p_{i0}, i=1, 2, ... , k \]

The hypothesis is rejected if the observed value of the chi-square statistic:

\[Q_{k-1} =\sum_{i=1}^{k}\frac{(Obs_i - Exp_i)^2}{Exp_i}\]

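is at least as great as \(\chi_{\alpha}^{2}(k-1)\). If the hypothesis \(H_{0}^{'} : p_i = p_{i0}, i=1, 2, ... , k\) is not rejected, then we do not reject the original hypothesis \(H_0 : F(w) =F_0(w)\). Conversely, because \(H_0\) implies \(H_{0}^{'}\), rejecting \(H_{0}^{'}\) means rejecting \(H_0\) as well.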
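The statistic and the rejection rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `chi_square_statistic` and `reject_null` are hypothetical, the counts are made up for a toy case with k = 4 equally likely buckets, and the critical value 7.815 is the tabulated upper 0.05 point of the chi-square distribution with 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Return Q = sum over categories of (Obs_i - Exp_i)^2 / Exp_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(q, critical_value):
    """Reject H0' when Q is at least as great as the chi-square critical value."""
    return q >= critical_value


# Made-up counts (not from the text): n = 40 observations, k = 4 buckets,
# each with hypothesized probability 1/4, so each expected count is 10.
observed = [8, 12, 10, 10]
expected = [10.0, 10.0, 10.0, 10.0]

q = chi_square_statistic(observed, expected)

# 7.815 is the tabulated chi-square critical value for alpha = 0.05
# and k - 1 = 3 degrees of freedom.
decision = reject_null(q, critical_value=7.815)
print(q, decision)
```

The critical value is passed in as a number so that the sketch needs nothing beyond the standard library; a statistics package could supply it instead of a table.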

Let's make this proposed procedure more concrete by taking a look at an example.

Example

The IQs of one-hundred randomly selected people were determined using the Stanford-Binet Intelligence Quotient Test. The resulting data were, in sorted order, as follows:

[Table: the 100 sorted IQ scores]

Test the null hypothesis that the data come from a normal distribution with mean 100 and standard deviation 16.

Solution. So, where do we start? We first have to define some categories. Let's divide the interval of possible IQs into k = 10 sets of equal probability 1/k = 1/10. Perhaps this is best seen pictorially:

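[Figure: normal density with mean 100 and standard deviation 16, divided into 10 intervals of equal probability 0.10]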

So, what's going on in this picture? First, the normal density is divided into 10 intervals of equal probability (0.10). (The picture is not drawn very well to scale.) We then find the IQs that correspond to the cumulative probabilities 0.1, 0.2, 0.3, etc. This is done in two steps: (1) find the Z-scores associated with the cumulative probabilities 0.1, 0.2, 0.3, etc., and (2) convert each Z-score into an X-value using \(X = 100 + 16Z\). It is those X-values (IQs) that will make up the "right-hand side" of each bucket:

[Table: cumulative probabilities, Z-scores, and the corresponding IQ values that bound each bucket]
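The two-step conversion can be reproduced with Python's standard `statistics.NormalDist`. This is a minimal sketch: the variable names are illustrative, and only the nine finite cut points (cumulative probabilities 0.1 through 0.9) are computed, since the tenth bucket has no finite right-hand side.

```python
from statistics import NormalDist

k = 10                    # number of equal-probability buckets
mean, sd = 100, 16        # hypothesized normal distribution for IQ

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Cumulative probabilities 0.1, 0.2, ..., 0.9.
cumulative_probs = [i / k for i in range(1, k)]

# Step 1: the Z-score belonging to each cumulative probability.
z_scores = [standard_normal.inv_cdf(p) for p in cumulative_probs]

# Step 2: convert each Z-score into an X-value (IQ) via X = mean + sd * Z.
boundaries = [mean + sd * z for z in z_scores]

for p, z, x in zip(cumulative_probs, z_scores, boundaries):
    print(f"{p:.1f}  {z:+.4f}  {x:7.2f}")
```

Calling `NormalDist(100, 16).inv_cdf(p)` directly would give the same cut points in one step; the two steps are kept separate here to mirror the description above.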

Now, it's just a matter of counting the number of observations that fall into each bucket to get the observed (Obs'd) column, and calculating the expected number (0.10 × 100 = 10) to get the expected (Exp'd) column: 

[Table of counts: observed and expected numbers for each of the 10 buckets]
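The counting step might look like the following sketch. Two things here are assumptions rather than part of the example: the short `sample` list is made up purely to show the mechanics (it is not the IQ data), and a value that lands exactly on a cut point is placed in the lower bucket, treating each cut point as the closed right-hand side of its bucket. The helper name `bucket_counts` is hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

k = 10
iq_dist = NormalDist(mu=100, sigma=16)

# Nine finite cut points at cumulative probabilities 0.1, ..., 0.9.
boundaries = [iq_dist.inv_cdf(i / k) for i in range(1, k)]


def bucket_counts(values, cut_points):
    """Count how many values fall into each of the len(cut_points) + 1 buckets.

    Assumption: a value equal to a cut point goes into the lower bucket.
    """
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_left(cut_points, v)] += 1
    return counts


# Made-up values for illustration only; NOT the 100 IQ scores from the example.
sample = [72, 85, 91, 99, 100.5, 118, 131]
observed = bucket_counts(sample, boundaries)
print(observed)

# For the example in the text, n = 100 and each bucket has probability 1/k,
# so every expected count is 100 / 10 = 10.
expected_for_example = [100 / k] * k
print(expected_for_example)
```

Because the buckets have equal probability under the null hypothesis, the expected column is constant; only the observed column depends on the data.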


As illustrated in the table, using the observed and expected numbers, we see that the chi-square statistic is 8.2. We reject the null hypothesis if the following is true:

\[Q_9 =8.2 \ge \chi_{0.05}^{2}(10-1) =\chi_{0.05}^{2}(9)=16.92\]

It isn't! We do not reject the null hypothesis at the 0.05 level. There is insufficient evidence to conclude that the data do not follow a normal distribution with mean 100 and standard deviation 16.
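As a rough check of this decision, the comparison can be scripted. The sketch below takes Q = 8.2 and the critical value 16.92 from the example; the upper-tail probability is an addition that the example itself does not compute. It relies on a closed form for the chi-square tail that holds only for odd degrees of freedom, and the function name `chi2_upper_tail_odd_df` is hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail_odd_df(x, df):
    """P(chi-square with df degrees of freedom >= x), for odd df and x > 0.

    Uses the closed form
        2 * Q(c) + 2 * Z(c) * sum_{r=1}^{(df-1)/2} c^(2r-1) / (1*3*...*(2r-1)),
    where c = sqrt(x), Z is the standard normal density, and Q is the
    standard normal upper-tail probability.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only applies to odd df")
    c = sqrt(x)
    density = exp(-x / 2) / sqrt(2 * pi)
    normal_tail = 0.5 * erfc(c / sqrt(2))
    series = 0.0
    term = c  # term for r = 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)  # advance to the term for r + 1
    return 2 * normal_tail + 2 * density * series


q = 8.2            # chi-square statistic from the example
k = 10             # number of buckets
critical = 16.92   # tabulated upper 0.05 point with k - 1 = 9 df

reject = q >= critical
tail_at_critical = chi2_upper_tail_odd_df(critical, k - 1)  # should be close to 0.05
p_value = chi2_upper_tail_odd_df(q, k - 1)

print(reject, round(tail_at_critical, 4), round(p_value, 4))
```

An upper-tail probability for Q well above 0.05 tells the same story as the critical-value comparison: the observed counts are not unusual under the hypothesized normal distribution.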