16.4 - Continuous Random Variables

What if we are interested in using a chi-square goodness-of-fit test to see if our data follow some continuous distribution? That is, what if we want to test:

\( H_0 : F(w) =F_0(w)\)

where \(F_0 (w)\) is some known, specified distribution. In this situation, it is no longer obvious what constitutes each of the categories. Perhaps we could all agree that the logical thing to do would be to divide up the interval of possible values into \(k\) "buckets" or "categories," called \(A_1, A_2, \dots, A_k\), say, into which the observed data can fall. Letting \(Y_i\) denote the number of times the observed value of \(W\) belongs to bucket \(A_i, i = 1, 2, \dots, k\), the random variables \(Y_1, Y_2, \dots, Y_k\) follow a multinomial distribution with parameters \(n, p_1, p_2, \dots, p_{k-1}\), where \(p_i = P(W \in A_i)\). The hypothesis that we actually test is a modification of the null hypothesis above, namely:

\(H_{0}^{'} : p_i = p_{i0}, i=1, 2, \dots , k \)

The hypothesis is rejected if the observed value of the chi-square statistic:

\(Q_{k-1} =\sum_{i=1}^{k}\frac{(Obs_i - Exp_i)^2}{Exp_i}\)

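is at least as great as \(\chi_{\alpha}^{2}(k-1)\). Here \(Obs_i = y_i\) is the observed number of values in bucket \(A_i\), and \(Exp_i = np_{i0}\) is the number expected under \(H_{0}^{'}\), where \(p_{i0}\) is the probability that \(F_0\) assigns to \(A_i\). If the hypothesis \(H_{0}^{'} : p_i = p_{i0}, i=1, 2, \dots , k\) is not rejected, then we do not reject the original hypothesis \(H_0 : F(w) =F_0(w)\).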
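The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the helper name `chi_square_gof` is our own, and it assumes the bucket probabilities \(p_{i0}\) and the critical value \(\chi_{\alpha}^{2}(k-1)\) have already been worked out (for example, read from a chi-square table).

```python
def chi_square_gof(observed, probs, critical_value):
    """Chi-square goodness-of-fit test of H0': p_i = p_i0 for k buckets.

    observed:       counts y_1, ..., y_k in buckets A_1, ..., A_k
    probs:          hypothesized probabilities p_10, ..., p_k0 under F_0
    critical_value: chi-square critical value with k - 1 degrees of freedom
    Returns (Q, reject), where reject is True when Q >= critical_value.
    """
    if len(observed) != len(probs):
        raise ValueError("need one probability per bucket")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("bucket probabilities must sum to 1")
    n = sum(observed)
    q = 0.0
    for obs, p in zip(observed, probs):
        expected = n * p                    # Exp_i = n * p_i0
        q += (obs - expected) ** 2 / expected
    return q, q >= critical_value
```

For instance, passing the ten bucket counts tallied in Example 16-7 below, \(p_{i0} = 0.1\) for every bucket, and the critical value 16.92 gives \(Q = 8.2\) and no rejection, matching the hand calculation there.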

Let's make this proposed procedure more concrete by taking a look at an example.

Example 16-7


The IQs of one hundred randomly selected people were determined using the Stanford-Binet Intelligence Quotient Test. The resulting data were, in sorted order, as follows:

54 66 74 74 75 78 79 80 81 82
82 82 83 84 87 88 88 88 88 89
89 89 89 89 90 90 90 91 92 93
93 93 94 96 96 97 97 98 98 99
99 99 99 99 100 100 100 102 102 102
102 102 103 103 104 104 104 105 105 105
105 106 106 106 107 107 108 108 108 109
109 109 110 111 111 111 111 112 112 112
114 114 115 115 115 116 118 118 120 121
121 122 123 125 126 127 127 131 132 139

Test the null hypothesis that the data come from a normal distribution with a mean of 100 and a standard deviation of 16.

Answer

Hmm. So, where do we start? Well, we first have to define some categories. Let's divide up the interval of possible IQs into \(k = 10\) sets of equal probability \(\dfrac{1}{k} = \dfrac{1}{10}\). Perhaps this is best seen pictorially:

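[Figure: the normal density of \(X\) = IQ (with the corresponding \(Z\) scale) divided into ten regions, each of probability 0.1. Marked boundaries include \(x = 100 - 1.28(16) = 79.5\), \(x = 100 - 0.84(16) = 86.56\), and \(x = 100 - 0.52(16) = 91.68\).]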

So, what's going on in this picture? Well, first the normal density is divided up into 10 intervals of equal probability (0.10). Okay, so the picture is not drawn to scale. At any rate, we then find the IQs that correspond to the cumulative probabilities 0.1, 0.2, \(\dots\), 0.9; these are the right-hand endpoints of the first nine buckets, and the tenth bucket extends to \(\infty\). This is done in two steps:

  1. Step 1: First, find the Z-scores associated with the cumulative probabilities 0.1, 0.2, \(\dots\), 0.9. For example, \(P(Z \le -1.28) \approx 0.10\).

  2. Step 2: Then, convert each Z-score into an X-value using \(x = \mu + z\sigma = 100 + 16z\). It is those X-values (IQs) that make up the "right-hand side" of each bucket:


Category Class (IQ interval)
1 (\(-\infty\), 79.5)
2 (79.5, 86.5)
3 (86.5, 91.6)
4 (91.6, 95.9)
5 (95.9, 100.0)
6 (100.0, 104.1)
7 (104.1, 108.4)
8 (108.4, 113.5)
9 (113.5, 120.5)
10 (120.5, \(\infty\))
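The two steps can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8 or later), whose `inv_cdf` method returns the value with a given cumulative probability. The function name `equal_probability_boundaries` is hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist


def equal_probability_boundaries(mu, sigma, k):
    """Right-hand endpoints of k equal-probability buckets for N(mu, sigma^2).

    The last bucket extends to +infinity, so only k - 1 finite endpoints exist.
    """
    std_normal = NormalDist(0, 1)
    boundaries = []
    for i in range(1, k):
        z = std_normal.inv_cdf(i / k)       # Step 1: Z-score for cumulative prob i/k
        boundaries.append(mu + z * sigma)   # Step 2: convert the Z-score to an IQ
    return boundaries


boundaries = equal_probability_boundaries(100, 16, 10)
print([round(b, 1) for b in boundaries])
# [79.5, 86.5, 91.6, 95.9, 100.0, 104.1, 108.4, 113.5, 120.5]
```

Rounded to one decimal, these are the interval endpoints listed in the table.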
         
Now, it's just a matter of counting the number of observations that fall into each bucket to get the observed (Obs'd) column, and calculating the expected number \(0.10 \times 100 = 10\) to get the expected (Exp'd) column:
Category Class Obs'd Exp'd Contribution to \(Q\)
1 (\(-\infty\),79.5) 7 10 \(\left(7-10\right)^2 / 10 = 0.9\)
2 (79.5, 86.5) 7 10 \(\left(7-10\right)^2 / 10 = 0.9\)
3 (86.5, 91.6) 14 10 \(\left(14-10\right)^2 / 10 = 1.6\)
4 (91.6, 95.9) 5 10 \(\left(5-10\right)^2 / 10 = 2.5\)
5 (95.9, 100.0) 14 10 \(\left(14-10\right)^2 / 10 = 1.6\)
6 (100.0, 104.1) 10 10 \(\left(10-10\right)^2 / 10 = 0.0\)
7 (104.1, 108.4) 12 10 \(\left(12-10\right)^2 / 10 = 0.4\)
8 (108.4, 113.5) 11 10 \(\left(11-10\right)^2 / 10 = 0.1\)
9 (113.5, 120.5) 9 10 \(\left(9-10\right)^2 / 10 = 0.1\)
10 (120.5, \(\infty\)) 11 10 \(\left(11-10\right)^2 / 10 = 0.1\)
Total   \(n = 100\) \(n = 100\) \(Q_9 = 8.2\)
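The tallying and the sum can also be checked by machine. The sketch below recomputes the same ten buckets from \(N(100, 16^2)\), uses the standard-library `bisect` module to find each observation's bucket, and treats each bucket as closed on the right, so that the three IQs of exactly 100 land in bucket 5, as in the table. The variable names are our own.

```python
from bisect import bisect_left
from statistics import NormalDist

# IQ data from Example 16-7 (n = 100), in sorted order.
iqs = [
    54, 66, 74, 74, 75, 78, 79, 80, 81, 82,
    82, 82, 83, 84, 87, 88, 88, 88, 88, 89,
    89, 89, 89, 89, 90, 90, 90, 91, 92, 93,
    93, 93, 94, 96, 96, 97, 97, 98, 98, 99,
    99, 99, 99, 99, 100, 100, 100, 102, 102, 102,
    102, 102, 103, 103, 104, 104, 104, 105, 105, 105,
    105, 106, 106, 106, 107, 107, 108, 108, 108, 109,
    109, 109, 110, 111, 111, 111, 111, 112, 112, 112,
    114, 114, 115, 115, 115, 116, 118, 118, 120, 121,
    121, 122, 123, 125, 126, 127, 127, 131, 132, 139,
]

mu, sigma, k = 100, 16, 10
# Right-hand endpoints of the k equal-probability buckets.
boundaries = [NormalDist(mu, sigma).inv_cdf(i / k) for i in range(1, k)]

observed = [0] * k
for x in iqs:
    # bisect_left sends a value equal to an endpoint (IQ = 100) into the
    # bucket that endpoint closes, i.e. (95.9, 100.0].
    observed[bisect_left(boundaries, x)] += 1

expected = len(iqs) / k   # 0.10 * 100 = 10 per bucket
q = sum((o - expected) ** 2 / expected for o in observed)
print(observed)      # [7, 7, 14, 5, 14, 10, 12, 11, 9, 11]
print(round(q, 1))   # 8.2
```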

As illustrated in the table, using the observed and expected numbers, we see that the chi-square statistic is 8.2. We reject if the following is true:

\(Q_9 = 8.2 \ge \chi_{0.05}^{2}(10-1) = \chi_{0.05}^{2}(9) = 16.92\)

It isn't! We do not reject the null hypothesis at the 0.05 level. There is insufficient evidence to conclude that the data do not follow a normal distribution with a mean of 100 and a standard deviation of 16.
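The critical value 16.92 comes from a chi-square table. As a hedged sketch for readers without such a table, the code below computes the chi-square CDF from the standard power series for the regularized lower incomplete gamma function \(P(\nu/2, x/2)\) and inverts it by bisection. The helpers `chi2_cdf` and `chi2_critical` are our own illustrative functions, not a library API; if SciPy is available, `scipy.stats.chi2.ppf(0.95, 9)` should give the same critical value.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for X ~ chi-square(df).

    Series for the regularized lower incomplete gamma function, a = df/2, z = x/2:
        P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)   # each term is the previous one times z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(alpha: float, df: int) -> float:
    """Upper-alpha critical value: the x with P(X <= x) = 1 - alpha."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:   # widen until the target is bracketed
        hi *= 2
    for _ in range(100):                   # bisection on the CDF
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical(0.05, 9)
p_value = 1 - chi2_cdf(8.2, 9)
print(f"critical value = {critical:.2f}, p-value = {p_value:.3f}")
print("reject H0" if 8.2 >= critical else "do not reject H0")
```

The p-value comes out at roughly one half, far above 0.05, which agrees with the decision not to reject.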