Lesson 16: Chi-Square Goodness-of-Fit Tests

Suppose the Penn State student population is 60% female and 40% male. Then, if a sample of 100 students yields 53 females and 47 males, can we conclude that the sample is (random and) representative of the population? That is, how "good" do the data "fit" the probability model? As the title of the lesson suggests, that's the kind of question that we will answer in this lesson.
16.1 The General Approach

Example 16-1
As is often the case, we'll motivate the methods of this lesson by way of example. Specifically, we'll return to the question posed in the introduction to this lesson.
Suppose the Penn State student population is 60% female and 40% male. Then, if a sample of 100 students yields 53 females and 47 males, can we conclude that the sample is (random and) representative of the population? That is, how "good" do the data "fit" the assumed probability model of 60% female and 40% male?
Answer
Testing whether there is a "good fit" between the observed data and the assumed probability model amounts to testing:
\(H_0 : p_F =0.60\)
\(H_A : p_F \ne 0.60\)
Now, letting \(Y_1\) denote the number of females selected, we know that \(Y_1\) follows a binomial distribution with n trials and probability of success \(p_1\). That is:
\(Y_1 \sim b(n,p_1)\)
Therefore, the expected value and variance of \(Y_1\) are, respectively:
\(E(Y_1)=np_1\) and \(Var(Y_1)=np_1(1-p_1)\)
And, letting \(Y_2\) denote the number of males selected, we know that \(Y_2 = n − Y_1\) follows a binomial distribution with n trials and probability of success \(p_2\). That is:
\(Y_2 = n-Y_1 \sim b(n,p_2)=b(n,1-p_1)\)
Therefore, the expected value and variance of \(Y_2\) are, respectively:
\(E(Y_2)=n(1-p_1)=np_2\) and \(Var(Y_2)=n(1-p_1)(1-(1-p_1))=np_1(1-p_1)=np_2(1-p_2)\)
Now, for large samples (\(np_1 \ge 5\) and \(n(1-p_1) \ge 5\)), the Central Limit Theorem yields the normal approximation to the binomial distribution. That is:
\(Z=\dfrac{Y_1-np_1}{\sqrt{np_1(1-p_1)}}\)
follows, at least approximately, the standard normal N(0,1) distribution. Therefore, upon squaring Z, we get that:
\(Z^2=Q_1=\dfrac{(Y_1-np_1)^2}{np_1(1-p_1)}\)
follows an approximate chi-square distribution with one degree of freedom. Now, we could stop there. But that's not typically what is done. Instead, we can rewrite \(Q_1\) a bit. Let's start by multiplying \(Q_1\) by 1 in a special way, that is, by multiplying it by \(\left( \left( 1-p_1 \right) + p_1 \right)\):
\(Q_1=\dfrac{(Y_1-np_1)^2}{np_1(1-p_1)} \times ((1-p_1)+p_1)\)
Then, distributing the "1" across the numerator, we get:
\(Q_1=\dfrac{(Y_1-np_1)^2(1-p_1)}{np_1(1-p_1)}+\dfrac{(Y_1-np_1)^2p_1}{np_1(1-p_1)}\)
which simplifies to:
\(Q_1=\dfrac{(Y_1-np_1)^2}{np_1}+\dfrac{(Y_1-np_1)^2}{n(1-p_1)}\)
Now, taking advantage of the fact that \(Y_1 = n − Y_2\) and \(p_1 = 1 − p_2\), we get:
\(Q_1=\dfrac{(Y_1-np_1)^2}{np_1}+\dfrac{(n-Y_2-n(1-p_2))^2}{np_2}\)
which simplifies to:
\(Q_1=\dfrac{(Y_1-np_1)^2}{np_1}+\dfrac{(-(Y_2-np_2))^2}{np_2}\)
Just one more thing to simplify before we're done:
\(Q_1=\dfrac{(Y_1-np_1)^2}{np_1}+\dfrac{(Y_2-np_2)^2}{np_2}\)
In summary, we have rewritten \(Q_1\) as:
\( Q_1=\sum_{i=1}^{2}\dfrac{(Y_i-np_i)^2}{np_i}=\sum_{i=1}^{2}\dfrac{(\text{OBSERVED}-\text{EXPECTED})^2}{\text{EXPECTED}} \)

We'll use this form of \(Q_1\), and the fact that \(Q_1\) follows an approximate chi-square distribution with one degree of freedom, to conduct the desired hypothesis test.
Before we return to solving the problem posed by our example, a couple of points are worthy of emphasis.
- First, \(Q_1\) has only one degree of freedom, since there is only one independent count, namely \(Y_1\). Once \(Y_1\) is known, the value of \(Y_2 = n − Y_1\) immediately follows.
- Note that the derived approach requires the Central Limit Theorem to kick in. The general rule of thumb is that the expected number of successes must be at least 5 (that is, \(np_1 \ge 5\)) and the expected number of failures must be at least 5 (that is, \(n(1-p_1) \ge 5\)).
- The statistic \(Q_1\) will be large if the observed counts are very different from the expected counts. Therefore, we must reject the null hypothesis \(H_0\) if \(Q_1\) is large. How large is large? Large is determined by the values of a chi-square random variable with one degree of freedom, which can be obtained either from a statistical software package, such as Minitab or SAS, or from a standard chi-square table, such as the one in the back of our textbook.
- The statistic \(Q_1\) is called the chi-square goodness-of-fit statistic.
Example 16-2
Suppose the Penn State student population is 60% female and 40% male. Then, if a sample of 100 students yields 53 females and 47 males, can we conclude that the sample is (random and) representative of the population? Use the chi-square goodness-of-fit statistic to test the hypotheses:
\(H_0 : p_F =0.60\)
\(H_A : p_F \ne 0.60\)
using a significance level of \(\alpha = 0.05\).
Answer
The value of the test statistic \(Q_1\) is:
\(Q_1=\dfrac{(53-60)^2}{60}+\dfrac{(47-40)^2}{40}=2.04\)
We should reject the null hypothesis if the observed number of counts is very different from the expected number of counts, that is, if \(Q_1\) is large. Because \(Q_1\) follows a chi-square distribution with one degree of freedom, we should reject the null hypothesis, at the 0.05 level, if:
\(Q_1 \ge \chi_{0.05, 1}^{2}=3.84\)
Because:
\(Q_1=2.04 < 3.84\)
we do not reject the null hypothesis. There is not enough evidence at the 0.05 level to conclude that the data don't fit the assumed probability model.
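These calculations are easy to verify with a few lines of Python. This quick check is not part of the original lesson; the counts, hypothesized proportions, and critical value are exactly those given above:

```python
# Observed counts and hypothesized proportions from the example
observed = [53, 47]                 # females, males
hypothesized = [0.60, 0.40]
n = sum(observed)                   # n = 100

# Q1 = sum over categories of (observed - expected)^2 / expected
expected = [n * p for p in hypothesized]
q1 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q1, 2))                 # 2.04

# Compare against the chi-square(1 df) critical value at alpha = 0.05
print(q1 >= 3.84)                   # False: fail to reject H0
```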
As an aside, it is interesting to note the relationship between using the chi-square goodness-of-fit statistic \(Q_1\) and the Z-statistic we've previously used for testing a null hypothesis about a proportion. In this case:
\( Z=\dfrac{0.53-0.60}{\sqrt{\frac{(0.60)(0.40)}{100}}}=-1.428 \)
and you might note that, if we square it, we get the same value as we did for \(Q_1\):

\(Q_1=Z^2 =(-1.428)^2=2.04\)
as we should expect. The Z-test for a proportion tells us that we should reject if:

\(|Z| \ge 1.96 \)
Well, again, if we square it, we see that that's equivalent to rejecting if:
\(Q_1 \ge (1.96)^2 =3.84\)
And not surprisingly, the P-values obtained from the two approaches are identical. The P-value for the chi-square goodness-of-fit test is:
\(P=P(\chi_{(1)}^{2} > 2.04)=0.1532\)
while the P-value for the Z-test is:
\(P=2 \times P(Z>1.428)=2(0.0766)=0.1532\)
Identical, as we should expect!
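The equivalence between the two tests can also be checked numerically. The sketch below (ours, not the lesson's) uses `math.erfc` for the standard normal tail probability; the small difference from the table-based 0.1532 above is just rounding in the table lookup:

```python
import math

# Z statistic for the sample proportion (numbers from the example)
p_hat, p0, n = 0.53, 0.60, 100
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 3))            # -1.429

# Squaring Z reproduces the chi-square statistic Q1
print(round(z ** 2, 2))       # 2.04

# Two-sided P-value: P(chi-square(1) > z^2) = 2 P(Z > |z|) = erfc(|z|/sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))
print(round(p_value, 3))      # 0.153
```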
16.2 Extension to k Categories

The work on the previous page is all well and good if your probability model involves just two categories, which, as we have seen, reduces to conducting a test for one proportion. What happens if our probability model involves three or more categories? It takes some theoretical work beyond the scope of this course to show it, but the chi-square statistic that we derived on the previous page can be extended to accommodate any number \(k\) of categories.
The Extension
Suppose an experiment can result in any of k mutually exclusive and exhaustive outcomes, say \(A_1, A_2, \dots, A_k\). If the experiment is repeated n independent times, and we let \(p_i = P(A_i)\) and \(Y_i\) = the number of times the experiment results in \(A_i, i = 1, \dots, k\), then we can summarize the number of observed outcomes and the number of expected outcomes for each of the k categories in a table as follows:
| Categories | 1 | 2 | ... | \(k-1\) | \(k\) |
| --- | --- | --- | --- | --- | --- |
| Observed | \(Y_1\) | \(Y_2\) | ... | \(Y_{k-1}\) | \(n-Y_1-Y_2-\cdots-Y_{k-1}\) |
| Expected | \(np_1\) | \(np_2\) | ... | \(np_{k-1}\) | \(np_k\) |
Karl Pearson showed that the chi-square statistic \(Q_{k-1}\) defined as:
\[Q_{k-1}=\sum_{i=1}^{k}\frac{(Y_i - np_i)^2}{np_i} \]
follows approximately a chi-square distribution with \(k-1\) degrees of freedom. Let's try it out on an example.
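Before doing so, note that Pearson's statistic is a one-liner in code once the expected counts are in hand. Here is a small Python helper (a sketch; the function name is our own), checked against the two-category example from the previous page:

```python
def pearson_q(observed, probs):
    """Pearson's chi-square statistic Q_{k-1} for observed category
    counts against hypothesized category probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two-category check: the 53 females / 47 males example gives Q1 = 2.04
print(round(pearson_q([53, 47], [0.60, 0.40]), 2))   # 2.04
```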
Example 16-3
A particular brand of candy-coated chocolate comes in five different colors that we shall denote as:
- \(A_1 = \text{brown}\)
- \(A_2 = \text{yellow}\)
- \(A_3 = \text{orange}\)
- \(A_4 = \text{green}\)
- \(A_5 = \text{coffee}\)
Let \(p_i\) equal the probability that the color of a piece of candy selected at random belongs to \(A_i\), for \(i = 1, 2, 3, 4, 5\). Test the following null and alternative hypotheses:
\(H_0 : p_{Br}=0.4,p_{Y}=0.2,p_{O}=0.2,p_{G}=0.1,p_{C}=0.1 \)
\(H_A : p_{i} \text{ not specified in null (many possible alternatives) } \)
using a random sample of n = 580 pieces of candy whose colors yielded the respective frequencies 224, 119, 130, 48, and 59. (This example comes from exercise 8.1-2 in the Hogg and Tanis (8th edition) textbook.)
Answer
We can summarize the observed \((y_i)\) and expected \((np_i)\) counts in a table as follows:
| Categories | Brown | Yellow | Orange | Green | Coffee | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Observed \(y_i\) | 224 | 119 | 130 | 48 | 59 | 580 |
| Assumed \(p_i\) under \(H_0\) | 0.4 | 0.2 | 0.2 | 0.1 | 0.1 | 1.0 |
| Expected \(np_i\) | 232 | 116 | 116 | 58 | 58 | 580 |
where, for example, the expected number of brown candies is:
\(np_1 = 580(0.40) = 232\)
and the expected number of green candies is:
\(np_4 = 580(0.10) = 58\)
Once we have the observed and expected number of counts, the calculation of the chi-square statistic is straightforward. It is:

\(Q_4=\dfrac{(224-232)^2}{232}+\dfrac{(119-116)^2}{116}+\dfrac{(130-116)^2}{116}+\dfrac{(48-58)^2}{58}+\dfrac{(59-58)^2}{58} \)
Simplifying, we get:
\(Q_4=\dfrac{64}{232}+\dfrac{9}{116}+\dfrac{196}{116}+\dfrac{100}{58}+\dfrac{1}{58}=3.784 \)
Because there are k = 5 categories, we have to compare our chi-square statistic \(Q_4\) to a chi-square distribution with k−1 = 5−1 = 4 degrees of freedom:

\(\text{Reject }H_0 \text{ if } Q_4\ge \chi_{4,0.05}^{2}=9.488\)
Because \(Q_4 = 3.784 < 9.488\), we fail to reject the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the distribution of the color of the candies differs from that specified in the null hypothesis.
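As a sanity check, the whole calculation fits in a few lines of Python (not part of the original lesson; it uses only the observed counts and hypothesized probabilities from the example):

```python
# Candy-color counts and hypothesized probabilities from Example 16-3
observed = [224, 119, 130, 48, 59]
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
n = sum(observed)                                   # 580

expected = [n * p for p in probs]                   # [232, 116, 116, 58, 58]
q4 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q4, 3))                                 # 3.784

# Compare against the chi-square(4 df) critical value at alpha = 0.05
print(q4 >= 9.488)                                  # False: fail to reject H0
```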
By the way, this might be a good time to think about the practical meaning of the term "degrees of freedom." Recalling the example on the last page, we had two categories (male and female) and one degree of freedom. If we are sampling n = 100 people and 53 of them are female, then we absolutely must have 100−53 = 47 males. If we had instead 62 females, then we absolutely must have 100−62 = 38 males. That is, the number of females is the one number that is "free" to be any number, but once it is determined, then the number of males immediately follows. It is in this sense that we have "one degree of freedom."
With the example on this page, we have five categories of candies (brown, yellow, orange, green, coffee) and four degrees of freedom. If we are sampling n = 580 candies, and 224 are brown, 119 are yellow, 130 are orange, and 48 are green, then we absolutely must have 580−(224+119+130+48) = 59 coffee-colored candies. In this case, we have four numbers that are "free" to be any number, but once they are determined, then the number of coffee-colored candies immediately follows. It is in this sense that we have "four degrees of freedom."
16.3 Unspecified Probabilities

For the two examples that we've thus far considered, the probabilities were prespecified. For the first example, we were interested in seeing if the data fit a probability model in which there was a 0.60 probability that a randomly selected Penn State student was female. In the second example, we were interested in seeing if the data fit a probability model in which the probabilities of selecting a brown, yellow, orange, green, and coffee-colored candy were 0.4, 0.2, 0.2, 0.1, and 0.1, respectively. That is, we were interested in testing specific probabilities:
\(H_0 : p_{B}=0.40,p_{Y}=0.20,p_{O}=0.20,p_{G}=0.10,p_{C}=0.10 \)
Someone might be also interested in testing whether a data set follows a specific probability distribution, such as:
\(H_0 : X \sim b(n, 1/2)\)
What if the probabilities aren't prespecified though? That is, suppose someone is interested in testing whether a random variable is binomial, but with an unspecified probability of success:
\(H_0 : X \sim b(n, p)\)
Can we still use the chi-square goodness-of-fit statistic? The short answer is yes... with just a minor modification.
Example 16-4
Let X denote the number of heads when four dimes are tossed at random. One hundred repetitions of this experiment resulted in 0, 1, 2, 3, and 4 heads being observed on 8, 17, 41, 30, and 4 trials, respectively. Under the assumption that the four dimes are independent, and the probability of getting a head on each coin is p, the random variable X is b(4, p). In light of the observed data, is b(4, p) a reasonable model for the distribution of X?
Answer
In order to use the chi-square statistic to test the data, we need to be able to determine the observed and expected number of trials in which we'd get 0, 1, 2, 3, and 4 heads. The observed part is easy... we know those:
| \(X\) | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| Observed | 8 | 17 | 41 | 30 | 4 |
| Expected | | | | | |
It's the expected numbers that are a problem. If the probability p of getting a head were specified, we'd be able to calculate the expected numbers. Suppose, for example, that p = 1/2. Then, the probability of getting zero heads in four dimes is:
\(P(X=0)=\binom{4}{0}\left(\dfrac{1}{2}\right)^0\left(\dfrac{1}{2}\right)^4=0.0625 \)
and therefore the expected number of trials resulting in 0 heads is 100 × 0.0625 = 6.25. We could make similar calculations for the case of 1, 2, 3, and 4 heads, and we would be well on our way to using the chi-square statistic:
\(Q_4=\sum_{i=0}^{4}\dfrac{(Obs_i-Exp_i)^2}{Exp_i} \)
and comparing it to a chisquare distribution with 5−1 = 4 degrees of freedom. But, we don't know p, as it is unspecified! What do you think the logical thing would be to do in this case? Sure... we'd probably want to estimate p. But then that begs the question... what should we use as an estimate of p?
One way of estimating p would be to minimize the chi-square statistic \(Q_4\) with respect to p, yielding an estimator \(\tilde{p}\). This \(\tilde{p}\) estimator is called, perhaps not surprisingly, a minimum chi-square estimator of p. If \(\tilde{p}\) is used in calculating the expected numbers that appear in \(Q_4\), it can be shown (not easily, and therefore we won't!) that \(Q_4\) still has an approximate chi-square distribution but with only 4−1 = 3 degrees of freedom. The number of degrees of freedom of the approximating chi-square distribution is reduced by one because we have to estimate one parameter in order to calculate the chi-square statistic. In general, the number of degrees of freedom of the approximating chi-square distribution is reduced by d, the number of parameters estimated. If we estimate two parameters, we reduce the degrees of freedom by two. And so on.
This all seems simple enough. There's just one problem... it is usually very difficult to find minimum chi-square estimators. So what to do? Well, most statisticians just use some other reasonable method of estimating the unspecified parameters, such as maximum likelihood estimation. The good news is that the chi-square statistic testing method still works well. (It should be noted, however, that the approach does provide a slightly larger probability of rejecting the null hypothesis than would the approach based purely on the minimized chi-square.)
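To get a feel for what a minimum chi-square estimate looks like in practice, a crude grid search works for the dime-tossing data of Example 16-4. This sketch is purely illustrative (the function and variable names are ours); a real application would use a numerical optimizer:

```python
from math import comb

# Dime-tossing data from Example 16-4: trials observing 0..4 heads
observed = [8, 17, 41, 30, 4]
n_trials = 100

def q_of_p(p):
    """Chi-square statistic for the b(4, p) model at a candidate p."""
    q = 0.0
    for heads, obs in enumerate(observed):
        prob = comb(4, heads) * p ** heads * (1 - p) ** (4 - heads)
        expected = n_trials * prob
        q += (obs - expected) ** 2 / expected
    return q

# Crude grid search over (0, 1) for the minimizing p
grid = [i / 10000 for i in range(1, 10000)]
p_tilde = min(grid, key=q_of_p)
print(round(p_tilde, 4))   # close to, but not identical to, the MLE 0.5125
```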
Let's summarize
Chi-square method when parameters are unspecified. If you are interested in testing whether a data set fits a probability model with d parameters left unspecified:

- Estimate the d parameters using the maximum likelihood method (or another reasonable method).
- Calculate the chi-square statistic \(Q_{k-1}\) using the obtained estimates.
- Compare the chi-square statistic to a chi-square distribution with (k−1)−d degrees of freedom.
Example 16-5
Let X denote the number of heads when four dimes are tossed at random. One hundred repetitions of this experiment resulted in 0, 1, 2, 3, and 4 heads being observed on 8, 17, 41, 30, and 4 trials, respectively. Under the assumption that the four dimes are independent, and the probability of getting a head on each coin is p, the random variable X is b(4, p). In light of the observed data, is b(4, p) a reasonable model for the distribution of X?
Answer
Given that four dimes are tossed 100 times, we have 400 coin tosses resulting in 205 heads for an estimated probability of success of 0.5125:
\(\hat{p}=\dfrac{0(8)+1(17)+2(41)+3(30)+4(4)}{400}=\dfrac{205}{400}=0.5125 \)
Using 0.5125 as the estimate of p, we can use the binomial p.m.f. (or Minitab!) to calculate the probability that X = 0, 1, ..., 4:
| \(x\) | \(P(X = x)\) |
| --- | --- |
| 0 | 0.056480 |
| 1 | 0.237508 |
| 2 | 0.374531 |
| 3 | 0.262492 |
| 4 | 0.068988 |
and then, using the probabilities, the expected number of trials resulting in 0, 1, 2, 3, and 4 heads:
| \(X\) | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| Observed | 8 | 17 | 41 | 30 | 4 |
| \(P(X = i)\) | 0.0565 | 0.2375 | 0.3745 | 0.2625 | 0.0690 |
| Expected | 5.65 | 23.75 | 37.45 | 26.25 | 6.90 |
Calculating the chi-square statistic, we get:

\(Q_4=\dfrac{(8-5.65)^2}{5.65}+\dfrac{(17-23.75)^2}{23.75}+ \cdots + \dfrac{(4-6.90)^2}{6.90} =4.99\)
We estimated d = 1 parameter (namely, p) in calculating the chi-square statistic. Therefore, we compare the statistic to a chi-square distribution with (5−1)−1 = 3 degrees of freedom. Doing so:
\(Q_4= 4.99 < \chi_{3,0.05}^{2}=7.815\)
we fail to reject the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the data don't fit a binomial probability model.
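The whole procedure, estimating p by maximum likelihood, computing the expected counts, and forming the chi-square statistic, can be sketched in Python (ours, not the lesson's; the numbers match the hand calculation above):

```python
from math import comb

# Dime-tossing data from Example 16-5: trials observing 0..4 heads
observed = [8, 17, 41, 30, 4]
n_trials = 100

# Maximum likelihood estimate of p: total heads over total tosses
heads = sum(i * obs for i, obs in enumerate(observed))    # 205
p_hat = heads / (4 * n_trials)                            # 0.5125

# Expected counts under b(4, p_hat), then the chi-square statistic
expected = [n_trials * comb(4, i) * p_hat ** i * (1 - p_hat) ** (4 - i)
            for i in range(5)]
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q, 2))      # 4.99

# (5 - 1) - 1 = 3 degrees of freedom; critical value 7.815
print(q >= 7.815)       # False: fail to reject H0
```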
Let's take a look at another example.
Example 16-6
Let X equal the number of alpha particles emitted from barium-133 in 0.1 seconds and counted by a Geiger counter. One hundred observations of X produced these data.

It is claimed that X follows a Poisson distribution. Use a chi-square goodness-of-fit statistic to test whether this is true.
Answer
Note that very few observations resulted in 0, 1, or 2 alpha particles being emitted in 0.1 second. And, very few observations resulted in 10, 11, or 12 alpha particles being emitted in 0.1 second. Therefore, let's "collapse" the data at the two ends, yielding us nine "not-so-sparse" categories:
| Category | \(X\) | Obs'd |
| --- | --- | --- |
| 1 | 0, 1, 2* | 5 |
| 2 | 3 | 13 |
| 3 | 4 | 19 |
| 4 | 5 | 16 |
| 5 | 6 | 15 |
| 6 | 7 | 9 |
| 7 | 8 | 12 |
| 8 | 9 | 7 |
| 9 | 10, 11, 12* | 4 |
| | | \(n = 100\) |
Because \(\lambda\), the mean of X, is not specified, we can estimate it with its maximum likelihood estimator, namely, the sample mean. Using the data, we get:
\(\bar{x}=\dfrac{1(1)+2(4)+3(13)+ \cdots + 12(1)}{100}=\dfrac{559}{100}\approx 5.6\)
We can now estimate the probability that an observation will fall into each of the categories. The probability of falling into category 1, for example, is:
\(P(\{1\})=P(X=0)+P(X=1)+P(X=2) =\dfrac{e^{-5.6}5.6^0}{0!}+\dfrac{e^{-5.6}5.6^1}{1!}+\dfrac{e^{-5.6}5.6^2}{2!}=0.0824 \)
Here's what our table looks like now, after adding a column containing the estimated probabilities:
| Category | \(X\) | Obs'd | \(p_i = e^{-5.6}5.6^x / x!\) |
| --- | --- | --- | --- |
| 1 | 0, 1, 2* | 5 | 0.0824 |
| 2 | 3 | 13 | 0.1082 |
| 3 | 4 | 19 | 0.1515 |
| 4 | 5 | 16 | 0.1697 |
| 5 | 6 | 15 | 0.1584 |
| 6 | 7 | 9 | 0.1267 |
| 7 | 8 | 12 | 0.0887 |
| 8 | 9 | 7 | 0.0552 |
| 9 | 10, 11, 12* | 4 | 0.0539 |
| | | \(n = 100\) | |
Now, we just have to add a column containing the expected number falling into each category. The expected number falling into category 1, for example, is 0.0824 × 100 = 8.24. Doing a similar calculation for each of the categories, we can add our column of expected numbers:
| Category | \(X\) | Obs'd | \(p_i = e^{-5.6}5.6^x / x!\) | Exp'd |
| --- | --- | --- | --- | --- |
| 1 | 0, 1, 2* | 5 | 0.0824 | 8.24 |
| 2 | 3 | 13 | 0.1082 | 10.82 |
| 3 | 4 | 19 | 0.1515 | 15.15 |
| 4 | 5 | 16 | 0.1697 | 16.97 |
| 5 | 6 | 15 | 0.1584 | 15.84 |
| 6 | 7 | 9 | 0.1267 | 12.67 |
| 7 | 8 | 12 | 0.0887 | 8.87 |
| 8 | 9 | 7 | 0.0552 | 5.52 |
| 9 | 10, 11, 12* | 4 | 0.0539 | 5.39 |
| | | \(n = 100\) | | 99.47 |
Now, we can use the observed numbers and the expected numbers to calculate our chi-square test statistic. Doing so, we get:

\(Q_{9-1}=\dfrac{(5-8.24)^2}{8.24}+\dfrac{(13-10.82)^2}{10.82}+ \cdots +\dfrac{(4-5.39)^2}{5.39}=5.7157 \)
Because we estimated d = 1 parameter, we need to compare our chi-square statistic to a chi-square distribution with (9−1)−1 = 7 degrees of freedom. That is, our critical region is defined as:

\(\text{Reject } H_0 \text{ if } Q_8 \ge \chi_{8-1, 0.05}^{2}=\chi_{7, 0.05}^{2}=14.07 \)
Because our test statistic doesn't fall in the rejection region, that is:
\(Q_8=5.7157 < \chi_{7, 0.05}^{2}=14.07\)
we fail to reject the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the data don't fit a Poisson probability model.
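Here is the same calculation sketched in Python (not part of the original lesson). Note that, following the table above, the last collapsed category uses \(P(X=10)+P(X=11)+P(X=12)\) rather than the entire upper tail:

```python
from math import exp, factorial

# Collapsed alpha-particle counts from Example 16-6
observed = [5, 13, 19, 16, 15, 9, 12, 7, 4]
lam = 5.6    # maximum likelihood estimate of the Poisson mean

def pois(x):
    """Poisson(lam) probability mass at x."""
    return exp(-lam) * lam ** x / factorial(x)

# Category probabilities: {0,1,2}, 3, 4, ..., 9, {10,11,12}
probs = ([pois(0) + pois(1) + pois(2)]
         + [pois(x) for x in range(3, 10)]
         + [pois(10) + pois(11) + pois(12)])

expected = [100 * p for p in probs]
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q, 4))      # 5.7157

# (9 - 1) - 1 = 7 degrees of freedom; critical value 14.07
print(q >= 14.07)       # False: fail to reject H0
```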
16.4 Continuous Random Variables

What if we are interested in using a chi-square goodness-of-fit test to see if our data follow some continuous distribution? That is, what if we want to test:
\( H_0 : F(w) =F_0(w)\)
where \(F_0 (w)\) is some known, specified distribution. Clearly, in this situation, it is no longer obvious what constitutes each of the categories. Perhaps we could all agree that the logical thing to do would be to divide up the interval of possible values into k "buckets" or "categories," called \(A_1, A_2, \dots, A_k\), say, into which the observed data can fall. Letting \(Y_i\) denote the number of times the observed value of W belongs to bucket \(A_i, i = 1, 2, \dots, k\), the random variables \(Y_1, Y_2, \dots, Y_k\) follow a multinomial distribution with parameters \(n, p_1, p_2, \dots, p_{k−1}\). The hypothesis that we actually test is a modification of the null hypothesis above, namely:
\(H_{0}^{'} : p_i = p_{i0}, i=1, 2, \dots , k \)
The hypothesis is rejected if the observed value of the chisquare statistic:
\(Q_{k-1} =\sum_{i=1}^{k}\frac{(Obs_i - Exp_i)^2}{Exp_i}\)
is at least as great as \(\chi_{\alpha}^{2}(k-1)\). If the hypothesis \(H_{0}^{'} : p_i = p_{i0}, i=1, 2, \dots , k\) is not rejected, then we do not reject the original hypothesis \(H_0 : F(w) =F_0(w)\).
Let's make this proposed procedure more concrete by taking a look at an example.
Example 16-7
The IQs of one hundred randomly selected people were determined using the Stanford-Binet Intelligence Quotient Test. The resulting data were, in sorted order, as follows:
```text
 54  66  74  74  75  78  79  80  81  82
 82  82  83  84  87  88  88  88  88  89
 89  89  89  89  90  90  90  91  92  93
 93  93  94  96  96  97  97  98  98  99
 99  99  99  99 100 100 100 102 102 102
102 102 103 103 104 104 104 105 105 105
105 106 106 106 107 107 108 108 108 109
109 109 110 111 111 111 111 112 112 112
114 114 115 115 115 116 118 118 120 121
121 122 123 125 126 127 127 131 132 139
```
Test the null hypothesis that the data come from a normal distribution with a mean of 100 and a standard deviation of 16.
Answer
Hmm. So, where do we start? Well, we first have to define some categories. Let's divide up the interval of possible IQs into \(k = 10\) sets of equal probability \(\dfrac{1}{k} = \dfrac{1}{10}\). Perhaps this is best seen pictorially:
So, what's going on in this picture? Well, first the normal density is divided up into 10 intervals of equal probability (0.10). Well, okay, so the picture is not drawn very well to scale. At any rate, we then find the IQs that correspond to the \(k = 10\) cumulative probabilities of 0.1, 0.2, 0.3, etc. This is done in two steps:
- Step 1: first by finding the Z-scores associated with the cumulative probabilities 0.1, 0.2, 0.3, etc.
- Step 2: then by converting each Z-score into an X-value. It is those X-values (IQs) that will make up the "right-hand side" of each bucket:
| Category | Class |
| --- | --- |
| 1 | \((-\infty, 79.5)\) |
| 2 | (79.5, 86.5) |
| 3 | (86.5, 91.6) |
| 4 | (91.6, 95.9) |
| 5 | (95.9, 100.0) |
| 6 | (100.0, 104.1) |
| 7 | (104.1, 108.4) |
| 8 | (108.4, 113.5) |
| 9 | (113.5, 120.5) |
| 10 | \((120.5, \infty)\) |
| Category | Class | Obs'd | Exp'd | Contribution to \(Q\) |
| --- | --- | --- | --- | --- |
| 1 | \((-\infty, 79.5)\) | 7 | 10 | \((7-10)^2 / 10 = 0.9\) |
| 2 | (79.5, 86.5) | 7 | 10 | \((7-10)^2 / 10 = 0.9\) |
| 3 | (86.5, 91.6) | 14 | 10 | \((14-10)^2 / 10 = 1.6\) |
| 4 | (91.6, 95.9) | 5 | 10 | \((5-10)^2 / 10 = 2.5\) |
| 5 | (95.9, 100.0) | 14 | 10 | \((14-10)^2 / 10 = 1.6\) |
| 6 | (100.0, 104.1) | 10 | 10 | \((10-10)^2 / 10 = 0.0\) |
| 7 | (104.1, 108.4) | 12 | 10 | \((12-10)^2 / 10 = 0.4\) |
| 8 | (108.4, 113.5) | 11 | 10 | \((11-10)^2 / 10 = 0.1\) |
| 9 | (113.5, 120.5) | 9 | 10 | \((9-10)^2 / 10 = 0.1\) |
| 10 | \((120.5, \infty)\) | 11 | 10 | \((11-10)^2 / 10 = 0.1\) |
| | | \(n = 100\) | \(n = 100\) | \(Q_9 = 8.2\) |
As illustrated in the table, using the observed and expected numbers, we see that the chi-square statistic is 8.2. We reject if the following is true:

\(Q_9 =8.2 \ge \chi_{10-1, 0.05}^{2} =\chi_{9, 0.05}^{2}=16.92\)
It isn't! We do not reject the null hypothesis at the 0.05 level. There is insufficient evidence to conclude that the data do not follow a normal distribution with a mean of 100 and a standard deviation of 16.
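The entire procedure, from finding the bucket boundaries with the inverse normal c.d.f. to tallying the counts and forming the statistic, can be sketched in Python (ours, not the lesson's; `statistics.NormalDist` provides `inv_cdf`, and a tie at the boundary 100.0 is assigned to the lower bucket, matching the table):

```python
import statistics
from bisect import bisect_left

# The 100 sorted IQs from Example 16-7
iqs = [54, 66, 74, 74, 75, 78, 79, 80, 81, 82,
       82, 82, 83, 84, 87, 88, 88, 88, 88, 89,
       89, 89, 89, 89, 90, 90, 90, 91, 92, 93,
       93, 93, 94, 96, 96, 97, 97, 98, 98, 99,
       99, 99, 99, 99, 100, 100, 100, 102, 102, 102,
       102, 102, 103, 103, 104, 104, 104, 105, 105, 105,
       105, 106, 106, 106, 107, 107, 108, 108, 108, 109,
       109, 109, 110, 111, 111, 111, 111, 112, 112, 112,
       114, 114, 115, 115, 115, 116, 118, 118, 120, 121,
       121, 122, 123, 125, 126, 127, 127, 131, 132, 139]

# Bucket boundaries: N(100, 16) quantiles at 0.1, 0.2, ..., 0.9
dist = statistics.NormalDist(mu=100, sigma=16)
cuts = [dist.inv_cdf(i / 10) for i in range(1, 10)]

# Tally observations into the ten equal-probability buckets;
# a value equal to a boundary (here 100.0) goes to the lower bucket
counts = [0] * 10
for iq in iqs:
    counts[bisect_left(cuts, iq)] += 1
print(counts)          # [7, 7, 14, 5, 14, 10, 12, 11, 9, 11]

# Expected count is 100 * 0.1 = 10 in every bucket
q9 = sum((c - 10) ** 2 / 10 for c in counts)
print(round(q9, 1))    # 8.2

print(q9 >= 16.92)     # False: fail to reject H0
```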
16.5 Using Minitab to Lighten the Workload

Example 16-8
This is how I used Minitab to help with the calculations of the alpha particle example on the Unspecified Probabilities page in this lesson.

Use Minitab's Calc >> Probability distribution >> Poisson command to determine the Poisson(5.6) probabilities:
```text
Poisson with mean = 5.6

 x    P(X = x)
 0    0.003698
 1    0.020708
 2    0.057983
 3    0.108234
 4    0.151528
 5    0.169711
 6    0.158397
 7    0.126717
 8    0.088702
 9    0.055193
10    0.030908
11    0.015735
12    0.007343
```
Enter the observed counts into one column and copy the probabilities (collapsing some categories, if necessary) into another column. Use Minitab's Calc >> Calculator command to generate the remaining necessary columns:
```text
Row  Obsd       p_i     Expd    Chisq
  1     5  0.082389   8.2389  1.27329
  2    13  0.108234  10.8234  0.43772
  3    19  0.151528  15.1528  0.97678
  4    16  0.169711  16.9711  0.05557
  5    15  0.158397  15.8397  0.04451
  6     9  0.126717  12.6717  1.06390
  7    12  0.088702   8.8702  1.10433
  8     7  0.055193   5.5193  0.39724
  9     4  0.053986   5.3986  0.36233
```

Sum up the "Chisq" column to obtain the chi-square statistic \(Q\).

Example 16-9
This is how I used Minitab to help with the calculations of the IQ example on the Continuous Random Variables page in this lesson.

The sorted data:
```text
Sample
 54  66  74  74  75  78  79  80  81  82
 82  82  83  84  87  88  88  88  88  89
 89  89  89  89  90  90  90  91  92  93
 93  93  94  96  96  97  97  98  98  99
 99  99  99  99 100 100 100 102 102 102
102 102 103 103 104 104 104 105 105 105
105 106 106 106 107 107 108 108 108 109
109 109 110 111 111 111 111 112 112 112
114 114 115 115 115 116 118 118 120 121
121 122 123 125 126 127 127 131 132 139
```
The working table:
```text
Row  CProb         Z   Sample  Obsd  Expd  Chisq
  1    0.1  -1.28155   79.495     7    10    0.9
  2    0.2  -0.84162   86.534     7    10    0.9
  3    0.3  -0.52440   91.610    14    10    1.6
  4    0.4  -0.25355   95.946     5    10    2.5
  5    0.5   0.00000  100.000    14    10    1.6
  6    0.6   0.25355  104.054    10    10    0.0
  7    0.7   0.52440  108.390    12    10    0.4
  8    0.8   0.84162  113.466    11    10    0.1
  9    0.9   1.28155  120.505     9    10    0.1
 10                              11    10    0.1
```

The chi-square statistic is then the sum of the "Chisq" column: \(Q_9 = 8.2\).