2.4  GoodnessofFit Test
2.4  GoodnessofFit TestA goodnessoffit test, in general, refers to measuring how well do the observed data correspond to the fitted (assumed) model. We will use this concept throughout the course as a way of checking the model fit. Like in linear regression, in essence, the goodnessoffit test compares the observed values to the expected (fitted or predicted) values.
A goodnessoffit statistic tests the following hypothesis:
\(H_0\colon\) the model \(M_0\) fits
vs.
\(H_A\colon\) the model \(M_0\) does not fit (or, some other model \(M_A\) fits)
Most often the observed data represent the fit of the saturated model, the most complex model possible with the given data. Thus, most often the alternative hypothesis \(\left(H_A\right)\) will represent the saturated model \(M_A\) which fits perfectly because each observation has a separate parameter. Later in the course, we will see that \(M_A\) could be a model other than the saturated one. Let us now consider the simplest example of the goodnessoffit test with categorical data.
In the setting for oneway tables, we measure how well an observed variable X corresponds to a \(Mult\left(n, \pi\right)\) model for some vector of cell probabilities, \(\pi\). We will consider two cases:
 when vector \(\pi\) is known, and
 when vector \(\pi\) is unknown.
In other words, we assume that under the null hypothesis data come from a \(Mult\left(n, \pi\right)\) distribution, and we test whether that model fits against the fit of the saturated model. The rationale behind any model fitting is the assumption that a complex mechanism of data generation may be represented by a simpler model. The goodnessoffit test is applied to corroborate our assumption.
Consider our dice example from Lesson 1. We want to test the hypothesis that there is an equal probability of six faces by comparing the observed frequencies to those expected under the assumed model: \(X \sim Multi(n = 30, \pi_0)\), where \(\pi_0=(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)\). We can think of this as simultaneously testing that the probability in each cell is being equal or not to a specified value:
\(H_0:\pi =\pi_0\)
where the alternative hypothesis is that any of these elements differ from the null value. There are two statistics available for this test.
Test Statistics
Pearson Goodnessoffit Test Statistic
The Pearson goodnessoffit statistic is
\(X^2=\sum\limits_{j=1}^k \dfrac{(X_jn\pi_{0j})^2}{n\pi_{0j}}\)
An easy way to remember it is
\(X^2=\sum\limits_{j=1}^k \dfrac{(O_jE_j)^2}{E_j}\)
where \(O_j = X_j\) is the observed count in cell \(j\), and \(E_j=E(X_j)=n\pi_{0j}\) is the expected count in cell \(j\) under the assumption that null hypothesis is true.
Deviance Test Statistic
The deviance statistic is
\(G^2=2\sum\limits_{j=1}^k X_j \log\left(\dfrac{X_j}{n\pi_{0j}}\right) =2\sum\limits_j O_j \log\left(\dfrac{O_j}{E_j}\right)\)
In some texts, \(G^2\) is also called the likelihoodratio test (LRT) statistic, for comparing the loglikelihoods \(L_0\) and \(L_1\) of two models under \(H_0\) (reduced model) and \(H_A\) (full model), respectively:
Likelihoodratio Test Statistic
\(G^2 = 2\log\left(\dfrac{\ell_0}{\ell_1}\right) = 2\left(L_0  L_1\right)\)
Note that \(X^2\) and \(G^2\) are both functions of the observed data \(X\) and a vector of probabilities \(\pi_0\). For this reason, we will sometimes write them as \(X^2\left(x, \pi_0\right)\) and \(G^2\left(x, \pi_0\right)\), respectively; when there is no ambiguity, however, we will simply use \(X^2\) and \(G^2\). We will be dealing with these statistics throughout the course in the analysis of 2way and \(k\)way tables and when assessing the fit of loglinear and logistic regression models.
Testing the GoodnessofFit
\(X^2\) and \(G^2\) both measure how closely the model, in this case \(Mult\left(n,\pi_0\right)\) "fits" the observed data. And both have an approximate chisquare distribution with \(k1\) degrees of freedom when \(H_0\) is true. This allows us to use the chisquare distribution to find critical values and \(p\)values for establishing statistical significance.
 If the sample proportions \(\hat{\pi}_j\) (i.e., saturated model) are exactly equal to the model's \(\pi_{0j}\) for cells \(j = 1, 2, \dots, k,\) then \(O_j = E_j\) for all \(j\), and both \(X^2\) and \(G^2\) will be zero. That is, the model fits perfectly.
 If the sample proportions \(\hat{\pi}_j\) deviate from the \(\pi_{0j}\)s, then \(X^2\) and \(G^2\) are both positive. Large values of \(X^2\) and \(G^2\) mean that the data do not agree well with the assumed/proposed model \(M_0\).
Example: Dice Rolls
Suppose that we roll a die 30 times and observe the following table showing the number of times each face ends up on top.
Face  Count 

1  3 
2  7 
3  5 
4  10 
5  2 
6  3 
Total  30 
We want to test the null hypothesis that the die is fair. Under this hypothesis, \(X \sim Mult\left(n = 30, \pi_0\right)\) where \(\pi_{0j} = 1/6\), for \(j=1,\ldots,6\). This is our assumed model, and under this \(H_0\), the expected counts are \(E_j = 30/6= 5\) for each cell. We now have what we need to calculate the goodnessoffit statistics:
\begin{eqnarray*} X^2 &= & \dfrac{(35)^2}{5}+\dfrac{(75)^2}{5}+\dfrac{(55)^2}{5}\\ & & +\dfrac{(105)^2}{5}+\dfrac{(25)^2}{5}+\dfrac{(35)^2}{5}\\ &=& 9.2 \end{eqnarray*}
\begin{eqnarray*} G^2 &=& 2\left(3\text{log}\dfrac{3}{5}+7\text{log}\dfrac{7}{5}+5\text{log}\dfrac{5}{5}\right.\\ & & \left.+ 10\text{log}\dfrac{10}{5}+2\text{log}\dfrac{2}{5}+3\text{log}\dfrac{3}{5}\right)\\ &=& 8.8 \end{eqnarray*}
Note that even though both have the same approximate chisquare distribution, the realized numerical values of \(Χ^2\) and \(G^2\) can be different. The \(p\)values are \(P\left(\chi^{2}_{5} \ge 9.2\right) = .10\) and \(P\left(\chi^{2}_{5} \ge 8.8\right) = .12\). Given these \(p\)values, with the significance level of \(\alpha=0.05\), we fail to reject the null hypothesis. But rather than concluding that \(H_0\) is true, we simply don't have enough evidence to conclude it's false. That is, the fairdie model doesn't fit the data exactly, but the fit isn't bad enough to conclude that the die is unfair, given our significance threshold of 0.05.
Next, we show how to do this in SAS and R.
The following SAS code will perform the goodnessoffit test for the example above.
/*
 Example: die
*/
data die;
input face $ count;
datalines;
1 3
2 7
3 5
4 10
5 2
6 3
;
run;
proc freq;
weight count;
tables face/ chisq;
/*tables face /nocum all chisq testp=(16.67 16.67 16.67 16.67 16.67 16.67);*/
run;
/********computation of residuales, deviance residuals, X2 and G2 ****/
data cal;
set die;
pi=1/6;
ecount=30*pi;
res=(countecount)/sqrt(ecount);
devres=sqrt(abs(2*count*log(count/ecount)))*sign(countecount);
run;
proc print;run;
proc sql;
select sum((countecount)**2/ecount) as X2,
1probchi(calculated X2,5) as pval1,
2*sum(count*log(count/ecount)) as G2,
1probchi(calculated G2,5) as pval2
from cal;
quit;
Output
The FREQ Procedure
face  Frequency  Percent  Cumulative Frequency 
Cumulative Percent 

1  3  10.00  3  10.00 
2  7  23.33  10  33.33 
3  5  16.67  15  50.00 
4  10  33.33  25  83.33 
5  2  6.67  27  90.00 
6  3  10.00  30  100.00 
ChiSquare Test for Equal Proportions 


ChiSquare  9.2000 
DF  5 
Pr > ChiSq  0.1013 
Sample Size = 30
Obs  face  count  pi  ecount  res  devres 

1  1  3  0.16667  5  0.89443  1.75070 
2  2  7  0.16667  5  0.89443  2.17039 
3  3  5  0.16667  5  0.00000  0.00000 
4  4  10  0.16667  5  2.23607  3.72330 
5  5  2  0.16667  5  1.34164  1.91446 
6  6  3  0.16667  5  0.89443  1.75070 
X2  pval1  G2  pval2 

9.2  0.101348  8.778485  0.118233 
Notice that this SAS code only computes the Pearson chisquare statistic and not the deviance statistic.
Stop and Think!
Can you identify the relevant statistics and the \(p\)value in the output? What does the column labeled "Percent" represent?
The following R code, dice_rolls.R will perform the same analysis as in SAS. The output will be saved into two files, dice_rolls.out and dice_rolls_Results.
###### Dice Rolls from Lesson 2: oneway tables & GOF
##### Line by line calculations in R
##### Nice R code that corresponds to SAS code and output
##########################################################
### if you want all output into a file use: sink("dice_roll.out")
sink("dice_roll.out")
### run a goodness of fit test
dice< chisq.test(c(3,7,5,10,2,3))
dice
########OUTPUT gives Pearson chisquared
# Chisquared test for given probabilities
#
# data: c(3, 7, 5, 10, 2, 3)
# Xsquared = 9.2, df = 5, pvalue = 0.1013
########
### to get observed values
dice$observed
### to get expected values
dice$expected
### to get Pearson residuals
dice$residuals
#####Make the output print into a nice table ######
#### creating a table and giving labels to the columns
out<round(cbind(1:6, dice$observed, dice$expected, dice$residuals),3)
out<as.data.frame(out)
names(out)<c("cell_j", "O_j", "E_j", "res_j")
### printing your table of results into a text file with tab separation
write.table(out, "dice_rolls_Results", row.names=FALSE, col.names=TRUE, sep="\t")
#########TO GET Deviance statistic and it's pvalue
G2=2*sum(dice$observed*log(dice$observed/dice$expected))
G2
1pchisq(G2,5)
##deviance residuals
devres=sign(dice$observeddice$expected)*sqrt(abs(2*dice$observed*log(dice$observed/dice$expected)))
devres
##to show that the G2 is a sum of deviance residuals
G2=sum(sign(dice$observeddice$expected)*(sqrt(abs(2*dice$observed*log(dice$observed/dice$expected))))^2)
G2
########## If you want to specify explicitly the vector of probabilities
dice1<chisq.test(c(3,7,5,10,2,3), p=c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6))
dice1
############################################################
#### Nice R code that corresponds to SAS code and its output
## vector "face" records the face of the dice you get every time you roll it.
face=c(rep(1,3),rep(2,7),rep(3,5),rep(4,10),rep(5,2),rep(6,3))
## Freq Procedure
Percentage=100*as.vector(table(face))/sum(table(face))
CumulativeFrequency=cumsum(c(3,7,5,10,2,3))
CumulativePercentage=cumsum(Percentage)
Freq=data.frame(table(face),Percentage,CumulativeFrequency,CumulativePercentage)
row.names(Freq)=NULL
Freq
## ChiSquare Test for Equal Proportions
chisq.test(table(face))
### if you used sink("dice_roll.out"), and now want to see the output inside the console use: sink()
sink()
Output
[1] 3 7 5 10 2 3
[1] 5 5 5 5 5 5
[1] 0.8944272 0.8944272 0.0000000 2.2360680 1.3416408 0.8944272
[1] 8.778485
[1] 0.1182326
Chisquared test for given probabilities
data: c(3, 7, 5, 10, 2, 3)
Xsquared = 9.2, df = 5, pvalue = 0.1013
face Freq Percentage CumulativeFrequency CumulativePercentage
1 1 3 10.000000 3 10.00000
2 2 7 23.333333 10 33.33333
3 3 5 16.666667 15 50.00000
4 4 10 33.333333 25 83.33333
5 5 2 6.666667 27 90.00000
6 6 3 10.000000 30 100.00000
Stop and Think!
Can you identify the relevant statistics and the \(p\)value in the output? What does the column labeled "Percentage" in dice_rolls.out represent?
Example: Tomato Phenotypes
Tall cutleaf tomatoes were crossed with dwarf potatoleaf tomatoes, and n = 1611 offspring were classified by their phenotypes.
Phenotypes  Count 

tall cutleaf  926 
tall potatoleaf  288 
dwarf cutleaf  293 
dwarf potatoleaf  104 
Genetic theory says that the four phenotypes should occur with relative frequencies 9 : 3 : 3 : 1, and thus are not all equally as likely to be observed. The dwarf potatoleaf is less likely to observed than the others. Do the observed data support this theory?
Under the null hypothesis, the probabilities are
\(\pi_1 = 9/16 , \pi_2 = \pi_3 = 3/16 , \pi_4 = 1/16\)
and the expected frequencies are
\(E_1 = 1611(9/16) = 906.2, E_2 = E_3 = 1611(3/16) = 302.1,\text{ and }E_4 = 1611(1/16) = 100.7\).
We calculate the fit statistics and find that \(X^2 = 1.47\) and \(G^2 = 1.48\), which are nearly identical. The \(p\)values based on the \(\chi^2\) distribution with 3 degrees of freedom are approximately equal to 0.69. Therefore, we fail to reject the null hypothesis and accept (by default) that the data are consistent with the genetic theory.
Here is the above analysis done in SAS.
/*
 Example: cutleaf
*/
data leaf;
input type $ count;
datalines;
tallc 926
tallp 288
dwarf 293
dwarfp 104
;
proc freq order=data;
weight count;
tables type /all testp=(56.25, 18.75, 18.75, 6.25);
run;
/********computation of residuales, deviance residuals, X2 and G2 ****/
data pi;
input freq;
pi=freq/16;
cards;
9
3
3
1
;
run;
data cal;
merge leaf pi;
ecount=1611*pi;
res=(countecount)/sqrt(ecount);
devres=sqrt(abs(2*count*log(count/ecount)))*sign(countecount);
run;
proc print;run;
proc sql;
select sum((countecount)**2/ecount) as X2,
1probchi(calculated X2, 3) as pval1,
2*sum(count*log(count/ecount)) as G2,
1probchi(calculated G2,3) as pval2
from cal;
quit;
/*************plot observed values and expected values**********/
goption reset=all ;
symbol1 v=circle c=blue I=none;
symbol2 v=circle c=red I=none;
Legend1 label=(' ') value=('observed count' 'expected count')
across=1 down=2
position = (top right inside)
mode=protect;
title "Tomato Phenotypes";
proc gplot data=cal;
plot count*type ecount*type /overlay legend=legend1 ;
run;
quit;
title;
Output
The FREQ Procedure
type  Frequency  Percent  Test Percent 
Cumulative Frequency 
Cumulative Percent 

tallc  926  57.48  56.25  926  57.48 
tallp  288  17.88  18.75  1214  75.36 
dwarf  293  18.19  18.75  1507  93.54 
dwarfp  104  6.46  6.25  1611  100.00 
ChiSquare Test for Specified Proportions 


ChiSquare  1.4687 
DF  3 
Pr > ChiSq  0.6895 
Sample Size = 1611
Obs  type  count  freq  pi  ecount  res  devres 

1  tallc  926  9  0.5625  906.188  0.65816  6.32891 
2  tallp  288  3  0.1875  302.063  0.80912  5.24022 
3  dwarf  293  3  0.1875  302.063  0.52143  4.22497 
4  dwarfp  104  1  0.0625  100.688  0.33012  2.59476 
X2  pval1  G2  pval2 

1.468722  0.689508  1.477587  0.687453 
Here is how to do the computations in R using the following code :
#### Lesson 2 CutLeaf example
#### (I) Basic GOF line by line calculation
#### (II) Doing GOF with chisq.test()
#### (III) Nice R code that corresponds to SAS code and output
#########################################################
##(I) Basic GOF line by line computation to demonstrate formulas
ob=c(926,288,293,104) ## data
ex=1611*c(9,3,3,1)/16 ## expected counts under the assumed model
X2=sum((obex)^2/ex) ## X2 statistic
X2
#### Output so you check against your output
#[1] 1.468722
1pchisq(X2,3) ## pvalue
#### Output
#[1] 0.6895079
G2=2*sum(ob*log(ob/ex)) ## deviance statistic
G2
#### Output
#[1] 1.477587
1pchisq(G2,3) ## pvalue
#### Output
#[1] 0.6874529
########################################################
#### (II) Using the builtin chisq.test function in R
tomato=chisq.test(c(926, 288,293,104), p=c(9/16, 3/16, 3/16, 1/16))
tomato
tomato$statistic
tomato$p.value
tomato$residuals
#### To get G2
G2=2*sum(tomato$observed*log(tomato$observed/tomato$expected))
G2
1pchisq(G2,3)
##deviance residuals
devres=sign(tomato$observedtomato$expected)*sqrt(abs(2*tomato$observed*log(tomato$observed/tomato$expected)))
devres
##### Creating a nice ouput
### Cell id, Observed count, Expected count, Pearson residuals, Deviance residual
out<round(cbind(1:5, tomato$observed, tomato$expected, tomato$residuals, devres),3)
out<as.data.frame(out)
names(out)<c("cell_j", "O_j", "E_j", "res_j", "dev_j")
out
#### printing your table of results into a text file with tab separation
write.table(out, "tomato_Results", row.names=FALSE, col.names=TRUE, sep="\t")
#### Ploting expected and observed values
plot(c(1:4), ex$observed, xlab="cell index", ylab="counts", xlim=c(0,5))
points(ex$expected, pch=3, col="red")
legend(3,700, c("observed", "expected"), col=c(1,"red"), pch=c(1,3))
########################################################
#### (III) Nice R code that corresponds to SAS code and output
type=c(rep("tallc",926),rep("tallp",288),rep("dwarf",293),rep("dwarfp",104))
##Please note the table R provides has different order of rows
##from that provided by SAS
table(type)
## Freq Procedure
Percentage=100*as.vector(table(type))/sum(table(type))
CumulativeFrequency=cumsum(c(926,288,293,104))
CumulativePercentage=cumsum(Percentage)
Freq=data.frame(table(type),Percentage,CumulativeFrequency,CumulativePercentage)
Freq
## ChiSquare Test for Specified Proportions
p=c(18.75,6.25,56.25,18.75)
chisq.test(table(type),p=p/sum(p))
########################################################
This has stepbystep calculations and also uses chisq.test() to produce output with Pearson and deviance residuals.
Stop and Think!
Do you recall what the residuals are from linear regression? How would you define them in this context? What do they tell you about the tomato example?