6.2  Chi-Square Test Statistic
To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample counts would be if the two variables were independent (the null hypothesis). In other words, under the null hypothesis we expect the proportion of observations at each level of entrepreneurialism to be about the same in every location. For example, if we consider ONLY the Northeast and look at its expected counts across the two levels of entrepreneurialism, under the null hypothesis we should have roughly 50% in each level. With observed values of 300 and 460, we can begin to suspect that levels of entrepreneurialism may not be "independent" of location.
You may be looking at the expected counts for the Northeast and wondering why they aren't exactly 50/50. This is because each expected count is calculated from both the ROW and the COLUMN totals: (row total × column total) / overall total. For example, the expected count for the Northeast at low entrepreneurialism is 760 × 549 / 1104 ≈ 377.9. The great thing is that our software will do the calculations for us, but it is still helpful to have a conceptual understanding of expected values.
Location  Low Entrepreneurialism  High Entrepreneurialism  All
Northeast (observed)  300  460  760
Northeast (expected)  377.9  382.1
Midwest (observed)  249  95  344
Midwest (expected)  171.1  172.9
All  549  555  1104
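As a rough check on the expected counts in the table above, the short sketch below recomputes them from the row and column totals. The function name `expected_counts` and the list-of-lists layout are illustrative choices, not part of the course software.

```python
# Observed counts from the table above.
# Rows: Northeast, Midwest. Columns: low, high entrepreneurialism.
observed = [
    [300, 460],
    [249, 95],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Northeast", "Midwest"], expected):
    # Round to one decimal place, as in the table.
    print(label, [round(e, 1) for e in row])
```

The rounded values should agree with the expected counts shown under each observed count in the table.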
The statistical question becomes, "Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?" To conduct this test, we compute a chi-square test statistic that compares each cell's observed count to its respective expected count.
In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, \dots, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, \dots, E_{rc}\) denote the respective expected counts for each cell.
Chi-Square Test Statistic

The chi-square test statistic is calculated as follows:
\(\chi^{2*}=\displaystyle\sum\limits_{i=1}^{rc} \dfrac{(O_i-E_i)^2}{E_i}\)
Under the null hypothesis and certain conditions (discussed below), the test statistic follows a chi-square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We omit the mathematical details of why this test statistic is used and why it follows a chi-square distribution.
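To make the formula concrete, here is a minimal sketch that sums the squared differences between observed and expected counts, each divided by the expected count, over every cell of the example table. It also returns the degrees of freedom. The function name `chi_square_statistic` is hypothetical; statistical software performs the same calculation for you.

```python
# Observed counts for the example (rows: Northeast, Midwest;
# columns: low, high entrepreneurialism).
observed = [
    [300, 460],
    [249, 95],
]


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            statistic += (o - e) ** 2 / e          # this cell's contribution

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Note that the expected counts are kept unrounded inside the sum; using the one-decimal values from the table would give a slightly different result.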
As we have done with other statistical tests, we make our decision either by comparing the value of the test statistic to a critical value or by finding the probability of getting this test statistic value or one more extreme. The p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom = \((r-1)(c-1)\).
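For a 2×2 table the degrees of freedom equal 1, and in that special case the upper-tail probability can be sketched with the standard library alone, using the identity \(P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})\). The helper name below is hypothetical, and the shortcut applies only when the degrees of freedom equal 1. For larger tables you would rely on statistical software (for example, SciPy's `scipy.stats.chi2.sf`).

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail probability P(X > statistic) for a chi-square with 1 df.

    Valid only for 1 degree of freedom, where a chi-square variable is the
    square of a standard normal variable.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Pearson chi-square statistic for the example data.
p_value = chi_square_p_value_df1(102.596)
print(f"p-value = {p_value:.3g}")
```

The result is far smaller than 0.001, which is why software output rounded to three decimal places displays it as 0.000.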
So for Donna’s data, we compute the chi-square statistics:
Test  Chi-Square  DF  P-Value
Pearson  102.596  1  0.000
Likelihood  105.357  1  0.000
The resulting chi-square statistic is 102.596 with a p-value reported as 0.000 (that is, less than 0.0005). The 2×2 table also includes the expected values. Remember that the chi-square statistic compares the expected values to the observed values from Donna’s study. The results of the chi-square test indicate that this difference (observed - expected) is large. Thus, Donna can reject the null hypothesis that entrepreneurialism and geographic location are independent and conclude that entrepreneurialism levels depend on geographic location.
Conditions for Using the Chi-Square Test
Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.
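The check described here can be sketched in a few lines: count the cells whose expected frequency is below five and report what share of all cells they make up. The function name `small_expected_cells` and its `threshold` parameter are illustrative assumptions, and the 20% rule of thumb is a guideline rather than a strict cutoff.

```python
def small_expected_cells(expected, threshold=5.0):
    """Return (number of cells below threshold, share of all cells)."""
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < threshold)
    return n_small, n_small / len(cells)


# Expected counts for the example (Northeast, Midwest by low, high).
expected = [
    [377.9, 382.1],
    [171.1, 172.9],
]

n_small, share = small_expected_cells(expected)
print(f"{n_small} cell(s) below 5 ({share:.0%} of cells)")
if share > 0.20:
    print("More than 20% of cells have small expected counts; interpret with caution.")
```

For the example data every expected count is well above five, so this condition raises no concern.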
Caution!
Sometimes researchers will categorize quantitative data (e.g., take height measurements and categorize them as 'below average,' 'average,' and 'above average'). Doing so results in a loss of information: one cannot reverse the process and recover the raw quantitative measurements from the categories. Instead of categorizing, the data should be analyzed using quantitative methods.