Chi-Square Test of Independence
Do you remember how to test the independence of two categorical variables? We use a chi-square test of independence.
Recall that we can summarize two categorical variables within a two-way table, also called an r × c contingency table, where r = number of rows, c = number of columns. Our question of interest is “Are the two variables independent?” This question is set up using the following hypothesis statements:
- Null Hypothesis: The two categorical variables are independent.
- Alternative Hypothesis: The two categorical variables are dependent.
- Chi-Square Test Statistic: \(\chi^2=\sum(O-E)^2/E\), where the sum runs over all cells of the table, O is the observed frequency in a cell, and E is the expected frequency in that cell under the null hypothesis, computed by:
\[E=\frac{\text{row total}\times\text{column total}}{\text{sample size}}\]
We compare the value of the test statistic to the critical value \(\chi_{\alpha}^2\) with (r - 1)(c - 1) degrees of freedom, and reject the null hypothesis if \(\chi^2 \gt \chi_{\alpha}^2\).
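The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `chi_square_independence` and the 2 × 2 counts are hypothetical, and the critical value still has to be looked up separately for the chosen significance level and degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table.

    `observed` is a list of rows of counts, without the marginal totals.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # sample size

    # E = (row total * column total) / sample size, for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # chi^2 = sum over all cells of (O - E)^2 / E
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 table, made up purely for illustration.
statistic, df, expected = chi_square_independence([[10, 20], [30, 40]])
print(round(statistic, 4), df)
```

The null hypothesis would be rejected when the returned statistic exceeds the critical value for the returned degrees of freedom.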
Example S.4.1
Is gender independent of education level? A random sample of 395 people was surveyed and each person was asked to report the highest education level they obtained. The data that resulted from the survey are summarized in the following table:
Gender | High School | Bachelors | Masters | Ph.D. | Total |
---|---|---|---|---|---|
Female | 60 | 54 | 46 | 41 | 201 |
Male | 40 | 44 | 53 | 57 | 194 |
Total | 100 | 98 | 99 | 98 | 395 |
Question: Are gender and education level dependent at a 5% level of significance? In other words, given the data collected above, is there a relationship between the gender of an individual and the level of education that they have obtained?
Here's the table of expected counts:
Gender | High School | Bachelors | Masters | Ph.D. | Total |
---|---|---|---|---|---|
Female | 50.886 | 49.868 | 50.377 | 49.868 | 201 |
Male | 49.114 | 48.132 | 48.623 | 48.132 | 194 |
Total | 100 | 98 | 99 | 98 | 395 |
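As a rough check, the expected counts can be reproduced from the row totals, column totals, and sample size of the observed table. The variable names below are illustrative only.

```python
# Observed counts from the survey table, in the column order
# High School, Bachelors, Masters, Ph.D.
observed = {
    "Female": [60, 54, 46, 41],
    "Male": [40, 44, 53, 57],
}

row_totals = {gender: sum(counts) for gender, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())  # sample size

# E = (row total * column total) / sample size
expected = {
    gender: [row_totals[gender] * c / n for c in col_totals]
    for gender in observed
}

for gender, counts in expected.items():
    print(gender, [round(e, 3) for e in counts])
```

For instance, the expected count for females with a high school education is 201 × 100 / 395, which rounds to 50.886.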
So, working this out, \(\chi^2= \dfrac{(60-50.886)^2}{50.886} + \cdots + \dfrac{(57-48.132)^2}{48.132} = 8.006\).
The critical value of \(\chi^2\) with (2 - 1)(4 - 1) = 3 degrees of freedom at \(\alpha = 0.05\) is 7.815. Since 8.006 > 7.815, we reject the null hypothesis and conclude that gender and education level are dependent at a 5% level of significance.
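The arithmetic in this example can be checked with a short script. This is a sketch using only the Python standard library. The helper `chi2_sf_3df` is a hypothetical name, and it relies on a closed-form upper-tail probability of the chi-square distribution that holds for 3 degrees of freedom only. The critical value 7.815 is taken from the text rather than computed.

```python
import math

observed = [
    [60, 54, 46, 41],  # Female
    [40, 44, 53, 57],  # Male
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate (O - E)^2 / E over all eight cells.
statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # from the text: alpha = 0.05, 3 degrees of freedom
reject_null = statistic > critical_value


def chi2_sf_3df(x):
    """P(chi-square with 3 df > x); this closed form is specific to 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_null}")
print(f"tail area beyond the critical value: {chi2_sf_3df(critical_value):.3f}")
```

The tail area beyond 7.815 should come out close to 0.05, which is consistent with using 7.815 as the 5% critical value for 3 degrees of freedom.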