Say we have a study of two categorical variables, each with only two levels. One of the response levels is considered the "success" response and the other the "failure" response. A general 2 × 2 table of the observed counts would be as follows:
| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | A | B | A + B |
| Group 2 | C | D | C + D |
The observed counts in this table represent the following proportions:
| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | \(\hat{p}_1=\frac{A}{A+B}\) | \(1-\hat{p}_1\) | A + B |
| Group 2 | \(\hat{p}_2=\frac{C}{C+D}\) | \(1-\hat{p}_2\) | C + D |
Recall that in our Z-test of two proportions, the null hypothesis is that the two population proportions, \(p_1\) and \(p_2\), are equal, while the two-sided alternative hypothesis is that they are not equal.
This null hypothesis is analogous to the group variable and the response variable being independent: if the success proportion is the same in both groups, knowing the group tells us nothing about the response.
Also, if the two success proportions are equal, then the two failure proportions are also equal. Note as well that the conditions for our Z-test were that the number of successes and the number of failures in each group be at least 5. That corresponds to the Chi-square condition that all expected counts in a 2 × 2 table be at least 5. (Remember that at least 80% of all cells need an expected count of at least 5. With 80% of 4 equal to 3.2, this means all four cells must satisfy the condition.)
When we run a Chi-square test of independence on a 2 × 2 table, the resulting Chi-square test statistic would be equal to the square of the Z-test statistic (i.e., \((Z^*)^2\)) from the Z-test of two independent proportions.
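To see why the two statistics agree, the algebra can be sketched using the cell labels A, B, C, D from the general table above. The symbols \(n=A+B+C+D\), \(n_1=A+B\), \(n_2=C+D\), and the pooled sample proportion \(p^*=\frac{A+C}{n}\) are notation assumed here for the sketch, and the Z-statistic is taken to use the pooled standard error:

\begin{align} \hat{p}_1-\hat{p}_2&=\dfrac{A}{A+B}-\dfrac{C}{C+D}=\dfrac{AD-BC}{(A+B)(C+D)}\\ p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)&=\dfrac{(A+C)(B+D)}{n^2}\cdot\dfrac{n}{(A+B)(C+D)}=\dfrac{(A+C)(B+D)}{n(A+B)(C+D)}\\ (Z^*)^2&=\dfrac{(\hat{p}_1-\hat{p}_2)^2}{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}=\dfrac{n(AD-BC)^2}{(A+B)(C+D)(A+C)(B+D)}\end{align}

The final expression is the usual shortcut form of the Pearson Chi-square statistic for a 2 × 2 table (without a continuity correction), which is why squaring the Z-statistic reproduces the Chi-square statistic.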
Application
Political Affiliation and Opinion
Consider the following example where we form a 2 × 2 table for Political Party and Opinion by considering only the Favor and Oppose responses:
| | favor | oppose | Total |
|---|---|---|---|
| democrat | 138 | 64 | 202 |
| republican | 64 | 84 | 148 |
| Total | 202 | 148 | 350 |
The Chi-square test produces a test statistic of 22.00 with a p-value of 0.00.
The Z-test comparing the two sample proportions, \(\hat{p}_d=\frac{138}{202}=0.683\) for Democrats and \(\hat{p}_r=\frac{64}{148}=0.432\) for Republicans, results in a Z-test statistic of \(4.69\) with a p-value of \(0.000\).
If we square the Z-test statistic, we get \(4.69^2 = 21.99\) or \(22.00\) with rounding error.
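The relationship can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `two_proportion_z` and `chi_square_2x2` are hypothetical and chosen for illustration. It assumes the pooled standard error for the Z-test and no continuity correction for the Chi-square statistic.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2 (illustrative helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

def chi_square_2x2(table):
    """Pearson chi-square statistic, no continuity correction (illustrative helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: democrat, republican; columns: favor, oppose
table = [[138, 64], [64, 84]]
z = two_proportion_z(138, 202, 64, 148)
chi2 = chi_square_2x2(table)
print(f"Z = {z:.2f}, Z^2 = {z**2:.2f}, chi-square = {chi2:.2f}")
```

Because the squared Z-statistic is computed from the unrounded value here, it agrees with the Chi-square statistic to floating-point precision rather than only to two decimal places.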
Try it!
The condiments and gender data were condensed to consider only gender and a preference for either mustard or ketchup. The manager wants to know if the proportion of males who prefer ketchup is the same as the proportion of females who prefer ketchup. Test the hypothesis two ways, (1) using the Chi-square test of independence and (2) using the Z-test for two proportions, at a significance level of 10%. Show how the two test statistics are related and compare the p-values.
| Gender \ Condiment | Ketchup | Mustard | Total |
|---|---|---|---|
| Male | 15 | 23 | 38 |
| Female | 25 | 19 | 44 |
| Total | 40 | 42 | 82 |
Z-test for two proportions
The hypotheses are:
\(H_0\colon p_1-p_2=0\)
\(H_a\colon p_1-p_2\ne 0\)
Let males be denoted as sample one and females as sample two. Using the table, we have:
\(n_1=38\) and \(\hat{p}_1=\frac{15}{38}=0.395\)
\(n_2=44\) and \(\hat{p}_2=\frac{25}{44}=0.568\)
The conditions are satisfied for this test (verify for extra practice).
To calculate the test statistic, we need:
\(p^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{15+25}{38+44}=\dfrac{40}{82}=0.4878\)
The test statistic is:
\begin{align} z^*&=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\\&=\dfrac{0.395-0.568}{\sqrt{0.4878(1-0.4878)\left(\frac{1}{38}+\frac{1}{44}\right)}}\\&=-1.567\end{align}
The p-value is \(2P(Z<-1.567)=0.1172\).
The p-value is greater than our significance level of 0.10. Therefore, there is not enough evidence in the data to suggest that the proportion of males who prefer ketchup differs from the proportion of females who prefer ketchup.
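The Z-test calculation above can be reproduced with a short sketch using only the Python standard library. The variable names are illustrative, and the two-sided p-value is obtained from the standard normal distribution through `math.erfc`, using the identity \(2P(Z<-|z|)=\operatorname{erfc}(|z|/\sqrt{2})\).

```python
import math

# Males are sample one, females are sample two (ketchup = "success")
x1, n1 = 15, 38
x2, n2 = 25, 44

p1_hat = x1 / n1
p2_hat = x2 / n2
pooled = (x1 + x2) / (n1 + n2)  # p* in the text

# Pooled standard error under H0: p1 - p2 = 0
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = (p1_hat - p2_hat) / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

alpha = 0.10
reject = p_value < alpha
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Working from unrounded proportions avoids the small discrepancies that come from carrying 0.395 and 0.568 through the formula by hand.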
Chi-square Test for independence
The observed counts, with the expected counts in parentheses, are:
| Gender \ Condiment | Ketchup | Mustard | Total |
|---|---|---|---|
| Male | 15 (18.537) | 23 (19.463) | 38 |
| Female | 25 (21.463) | 19 (22.537) | 44 |
| Total | 40 | 42 | 82 |
There are no expected counts less than 5. The test statistic is:
\(\chi^{2*}=\dfrac{(15-18.537)^2}{18.537}+\dfrac{(23-19.463)^2}{19.463}+\dfrac{(25-21.463)^2}{21.463}+\dfrac{(19-22.537)^2}{22.537}=2.46 \)
With 1 degree of freedom, the p-value is 0.1168. The p-value is greater than our significance level of 0.10. Therefore, there is not enough evidence to suggest that gender and condiment preference (ketchup or mustard) are related.
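As a check on the hand calculation, here is a minimal standard-library sketch that builds the expected counts from the row and column totals and sums the Pearson terms. The variable names are illustrative. The p-value uses the fact that a Chi-square variable with 1 degree of freedom is a squared standard normal, so \(P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})\); this shortcut applies only when there is 1 degree of freedom.

```python
import math

# Rows: male, female; columns: ketchup, mustard
observed = [[15, 23], [25, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5
all_at_least_5 = min(min(row) for row in expected) >= 5

chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail p-value, valid for 1 degree of freedom only
p_value = math.erfc(math.sqrt(chi2_stat / 2))
print(f"chi-square = {chi2_stat:.3f}, p-value = {p_value:.4f}")
```

Since the statistic is not rounded to 2.46 before the p-value is computed, the result here is about 0.117, slightly different from the 0.1168 obtained from the rounded statistic.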
Comparison
Without rounding, the two p-values would be identical; the small difference (0.1172 vs. 0.1168) arises because the Chi-square statistic was rounded to 2.46 before its p-value was found. The Z-statistic is -1.567, and its square is 2.455, which matches the Chi-square statistic of 2.46 after rounding. The conclusions are the same.