8.2 - The 2x2 Table: Test of 2 Independent Proportions

Say we have a study of two categorical variables each with only two levels. One of the response levels is considered the "success" response and the other the "failure" response. A general 2 × 2 table of the observed counts would be as follows:

          Success   Failure   Total
Group 1   A         B         A + B
Group 2   C         D         C + D

The observed counts in this table represent the following proportions:

          Success                            Failure             Total
Group 1   \(\hat{p}_1=\frac{A}{A+B}\)        \(1-\hat{p}_1\)     A + B
Group 2   \(\hat{p}_2=\frac{C}{C+D}\)        \(1-\hat{p}_2\)     C + D

Recall from our Z-test of two proportions that the null hypothesis was that the two population proportions, \(p_1\) and \(p_2\), are equal, while the two-sided alternative hypothesis was that they are not equal.

This null hypothesis is analogous to the response (success or failure) being independent of the group.

Also, if the two success proportions are equal, then the two failure proportions are also equal. Note as well that the conditions for our Z-test were that the number of successes and failures in each group was at least 5. That corresponds to the Chi-square condition that all expected cell counts in a 2 × 2 table be at least 5. (Remember that at least 80% of all cells need an expected count of at least 5. Since 80% of 4 is 3.2, all four cells must satisfy the condition.)
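The expected-count condition can be checked directly from the observed counts. The following is a minimal sketch in plain Python, not part of the lesson itself: the function names `expected_counts` and `all_expected_at_least` and the example counts are hypothetical and chosen only for illustration.

```python
def expected_counts(table):
    """Return the expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = (row total * column total) / grand total.
    return [[r * c / n for c in col_totals] for r in row_totals]


def all_expected_at_least(table, minimum=5):
    """Return True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Hypothetical counts for illustration only.
# Rows are Group 1 and Group 2; columns are Success and Failure.
observed = [[12, 8], [9, 11]]
print(expected_counts(observed))
print(all_expected_at_least(observed))
```

For a 2 × 2 table the function must return True before either test is used, since all four cells need an expected count of at least 5.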

When we run a Chi-square test of independence on a 2 × 2 table, the resulting Chi-square test statistic would be equal to the square of the Z-test statistic (i.e., \((Z^*)^2\)) from the Z-test of two independent proportions.
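One way to see why the two statistics agree is to write both in terms of the cell counts A, B, C, and D of the general table. The following is a sketch of the algebra, where \(N=A+B+C+D\) is notation introduced here for the grand total and \(p^*=\frac{A+C}{N}\) is the pooled proportion used in the standard error of the Z-test:

\begin{align} \hat{p}_1-\hat{p}_2&=\dfrac{A}{A+B}-\dfrac{C}{C+D}=\dfrac{AD-BC}{(A+B)(C+D)}\\ p^*(1-p^*)\left(\dfrac{1}{A+B}+\dfrac{1}{C+D}\right)&=\dfrac{(A+C)(B+D)}{N^2}\cdot\dfrac{N}{(A+B)(C+D)}\\ (Z^*)^2&=\dfrac{(\hat{p}_1-\hat{p}_2)^2}{p^*(1-p^*)\left(\frac{1}{A+B}+\frac{1}{C+D}\right)}=\dfrac{N(AD-BC)^2}{(A+B)(C+D)(A+C)(B+D)}\end{align}

The final expression is the standard shortcut form of the Chi-square statistic for a 2 × 2 table, so the two test statistics are algebraically identical.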

Application

Political Affiliation and Opinion

Consider the following example, where we form a 2 × 2 table for Political Party and Opinion by considering only the Favor and Oppose responses:

             favor   oppose   Total
democrat     138     64       202
republican   64      84       148
Total        202     148      350

The Chi-square test produces a test statistic of 22.00 with a p-value less than 0.001.

The Z-test comparing the two sample proportions, \(\hat{p}_d=\frac{138}{202}=0.683\) and \(\hat{p}_r=\frac{64}{148}=0.432\), results in a Z-test statistic of \(4.69\) with a p-value less than \(0.001\).

If we square the Z-test statistic, we get \(4.69^2 \approx 22.00\), which matches the Chi-square test statistic up to rounding.
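The agreement can be checked numerically from the four cell counts. The sketch below is one way to do this in plain Python. The helper names `chi_square_2x2` and `pooled_z` are assumptions made for illustration, and the Chi-square statistic is computed with the standard 2 × 2 shortcut formula \(N(AD-BC)^2/[(A+B)(C+D)(A+C)(B+D)]\) rather than from a table of expected counts.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2 x 2 table via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pooled_z(a, b, c, d):
    """Z statistic for two independent proportions using the pooled proportion."""
    n1, n2 = a + b, c + d
    p1_hat, p2_hat = a / n1, c / n2
    p_pooled = (a + c) / (n1 + n2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


# Counts from the table: democrat (favor, oppose), republican (favor, oppose).
chi2 = chi_square_2x2(138, 64, 64, 84)
z = pooled_z(138, 64, 64, 84)
print(f"chi-square = {chi2:.2f}, z = {z:.2f}, z squared = {z ** 2:.2f}")
```

Squaring the unrounded Z statistic reproduces the Chi-square statistic exactly, so the small gap seen when squaring 4.69 by hand comes only from rounding Z to two decimals.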

Try it!

The condiments and gender data were condensed to consider gender and either mustard or ketchup. The manager wants to know if the proportion of males who prefer ketchup is the same as the proportion of females who prefer ketchup. Test the hypothesis two ways, (1) using the Chi-square test for independence and (2) using the Z-test for two proportions, with a significance level of 10%. Show how the two test statistics are related and compare the p-values.

    Condiment
Gender   Ketchup Mustard Total
Male 15 23 38
Female 25 19 44
Total 40 42 82

Z-test for two proportions

The hypotheses are:

\(H_0\colon p_1-p_2=0\)

\(H_a\colon p_1-p_2\ne 0\)

Let males be denoted as sample one and females as sample two. Using the table, we have:

\(n_1=38\) and \(\hat{p}_1=\frac{15}{38}=0.395\)

\(n_2=44\) and \(\hat{p}_2=\frac{25}{44}=0.568\)

The conditions are satisfied for this test (verify for extra practice).

To calculate the test statistic, we need:

\(p^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{15+25}{38+44}=\dfrac{40}{82}=0.4878\)

The test statistic is:

\begin{align} z^*&=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\\&=\dfrac{0.395-0.568}{\sqrt{0.4878(1-0.4878)\left(\frac{1}{38}+\frac{1}{44}\right)}}\\&=-1.567\end{align}

The p-value is \(2P(Z<-1.567)=0.1172\).
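The calculation above can be reproduced with a short script. This is a minimal sketch in plain Python under the same setup (males as sample one, females as sample two). The variable names are assumptions for illustration, and the normal tail probability is obtained from `math.erfc` instead of a Z table.

```python
import math

x1, n1 = 15, 38  # males who prefer ketchup, total males
x2, n2 = 25, 44  # females who prefer ketchup, total females

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

# Standard error under the null hypothesis uses the pooled proportion.
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / std_error

# Two-sided p-value: 2 * P(Z < -|z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Because the script carries full precision throughout, its last digit may differ slightly from a value computed by hand from rounded proportions.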

The p-value is greater than our significance level. Therefore, there is not enough evidence in the data to suggest that the proportion of males who prefer ketchup is different from the proportion of females who prefer ketchup.


Chi-square Test for independence

The observed counts, with the expected counts in parentheses, are:

    Condiment
Gender   Ketchup Mustard Total
Male 15 (18.537) 23 (19.463) 38
Female 25 (21.463) 19 (22.537) 44
Total 40 42 82

There are no expected counts less than 5. The test statistic is:

\(\chi^{2*}=\dfrac{(15-18.537)^2}{18.537}+\dfrac{(23-19.463)^2}{19.463}+\dfrac{(25-21.463)^2}{21.463}+\dfrac{(19-22.537)^2}{22.537}=2.46 \)

With 1 degree of freedom, the p-value is 0.1168. The p-value is greater than our significance level. Therefore, there is not enough evidence to suggest that gender and condiment preference (ketchup or mustard) are related.
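The expected counts, test statistic, and p-value can also be computed programmatically. The following is a minimal sketch in plain Python. The variable names are assumptions for illustration, and the p-value relies on the fact that, for 1 degree of freedom, the Chi-square upper-tail probability equals `erfc(sqrt(x / 2))`.

```python
import math

# Observed counts: rows are male, female; columns are ketchup, mustard.
observed = [[15, 23], [25, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for a cell = (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all four cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for a Chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

The script uses the unrounded statistic, so its p-value can differ in the last digits from one looked up using the rounded value of 2.46.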


Comparison

The two p-values (0.1172 and 0.1168) differ only because of rounding; without rounding they would be identical. The Z-statistic is -1.567, and its square is 2.455, which rounds to the Chi-square statistic of 2.46. Both tests lead to the same conclusion.