8.1 - The Chi-Square Test for Independence

How do we test the independence of two categorical variables? We use the Chi-square test of independence.

As with all prior statistical tests we need to define null and alternative hypotheses. Also, as we have learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In this lesson, we are interested in researching if two categorical variables are related or associated (i.e., dependent). Therefore, until we have evidence to suggest that they are, we must assume that they are not. This is the motivation behind the hypothesis for the Chi-square Test of Independence:

  • \(H_0\): In the population, the two categorical variables are independent.
  • \(H_a\): In the population, the two categorical variables are dependent.

Note! There are several ways to phrase these hypotheses. Instead of using the words "independent" and "dependent" one could say "there is no relationship between the two categorical variables" versus "there is a relationship between the two categorical variables." Or "there is no association between the two categorical variables" versus "there is an association between the two categorical variables." The important part is that the null hypothesis refers to the two categorical variables not being related while the alternative is trying to show that they are related.

Once we have gathered our data, we summarize the data in a two-way contingency table. This table represents the observed counts and is called the Observed Counts Table or simply the Observed Table. The contingency table on the introduction page to this lesson represented the observed counts of the party affiliation and opinion for those surveyed.

The question becomes, "How would this table look if the two variables were not related?" That is, under the null hypothesis that the two variables are independent, what would we expect our data to look like?

Consider the following table:

  Success Failure Total
Group 1 A B A+B
Group 2 C D C+D
Total A+C B+D A+B+C+D

The total count is \(A+B+C+D\). Let's focus on one cell, say Group 1 and Success with observed count A. If we go back to our probability lesson, let \(G_1\) denote the event 'Group 1' and \(S\) denote the event 'Success.' Then,

\(P(G_1)=\dfrac{A+B}{A+B+C+D}\) and \(P(S)=\dfrac{A+C}{A+B+C+D}\).

Recall that if two events are independent, then their intersection is the product of their respective probabilities. In other words, if \(G_1\) and \(S\) are independent, then...

\begin{align} P(G_1\cap S)&=P(G_1)P(S)\\&=\left(\dfrac{A+B}{A+B+C+D}\right)\left(\dfrac{A+C}{A+B+C+D}\right)\\[10pt] &=\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\end{align}

If we considered counts instead of probabilities, then we get the count by multiplying the probability by the total count. In other words...

\begin{align} \text{Expected count for cell with A} &=P(G_1)P(S)\times(\text{total count}) \\   &= \left(\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\right)(A+B+C+D)\\[10pt]&=\mathbf{\dfrac{(A+B)(A+C)}{A+B+C+D}} \end{align}

This is the count we would expect to see if the two variables were independent (i.e. assuming the null hypothesis is true).

Expected Cell Count

The expected count for each cell under the null hypothesis is:

\(E=\dfrac{\text{(row total)}(\text{column total})}{\text{total sample size}}\)

Example 8-1: Political Affiliation and Opinion

To demonstrate, we will use the Party Affiliation and Opinion on Tax Reform example.

Observed Table:

  favor indifferent opposed total
democrat 138 83 64 285
republican 64 67 84 215
total 202 150 148 500

Find the expected counts for all of the cells.

Answer

We need to find what is called the Expected Counts Table or simply the Expected Table. This table displays what the counts would be for our sample data if there were no association between the variables.

Calculating Expected Counts from Observed Counts

  favor indifferent opposed total
democrat \(\frac{285(202)}{500}=115.14\) \(\frac{285(150)}{500}=85.5\) \(\frac{285(148)}{500}=84.36\) 285
republican \(\frac{215(202)}{500}=86.86\) \(\frac{215(150)}{500}=64.5\) \(\frac{215(148)}{500}=63.64\) 215
total 202 150 148 500
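
The expected counts above can also be reproduced with a short script. The following is a minimal sketch in Python using only the standard library; the function name `expected_counts` and the list-of-lists layout are illustrative choices of ours and are not part of the lesson's Minitab workflow.

```python
# Observed counts from Example 8-1.
# Rows: democrat, republican. Columns: favor, indifferent, opposed.
observed = [
    [138, 83, 64],
    [64, 67, 84],
]


def expected_counts(table):
    """Return the expected counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E = (row total)(column total) / (total sample size)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The printed rows should match the democrat and republican rows of the Expected Table, and the expected counts sum to the same total (500) as the observed counts.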

Chi-Square Test Statistic

To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample data counts would be if the two variables were independent. Taking what we know of independent events, we would be saying that the sample counts should show a similarity on opinions of tax reform between democrats and republicans. If you find the proportion of each cell by taking a cell's expected count divided by its row total, you will discover that in the expected table each opinion proportion is the same for democrats and republicans. That is, from the expected counts, 0.404 of the democrats and 0.404 of the republicans favor the bill; 0.3 of the democrats and 0.3 of the republicans are indifferent; and 0.296 of the democrats and 0.296 of the republicans are opposed.
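
As a quick check of the proportions just described, here is a small Python sketch (standard library only; the dictionary names are our own) that divides each expected count by its row total.

```python
# Row and column totals from the observed table in Example 8-1.
row_totals = {"democrat": 285, "republican": 215}
col_totals = {"favor": 202, "indifferent": 150, "opposed": 148}
n = 500

proportions = {}
for party, r in row_totals.items():
    # Expected count divided by its row total: (r * c / n) / r
    proportions[party] = {
        opinion: round((r * c / n) / r, 3) for opinion, c in col_totals.items()
    }
    print(party, proportions[party])
```

Because the row total cancels, each proportion reduces to (column total)/(total sample size), which is why the two rows agree.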


The statistical question becomes, "Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?" To conduct this test we compute a Chi-square test statistic where we compare each cell's observed count to its respective expected count.

In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, \ldots, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, \ldots, E_{rc}\) denote the respective expected counts for each cell.

Chi-Square Test Statistic

The Chi-square test statistic is calculated as follows:

\(\chi^{2*}=\sum\limits_{i=1}^{rc} \frac{(O_i-E_i)^2}{E_i}\)

Under the null hypothesis and certain conditions (discussed below), the test statistic follows a Chi-square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We omit the mathematical details of why this test statistic is used and why it follows a Chi-square distribution.

As we have done with other statistical tests, we make our decision by either comparing the value of the test statistic to a critical value (rejection region approach) or by finding the probability of getting this test statistic value or one more extreme (p-value approach).

The critical value for our Chi-square test is \(\chi^2_{\alpha}\) with degrees of freedom = \((r - 1)(c - 1)\), while the p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom = \((r - 1)(c - 1)\).
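
As a side note that goes beyond the lesson itself: in the special case of 2 degrees of freedom (for example, a table with 2 rows and 3 columns), the Chi-square distribution has a simple closed form, so both quantities can be written out explicitly. This is offered only as an illustration; for other degrees of freedom a table or software is generally needed.

\begin{align} P(\chi^2>x)&=e^{-x/2} \quad (\text{degrees of freedom}=2)\\[10pt] \chi^2_{\alpha}&=-2\ln(\alpha), \quad \text{e.g., } \chi^2_{0.05}=-2\ln(0.05)\approx 5.991 \end{align}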

Example 8-1 Cont'd: Chi-Square

Let's apply the Chi-square Test of Independence to our example where we have a random sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on a tax reform bill. We will test if the political affiliation and their opinion on a tax reform bill are dependent at a 5% level of significance. Calculate the test statistic.

Answer

The contingency table (political_affiliation.txt) is given below. Each cell contains the observed count and the expected count in parentheses. For example, there were 138 democrats who favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is displayed as 138 (115.14).

  favor indifferent opposed total
democrat 138 (115.14) 83 (85.5) 64 (84.36) 285
republican 64 (86.86) 67 (64.50) 84 (63.64) 215
total 202 150 148 500

Calculating the test statistic by hand:

\begin{multline} \chi^{2*}=\dfrac{(138-115.14)^2}{115.14}+\dfrac{(83-85.50)^2}{85.50}+\dfrac{(64-84.36)^2}{84.36}+\\ \dfrac{(64-86.86)^2}{86.86}+\dfrac{(67-64.50)^2}{64.50}+\dfrac{(84-63.64)^2}{63.64}=22.152\end{multline}

...with degrees of freedom equal to \((2 - 1)(3 - 1) = 2\).
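
The hand calculation can be checked with a few lines of Python. This is a sketch using only the standard library; the variable names are ours, and the expected counts are recomputed from the row and column totals rather than typed in.

```python
# Observed counts from Example 8-1.
# Rows: democrat, republican. Columns: favor, indifferent, opposed.
observed = [[138, 83, 64], [64, 67, 84]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contribution (O - E)^2 / E, with E = (row total)(column total) / n.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

chi_square = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(col_totals) - 1)

for row in contributions:
    print([round(x, 4) for x in row])
print(round(chi_square, 3), df)
```

The six printed contributions correspond, in order, to the six terms of the sum above, and their total should agree with the test statistic to three decimal places.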

Note! We do not expect you to calculate the critical value or the p-value by hand. The p-value can be found using software.

Next, we carry out the same test in Minitab: at a 5% level of significance, are political affiliation and opinion on the tax reform bill dependent?

  Minitab: Chi-Square Test of Independence

To perform the Chi-Square test in Minitab...

  1. Choose Stat > Tables > Chi-Square Test for Association
  2. If you have summarized data (i.e., observed counts), choose 'Summarized data in a two-way table' from the drop-down box, then select and enter the columns that contain the observed counts. Otherwise, if you have the raw data, choose 'Raw data (categorical variables)'. Note that raw data must consist of two columns: one with the explanatory variable data (goes in the 'row' field) and one with the response variable data (goes in the 'column' field).
  3. Labeling (Optional) When using the summarized data you can label the rows and columns if you have the variable labels in columns of the worksheet. For example, if we have a column with the two political party affiliations and a column with the three opinion choices we could use these columns to label the output.
  4. Click the Statistics tab. Keep checked the four boxes already checked, but also check the box for 'Each cell's contribution to the chi-square.' Click OK.
  5. Click OK.

Note! If you have the observed counts in a table, you can copy/paste them into Minitab. For instance, you can copy the entire observed counts table (excluding the totals!) for our example and paste these into Minitab starting with the first empty cell of a column.


The following is the Minitab output for this example.

Cell Contents: Count, Expected count, Contribution to Chi-square

       favor    indiffer  opposed   All
1      138      83        64        285
       115.14   85.50     84.36
       4.5386   0.0731    4.9138
2      64       67        84        215
       86.86    64.50     63.64
       6.0163   0.0969    6.5137
All    202      150       148       500

Pearson Chi-Sq = 4.5386 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 = 22.152, DF = 2, P-Value = 0.000

 

Likelihood Ratio Chi-Square

(Ignore the Likelihood Ratio p-value! The Pearson p-value reported above is calculated using the methods we learned in this lesson. More specifically, the chi-square we learned is referred to as the Pearson Chi-square. The Likelihood Ratio test uses a different method than what we explained in this lesson to calculate a test statistic and p-value. This method incorporates a log of the ratio of observed to expected values. It's a different technique that is more complicated to do by hand. Minitab automatically includes both results in its output.)

The Chi-square test statistic is 22.152, calculated by summing the individual cells' Chi-square contributions:

\(4.5386 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 \approx 22.152\)

The p-value is found by \(P(\chi^2>22.152)\) with degrees of freedom = \((2-1)(3-1) = 2\).

Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Given this p-value of 0.000 is less than the alpha of 0.05, we reject the null hypothesis that political affiliation and their opinion on a tax reform bill are independent. We conclude that there is evidence that the two variables are dependent (i.e., that there is an association between the two variables).
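
For readers without Minitab, the decision can be sketched in Python with the standard library alone. The sketch relies on an assumption that holds only for 2 degrees of freedom: the Chi-square upper-tail probability is then exp(-x/2), and the critical value is -2 ln(alpha). For other degrees of freedom, statistical software or a table is needed.

```python
import math

chi_square = 22.152  # test statistic from the example
alpha = 0.05         # significance level

# Closed forms valid ONLY for 2 degrees of freedom (assumption of this sketch).
p_value = math.exp(-chi_square / 2)
critical_value = -2 * math.log(alpha)

# p-value approach and rejection region approach should agree.
reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_square > critical_value

print(f"p-value = {p_value:.2e}, critical value = {critical_value:.3f}")
print(reject_by_p_value, reject_by_critical_value)
```

Both approaches lead to rejecting the null hypothesis here, consistent with the p-value that Minitab reports as 0.000.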

Conditions for Using the Chi-Square Test

Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.

Example 8-2: Tire Quality

The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of work among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is either classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are shift and condition of the tire produced. The data (shift_quality.txt) can be summarized by the accompanying two-way table. Does the data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?

  Perfect Satisfactory Defective Total
Shift 1 106 124 1 231
Shift 2 67 85 1 153
Shift 3 37 72 3 112
Total 210 281 5 496

Answer
Minitab output:

Chi-Square Test

        C1      C2       C3     Total
1       106     124      1      231
        97.80   130.87   2.33
2       67      85       1      153
        64.78   86.68    1.54
3       37      72       3      112
        47.42   63.45    1.13
Total   210     281      5      496

Chi-Sq = 8.647, DF = 4, P-Value = 0.071

Note that there are 3 cells with expected counts less than 5.0.

In the above example, we don't have a significant result at a 5% significance level since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust it, because 3 of the 9 cells (33.3%) have expected counts below 5.0, which exceeds the 20% guideline described above.
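
The check on small expected counts, together with the test itself, can be sketched in Python as follows. This uses only the standard library; the variable names are ours, and the p-value line assumes 4 degrees of freedom, for which the Chi-square upper-tail probability has the closed form exp(-x/2)(1 + x/2).

```python
import math

# Observed counts from Example 8-2.
# Rows: shifts 1-3. Columns: perfect, satisfactory, defective.
observed = [
    [106, 124, 1],
    [67, 85, 1],
    [37, 72, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Condition check: how many cells have an expected count below 5?
small = sum(1 for row in expected for e in row if e < 5)
n_cells = len(expected) * len(expected[0])
share_small = small / n_cells

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed form valid ONLY for 4 degrees of freedom (assumption of this sketch).
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"{small} of {n_cells} cells ({share_small:.1%}) have expected counts < 5")
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The three small expected counts all fall in the Defective column, which has only 5 tires in total.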

Caution!

Sometimes researchers will categorize quantitative data (e.g., take height measurements and categorize as 'below average,' 'average,' and 'above average'). Doing so results in a loss of information - one cannot do the reverse of taking the categories and reproducing the raw quantitative measurements. Instead of categorizing, the data should be analyzed using quantitative methods.

Try it!

A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis with a significance level of 10%.

    Condiment
Gender   Ketchup Mustard Relish Total
Male 15 23 10 48
Female 25 19 8 52
Total 40 42 18 100
 

The hypotheses are:

  • \(H_0\): Gender and condiments are independent
  • \(H_a\): Gender and condiments are not independent

We need the expected counts table (expected counts are shown in parentheses):

    Condiment
Gender   Ketchup Mustard Relish Total
Male 15 (19.2) 23 (20.16) 10 (8.64) 48
Female 25 (20.8) 19 (21.84) 8 (9.36) 52
Total 40 42 18 100

None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-square test.

The test statistic is:

\begin{multline} \chi^{2*}=\dfrac{(15-19.2)^2}{19.2}+\dfrac{(23-20.16)^2}{20.16}+\dfrac{(10-8.64)^2}{8.64}+\\\dfrac{(25-20.8)^2}{20.8}+\dfrac{(19-21.84)^2}{21.84}+\dfrac{(8-9.36)^2}{9.36}=2.95\end{multline}

The p-value is found by \(P(\chi^2>\chi^{2*})=P(\chi^2>2.95)\) with \((2-1)(3-1)=2\) degrees of freedom. Using a table or software, we find the p-value to be 0.2288.

With a p-value greater than 10%, we fail to reject the null hypothesis. There is not enough evidence in the data to suggest that gender and preferred condiment are related.
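
The whole exercise can be verified with one small helper. This is a sketch in standard-library Python; the function name `chi_square_test` is hypothetical, and the p-value line again relies on the closed form exp(-x/2), which is valid only for 2 degrees of freedom.

```python
import math


def chi_square_test(observed):
    """Return (test statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count under independence
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: male, female. Columns: ketchup, mustard, relish.
observed = [[15, 23, 10], [25, 19, 8]]
alpha = 0.10

statistic, df = chi_square_test(observed)
p_value = math.exp(-statistic / 2)  # valid for df = 2 only
reject = p_value < alpha

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The p-value computed from the unrounded statistic differs slightly in the fourth decimal place from the 0.2288 obtained from the rounded value 2.95; the conclusion is the same.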

 

