8 Chi-Square Test for Independence
Overview
Let’s start by recapping what we have discussed thus far in the course and mention what remains:
- The fundamentals of the sampling distributions for the sample mean and the sample proportion.
- We illustrated how these sampling distributions form the basis for estimation (confidence intervals) and testing for one mean or one proportion.
- Then we extended the discussion to analyzing situations with two variables: one a response and the other explanatory. When both variables were categorical, we compared two proportions; when the explanatory variable was categorical and the response was quantitative, we compared two means.
- Next, we will take a look at other methods and discuss how they apply to situations where:
- both variables are categorical with at least one variable with more than two levels (Chi-Square Test of Independence)
- both variables are quantitative (Linear Regression)
- the explanatory variable is categorical with more than two levels, and the response is quantitative (Analysis of Variance or ANOVA)
In this lesson, we will examine relationships where both variables are categorical using the Chi-Square Test of Independence. We will illustrate the connection between the Chi-Square test for independence and the z-test for two independent proportions in the case where each variable has only two levels.
Going forward, keep in mind that this Chi-Square test, when significant, only provides statistical evidence of an association or relationship between the two categorical variables. Do NOT confuse this result with a correlation which refers to a linear relationship between two quantitative variables (more on this in the next lesson).
The primary method for displaying the summarization of categorical variables is called a contingency table. When we have two measurements on our subjects that are both categorical, the contingency table is sometimes referred to as a two-way table.
The terminology arises because the summarized table consists of rows and columns (i.e., the data display goes two ways).
The size of a contingency table is defined by the number of rows times the number of columns associated with the levels of the two categorical variables. The size is notated \(r\times c\), where \(r\) is the number of rows of the table and \(c\) is the number of columns. A cell displays the count for the intersection of a row and column. Thus the size of a contingency table also gives the number of cells for that table. For example, if we have a \(2\times2\) table, then we have \(2(2)=4\) cells.
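As a quick aside (not part of the original lesson), a two-way table can be built from raw categorical data with software; a minimal sketch in Python using pandas and a few made-up respondents:

```python
import pandas as pd

# Hypothetical raw data: one row per respondent, two categorical measurements
raw = pd.DataFrame({
    "party":   ["Democrat", "Democrat", "Republican", "Republican", "Democrat"],
    "opinion": ["Favor", "Opposed", "Favor", "Opposed", "Favor"],
})

# crosstab produces the r x c contingency table; margins=True adds the totals
table = pd.crosstab(raw["party"], raw["opinion"], margins=True)
print(table)
```

Each cell of `table` is the count of respondents at the intersection of a row level and a column level, exactly as described above.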
Example 8.1 (Political Affiliation and Opinion)
A random sample of 500 U.S. adults is questioned regarding their political affiliation and opinion on a tax reform bill. The results of this survey are summarized in the following contingency table:
Favor | Indifferent | Opposed | Total | |
---|---|---|---|---|
Democrat | 138 | 83 | 64 | 285 |
Republican | 64 | 67 | 84 | 215 |
Total | 202 | 150 | 148 | 500 |
The size of this table is \(2\times 3\) and NOT \(3\times 4\). There are only two rows of observed data for Party Affiliation and three columns of observed data for their Opinion. We define the Party Affiliation as the explanatory variable and Opinion as the response because it is more natural to analyze how one’s opinion is shaped by their party affiliation than the other way around.
From here, we would want to determine if an association (relationship) exists between Political Party Affiliation and Opinion on a Tax Reform Bill. That is, are the two variables dependent? We’ll discuss in the next section how to approach this.
Objectives
Upon completion of this lesson, you should be able to:
- Determine when to use the Chi-Square test for independence.
- Compute expected counts for a table assuming independence.
- Calculate the Chi-Square test statistic given a contingency table by hand and with technology.
- Conduct the Chi-Square test for independence.
- Explain how the Chi-Square test for independence is related to the hypothesis test for two independent proportions.
- Calculate and interpret risk and relative risk.
8.1 The Chi-Square Test of Independence
How do we test whether two categorical variables are independent? We use the Chi-Square Test of Independence.
As with all prior statistical tests we need to define null and alternative hypotheses. Also, as we have learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In this lesson, we are interested in researching if two categorical variables are related or associated (i.e., dependent). Therefore, until we have evidence to suggest that they are, we must assume that they are not. This is the motivation behind the hypothesis for the Chi-Square Test of Independence:
- \(H_0\): In the population, the two categorical variables are independent.
- \(H_a\): In the population, the two categorical variables are dependent.
Once we have gathered our data, we summarize the data in the two-way contingency table. This table represents the observed counts and is called the Observed Counts Table or simply the Observed Table. The contingency table on the introduction page to this lesson represented the observed counts of the party affiliation and opinion for those surveyed.
The question becomes, “How would this table look if the two variables were not related?” That is, under the null hypothesis that the two variables are independent, what would we expect our data to look like?
Consider the following table:
Success | Failure | Total | |
---|---|---|---|
Group 1 | A | B | A+B |
Group 2 | C | D | C+D |
Total | A+C | B+D | A+B+C+D |
The total count is \(A+B+C+D\). Let’s focus on one cell, say Group 1 and Success with observed count A. If we go back to our probability lesson, let \(G_1\) denote the event ‘Group 1’ and \(S\) denote the event ‘Success.’ Then,
\[P(G_1)=\dfrac{A+B}{A+B+C+D} \text{ and } P(S)=\dfrac{A+C}{A+B+C+D}\] Recall that if two events are independent, then their intersection is the product of their respective probabilities. In other words, if \(G_1\) and \(S\) are independent, then…
\[\begin{align} P(G_1\cap S)&=P(G_1)P(S)\\&=\left(\dfrac{A+B}{A+B+C+D}\right)\left(\dfrac{A+C}{A+B+C+D}\right)\\[10pt] &=\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\end{align}\]
If we consider counts instead of probabilities, then we obtain the expected count by multiplying this probability by the total count. In other words,
\[E=(A+B+C+D)\cdot\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}=\dfrac{(A+B)(A+C)}{A+B+C+D}=\dfrac{\text{row total}\times\text{column total}}{\text{grand total}}\]
This is the count we would expect to see if the two variables were independent (i.e., assuming the null hypothesis is true).
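The row-total-times-column-total rule is straightforward to compute; a minimal Python sketch (the function name is ours, not from the lesson):

```python
def expected_counts(observed):
    """Expected count for each cell under independence:
    (row total * column total) / grand total."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed table from Example 8.1 (Party Affiliation vs. Opinion)
observed = [[138, 83, 64], [64, 67, 84]]
expected = expected_counts(observed)
# expected[0][0] is 285 * 202 / 500 = 115.14
```

Note that the expected table keeps the same row and column totals as the observed table; only the cell counts are redistributed.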
Example 8.2 (Political Affiliation and Opinion)
To demonstrate, we will use the Party Affiliation and Opinion on Tax Reform example (Example 8.1).
Favor | Indifferent | Opposed | Total | |
---|---|---|---|---|
Democrat | 138 | 83 | 64 | 285 |
Republican | 64 | 67 | 84 | 215 |
Total | 202 | 150 | 148 | 500 |
Find the expected counts for all of the cells.
Answer
We need to find what is called the Expected Counts Table or simply the Expected Table. This table displays what the counts would be for our sample data if there were no association between the variables.
Calculating Expected Counts from Observed Counts
Favor | Indifferent | Opposed | Total | |
---|---|---|---|---|
Democrat | \(\dfrac{285(202)}{500}=115.14\) | \(\dfrac{285(150)}{500}=85.5\) | \(\dfrac{285(148)}{500}=84.36\) | 285 |
Republican | \(\dfrac{215(202)}{500}=86.86\) | \(\dfrac{215(150)}{500}=64.5\) | \(\dfrac{215(148)}{500}=63.64\) | 215 |
Total | 202 | 150 | 148 | 500 |
Chi-Square Test Statistic
To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample data counts would be if the two variables were independent. Taking what we know of independent events, we would be saying that the sample counts should show similarity in opinions of tax reform between Democrats and Republicans. If you find the proportion of each cell by taking a cell’s expected count divided by its row total, you will discover that in the expected table each opinion proportion is the same for Democrats and Republicans. That is, from the expected counts, 0.404 of the Democrats and 0.404 of the Republicans favor the bill; 0.3 of the Democrats and 0.3 of the Republicans are indifferent; and 0.296 of the Democrats and 0.296 of the Republicans are opposed.
The statistical question becomes, “Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?” To conduct this test we compute a Chi-Square test statistic where we compare each cell’s observed count to its respective expected count.
In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, \ldots, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, \ldots, E_{rc}\) denote the respective expected counts for each cell. The test statistic is
\[\chi^{2*}=\sum_{i=1}^{rc}\dfrac{(O_i-E_i)^2}{E_i}\]
Under the null hypothesis and certain conditions (discussed below), the test statistic follows a Chi-Square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We leave out the mathematical details of why this test statistic is used and why it follows a Chi-Square distribution.
As we have done with other statistical tests, we make our decision by either comparing the value of the test statistic to a critical value (rejection region approach) or by finding the probability of getting this test statistic value or one more extreme (p-value approach).
The critical value for our Chi-Square test is \(\chi^2_{\alpha}\) with degrees of freedom \((r-1)(c-1)\), while the p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom \((r-1)(c-1)\).
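Both decision approaches can be carried out with software; a sketch using SciPy's chi-square distribution (the lesson itself uses tables and Minitab), with df = 2 and the test statistic 22.152 computed in Example 8.3:

```python
from scipy.stats import chi2

alpha = 0.05
df = (2 - 1) * (3 - 1)  # r = 2 rows, c = 3 columns

# Rejection region approach: reject H0 if the statistic exceeds the critical value
critical_value = chi2.ppf(1 - alpha, df)

# p-value approach: P(chi-square > observed statistic)
p_value = chi2.sf(22.152, df)

print(critical_value, p_value)
```

With either approach the conclusion is the same: 22.152 far exceeds the critical value, and the p-value is well below 0.05.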
Example 8.3 (Cont’d: Chi-Square)
Let’s apply the Chi-Square Test of Independence to our example where we have a random sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on a tax reform bill. We will test if the political affiliation and their opinion on a tax reform bill are dependent at a 5% level of significance. Calculate the test statistic.
Answer
The contingency table (political_affiliation.csv) is given below. Each cell contains the observed count and the expected count in parentheses. For example, there were 138 Democrats who favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is displayed as 138 (115.14).
Favor | Indifferent | Opposed | Total | |
---|---|---|---|---|
Democrat | 138 (115.14) | 83 (85.5) | 64 (84.36) | 285 |
Republican | 64 (86.86) | 67 (64.50) | 84 (63.64) | 215 |
Total | 202 | 150 | 148 | 500 |
Calculating the test statistic by hand:
\[\begin{align} \chi^{2*}&=\dfrac{(138-115.14)^2}{115.14}+\dfrac{(83-85.5)^2}{85.5}+\dfrac{(64-84.36)^2}{84.36}\\&\quad+\dfrac{(64-86.86)^2}{86.86}+\dfrac{(67-64.5)^2}{64.5}+\dfrac{(84-63.64)^2}{63.64}\\&=4.5386+0.0731+4.9138+6.0163+0.0969+6.5137\\&=22.152\end{align}\]
…with degrees of freedom equal to \((2-1)(3-1)=2\).
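The hand calculation can be checked with software; a sketch using SciPy (an alternative to Minitab, our choice rather than the lesson's):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[138, 83, 64],
                     [64, 67, 84]])

# correction=False: no Yates continuity correction (it only applies to 2x2 tables)
stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(round(stat, 3), df)
```

The returned `expected` array matches the expected counts table computed in Example 8.2.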
Minitab: Chi-Square Test of Independence
To perform the Chi-Square test in Minitab…
- Choose Stat > Tables > Chi-Square Test for Association
- If you have summarized data (i.e., observed counts), choose ‘Summarized data in a two-way table’ from the drop-down box and enter the columns that contain the observed counts; otherwise, if you have the raw data, choose ‘Raw data (categorical variables).’ Note that if using the raw data, your data will need to consist of two columns: one with the explanatory variable data (goes in the ‘Rows’ field) and one with the response variable data (goes in the ‘Columns’ field).
- Labeling (Optional): When using summarized data, you can label the rows and columns if you have the variable labels in columns of the worksheet. For example, if we have a column with the two political party affiliations and a column with the three opinion choices, we could use these columns to label the output.
- Select the Statistics tab. Keep checked the four boxes already checked, but also check the box for ‘Each cell’s contribution to the chi-square.’ Choose OK.
- Choose OK.
The following is the Minitab output for this example.
Rows: Worksheet rows Columns: Worksheet columns
Cell contents: observed count; expected count; contribution to Chi-square
Favor | Indifferent | Opposed | All | |
---|---|---|---|---|
1 | 138 | 83 | 64 | 285 |
115.14 | 85.50 | 84.36 | ||
4.5386 | 0.0731 | 4.9138 | ||
2 | 64 | 67 | 84 | 215 |
86.86 | 64.50 | 63.64 | ||
6.0163 | 0.0969 | 6.5137 | ||
All | 202 | 150 | 148 | 500 |
Chi-Square Test
Chi-Square | DF | P-Value | |
---|---|---|---|
Pearson | 22.152 | 2 | 0.000 |
Likelihood Ratio | 22.339 | 2 | 0.000 |
The Chi-Square test statistic is 22.152 and is calculated by summing all the individual cells’ Chi-Square contributions:
\[4.5386 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 = 22.152\]
The p-value is found by \(P(\chi^2>22.152)\) with degrees of freedom \((2-1)(3-1) = 2\).
Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Given this p-value of 0.000 is less than the alpha of 0.05, we reject the null hypothesis that political affiliation and their opinion on a tax reform bill are independent. We conclude that there is evidence that the two variables are dependent (i.e., that there is an association between the two variables).
Condition for Using the Chi-Square Test
Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the Chi-Square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total Chi-Square value.
Example 8.4 (Tire Quality)
The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of work among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is either classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are the shift and condition of the tire produced. The data (shift_quality.txt) can be summarized by the accompanying two-way table. Does the data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?
Perfect | Satisfactory | Defective | Total | |
---|---|---|---|---|
Shift 1 | 106 | 124 | 1 | 231 |
Shift 2 | 67 | 85 | 1 | 153 |
Shift 3 | 37 | 72 | 3 | 112 |
Total | 210 | 281 | 5 | 496 |
Answer
Minitab output:
Chi-Square Test for Association: Worksheet rows, Worksheet columns
Rows: Worksheet rows Columns: Worksheet columns
Cell contents: observed count; expected count
Perfect | Satisfactory | Defective | All | |
---|---|---|---|---|
1 | 106 | 124 | 1 | 231 |
97.80 | 130.87 | 2.33 | ||
2 | 67 | 85 | 1 | 153 |
64.78 | 86.68 | 1.54 | ||
3 | 37 | 72 | 3 | 112 |
47.42 | 63.45 | 1.13 | ||
All | 210 | 281 | 5 | 496 |
Chi-Square Test
Chi-Square | DF | P-Value | |
---|---|---|---|
Pearson | 8.647 | 4 | 0.071 |
Likelihood Ratio | 8.032 | 4 | 0.090 |
3 cell(s) with expected counts less than 5
Note that there are 3 cells with expected counts less than 5.0.
In the above example, we do not have a significant result at the 5% significance level since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we could not fully trust it, because 3 cells (33.3% of the cells) have expected counts less than 5.
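The “no more than 20% of cells” rule of thumb can be checked programmatically; a minimal sketch using SciPy and the tire-quality table (the helper function name is ours):

```python
import numpy as np
from scipy.stats import chi2_contingency

def frac_small_expected(observed, threshold=5):
    """Fraction of cells whose expected count falls below `threshold`."""
    expected = chi2_contingency(np.asarray(observed), correction=False)[3]
    return float(np.mean(expected < threshold))

# Tire-quality table from Example 8.4
shifts = [[106, 124, 1],
          [67,  85, 1],
          [37,  72, 3]]
frac = frac_small_expected(shifts)
print(frac)  # 3 of the 9 cells have expected counts below 5
```

Since the fraction exceeds 0.20, the rule of thumb suggests caution in interpreting the Chi-Square result for this table.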
Try It!
A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis with a significance level of 10%.
Condiment | |||||
---|---|---|---|---|---|
Gender | Ketchup | Mustard | Relish | Total | |
Male | 15 | 23 | 10 | 48 | |
Female | 25 | 19 | 8 | 52 | |
Total | 40 | 42 | 18 | 100 |
The hypotheses are:
- \(H_0\): Gender and condiments are independent
- \(H_a\): Gender and condiments are not independent
We need the expected counts table:
Condiment | |||||
---|---|---|---|---|---|
Gender | Ketchup | Mustard | Relish | Total | |
Male | 15 (19.2) | 23 (20.16) | 10 (8.64) | 48 | |
Female | 25 (20.8) | 19 (21.84) | 8 (9.36) | 52 | |
Total | 40 | 42 | 18 | 100 |
None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-Square test.
The test statistic is:
\[\begin{align}\chi^{2*}&=\dfrac{(15-19.2)^2}{19.2}+\dfrac{(23-20.16)^2}{20.16}+\dfrac{(10-8.64)^2}{8.64}\\&\quad+\dfrac{(25-20.8)^2}{20.8}+\dfrac{(19-21.84)^2}{21.84}+\dfrac{(8-9.36)^2}{9.36}\\&=0.919+0.400+0.214+0.848+0.369+0.198\\&\approx 2.95\end{align}\]
The p-value is found by \(P(\chi^2>\chi^{2*})=P(\chi^2>2.95)\) with \((2-1)(3-1)=2\) degrees of freedom. Using a table or software, we find the p-value to be 0.2288.
With a p-value greater than 10%, we can conclude that there is not enough evidence in the data to suggest that gender and preferred condiment are related.
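This Try It! computation can be verified with software; a sketch using SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Gender (rows) vs. condiment preference (columns)
condiments = np.array([[15, 23, 10],
                       [25, 19, 8]])
stat, p_value, df, expected = chi2_contingency(condiments, correction=False)
print(round(stat, 2), df, round(p_value, 4))
```

The statistic, degrees of freedom, and p-value agree with the hand calculation above up to rounding.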
8.2 The 2x2 Table: Test of 2 Independent Proportions
Say we have a study of two categorical variables each with only two levels. One of the response levels is considered the “success” response and the other the “failure” response. A general 2 × 2 table of the observed counts would be as follows:
Success | Failure | Total | |
---|---|---|---|
Group 1 | A | B | A + B |
Group 2 | C | D | C + D |
The observed counts in this table represent the following proportions:
Success | Failure | Total | |
---|---|---|---|
Group 1 | \(\hat{p}_1=\dfrac{A}{A+B}\) | \(1-\hat{p}_1\) | A + B |
Group 2 | \(\hat{p}_2=\dfrac{C}{C+D}\) | \(1-\hat{p}_2\) | C + D |
Recall from our Z-test of two proportions that our null hypothesis is that the two population proportions, \(p_1\) and \(p_2\), were assumed equal while the two-sided alternative hypothesis was that they were not equal.
This null hypothesis would be analogous to the two groups being independent.
Also, if the two success proportions are equal, then the two failure proportions would also be equal. Note as well that with our Z-test the conditions were that the number of successes and failures for each group was at least 5. That equates to the Chi-square conditions that all expected cells in a 2 × 2 table be at least 5. (Remember at least 80% of all cells need an expected count of at least 5. With 80% of 4 equal to 3.2 this means all four cells must satisfy the condition).
When we run a Chi-square test of independence on a 2 × 2 table, the resulting Chi-square test statistic would be equal to the square of the Z-test statistic (i.e., \((Z^*)^2\)) from the Z-test of two independent proportions.
Example 8.5 (Political Affiliation and Opinion)
Consider the following example where we form a 2 × 2 for the Political Party and Opinion by only considering the Favor and Opposed responses:
Favor | Opposed | Total | |
---|---|---|---|
Democrat | 138 | 64 | 202 |
Republican | 64 | 84 | 148 |
Total | 202 | 148 | 350 |
The Chi-square test produces a test statistic of 22.00 with a p-value of 0.00.
The Z-test comparing the two sample proportions of \(\hat{p}_d=\dfrac{138}{202}=0.683\) minus \(\hat{p}_r=\dfrac{64}{148}=0.432\) results in a Z-test statistic of \(4.69\) with p-value of \(0.000\).
If we square the Z-test statistic, we get \(4.69^2 = 21.99\) or \(22.00\) with rounding error.
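The relationship \(\chi^{2*}=(Z^*)^2\) can be verified numerically; a sketch using SciPy for the Chi-square statistic and computing the pooled two-proportion z-statistic directly:

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 Favor/Opposed table from Example 8.5
obs = np.array([[138, 64],
                [64, 84]])
chi2_stat = chi2_contingency(obs, correction=False)[0]

# Two-proportion z-statistic with pooled standard error
p1, p2 = 138 / 202, 64 / 148
p_pool = (138 + 64) / (202 + 148)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / 202 + 1 / 148))
z = (p1 - p2) / se

print(round(chi2_stat, 2), round(z ** 2, 2))  # the two statistics agree
```

Note `correction=False`: SciPy applies a Yates continuity correction to 2 × 2 tables by default, which would break the exact \(\chi^{2*}=(Z^*)^2\) identity.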
Try It!
The condiments and gender data were condensed to consider gender and either mustard or ketchup. The manager wants to know if the proportion of males who prefer ketchup is the same as the proportion of females who prefer ketchup. Test the hypothesis two ways: (1) using the Chi-square test of independence and (2) using the z-test for two proportions, with a significance level of 10%. Show how the two test statistics are related and compare the p-values.
Condiment | ||||
---|---|---|---|---|
Gender | Ketchup | Mustard | Total | |
Male | 15 | 23 | 38 | |
Female | 25 | 19 | 44 | |
Total | 40 | 42 | 82 |
Z-test for two proportions
The hypotheses are:
- \(H_0\colon p_1-p_2=0\)
- \(H_a\colon p_1-p_2\ne 0\)
Let males be denoted as sample one and females as sample two. Using the table, we have:
- \(n_1=38\) and \(\hat{p}_1=\dfrac{15}{38}=0.395\)
- \(n_2=44\) and \(\hat{p}_2=\dfrac{25}{44}=0.568\)
The conditions are satisfied for this test (verify for extra practice).
To calculate the test statistic, we need:
\[p^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{15+25}{38+44}=\dfrac{40}{82}=0.4878\]
The test statistic is:
\[\begin{align} z^*&=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\\&=\dfrac{0.395-0.568}{\sqrt{0.4878(1-0.4878)\left(\frac{1}{38}+\frac{1}{44}\right)}}\\&=-1.567\end{align}\]
The p-value is \(2P(Z<-1.567)=0.1172\).
The p-value is greater than our significance level. Therefore, there is not enough evidence in the data to suggest that the proportion of males who prefer ketchup is different than the proportion of females who prefer ketchup.
Chi-square Test for independence
The expected count table is:
Condiment | ||||
---|---|---|---|---|
Gender | Ketchup | Mustard | Total | |
Male | 15 (18.537) | 23 (19.463) | 38 | |
Female | 25 (21.463) | 19 (22.537) | 44 | |
Total | 40 | 42 | 82 |
There are no expected counts less than 5. The test statistic is:
\[\begin{align}\chi^{2*}&=\dfrac{(15-18.537)^2}{18.537}+\dfrac{(23-19.463)^2}{19.463}+\dfrac{(25-21.463)^2}{21.463}+\dfrac{(19-22.537)^2}{22.537}\\&=0.675+0.643+0.583+0.555\\&=2.456\end{align}\]
With 1 degree of freedom, the p-value is 0.1168. The p-value is greater than our significance level. Therefore, there is not enough evidence to suggest that gender and condiments (ketchup or mustard) are related.
Comparison
The p-values would be the same without rounding errors (0.1172 vs 0.1168). The z-statistic is -1.567. The square of this value is 2.455 which is what we have (rounded) for the chi-square statistic. The conclusions are the same.
8.3 Risk, Relative Risk and Odds
In this section, we will introduce some other measures we can find using a contingency table. One of the most straightforward measures to find is the risk of any given event.
8.1 (Risk) The probability that an event will occur.
In simple terms, a risk for a group is the same as the proportion of “successes” for a particular group.
Have you ever heard a doctor tell you or a family member something similar to the following: “If you do not lose weight or get your cholesterol under control you are about five times more likely to suffer a heart attack than if you had these numbers in the normal range.” If so, how alarmed should one be? “Five times” sounds alarming!
First off, this “five times” represents what is called relative risk.
8.2 (Relative Risk) Relative risk is a ratio of the risks of two groups.
In the example described above, it would be the risk of heart attack for a person in their current condition compared to the risk of heart attack if that person were in the normal ranges. However, to truly interpret the severity of a relative risk we have to know the baseline risk.
8.3 (Baseline Risk) The baseline risk is the denominator of relative risk, i.e., the risk of the group being compared to.
In our example, this would be the risk of heart attack for the normal range. If this baseline risk is high, then a relative risk of 5 would be alarming; if the baseline risk is small, then a relative risk of 5 may not be too serious.
For instance, if the risk of a heart attack for someone in the normal range was 1 out of 10, then the risk of a heart attack for a person with the above average numbers would be five times this or 5 out of 10. That is, the person would have roughly a 50/50 chance of suffering a heart attack if they didn’t get their weight and cholesterol in check. However, if the risk of a heart attack for the normal range group was 1 out of 500, then the risk of a heart attack for a person with above average numbers would be 5 out of 500 or 0.01. The person would have about a 1% chance of a heart attack if they didn’t improve their health. In both cases the relative risk was 5, but with entirely different levels of impact. Please note this example is not meant to be interpreted that taking care of your health is not important!!!
Another measure we can find is odds.
8.4 (Odds) Odds is a ratio of the number of “successes” over the number of “failures.” It can be reported as a fraction or as “number of success: number of failures.”
Example 8.6 (Cont’d: Risk and Relative Risk)
Let’s return to our Political Party and Opinion survey data (Example 8.1).
Find the risk for either party favoring the tax bill and use these risks to find and interpret a relative risk. Also, find the odds of a Democrat favoring the bill.
Favor | Indifferent | Opposed | Total | |
---|---|---|---|---|
Democrat | 138 | 83 | 64 | 285 |
Republican | 64 | 67 | 84 | 215 |
Total | 202 | 150 | 148 | 500 |
Answer
From the table, the risk of Democrats favoring the bill: \(\dfrac{138}{285}=0.484\)
The risk of Republicans favoring the bill: \(\dfrac{64}{215}=0.298\)
The relative risk that Democrats favor the bill compared to Republicans: \(\dfrac{0.484}{0.298}=1.62\)
We would interpret this relative risk as “Democrats are about 1.6 times as likely as Republicans to favor the bill (i.e., Democrats are about 62% more likely to favor the bill than Republicans).”
The odds of a Democrat favoring the tax bill are \(\dfrac{138}{147}\) or \(138:147\).
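These quantities are simple ratios; a minimal Python sketch using the Example 8.6 numbers (function names are ours, not from the lesson):

```python
def risk(successes, total):
    """Risk: proportion of 'successes' in a group."""
    return successes / total

def relative_risk(risk_a, risk_baseline):
    """Relative risk: risk of one group divided by the baseline risk."""
    return risk_a / risk_baseline

def odds(successes, failures):
    """Odds: number of successes over number of failures."""
    return successes / failures

risk_dem = risk(138, 285)        # Democrats favoring the bill
risk_rep = risk(64, 215)         # Republicans favoring the bill (baseline)
rr = relative_risk(risk_dem, risk_rep)
odds_dem = odds(138, 285 - 138)  # 138:147
```

Choosing which group is the baseline (the denominator) changes the value of the relative risk, so the interpretation should always state the comparison direction.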
Try It!
Consider again our previous example comparing gender and preferred condiments. The summary table is shown below for convenience.
Condiment | ||||
---|---|---|---|---|
Gender | Ketchup | Mustard | Total | |
Male | 15 | 23 | 38 | |
Female | 25 | 19 | 44 | |
Total | 40 | 42 | 82 |
Find the risk of either gender preferring ketchup and use those risks to find and interpret the relative risk.
The risk of males preferring ketchup is \(\dfrac{15}{38}=0.395\).
The risk of females preferring ketchup is \(\dfrac{25}{44}=0.568\).
The relative risk that females prefer ketchup compared to males is: \(\dfrac{0.568}{0.395}=1.44\)
We can interpret the relative risk as…
“Females are about 1.44 times as likely as males to prefer ketchup on hot dogs.”
8.4 Lesson Summary
In this Lesson, we learned how to calculate expected counts under the assumption that the two categorical variables are independent. We then used these expected counts to test the hypotheses:
- \(H_0\colon\) The two variables are independent.
- \(H_a\colon\) The two variables are not independent.
We demonstrated how this test relates to our test for two proportions when the alternative is two-sided.
We also introduced the terms risk, relative risk, and odds, and discussed both their calculation and their interpretation.
In the next Lesson, we will consider the case where there are two quantitative variables (quantitative response and quantitative explanatory variable). We will explore how to determine if the variables have a significant linear relationship.