Lesson 8: Chi-Square Test for Independence

Overview

Let's start by recapping what we have discussed thus far in the course and mention what remains:

- We covered the fundamentals of the sampling distributions for the sample mean and the sample proportion.
- We illustrated how these sampling distributions form the basis for estimation (confidence intervals) and testing for one mean or one proportion.
- We then extended the discussion to situations with two variables, one a response and the other explanatory. When both variables were categorical, we compared two proportions; when the explanatory variable was categorical and the response was quantitative, we compared two means.
Next, we will take a look at other methods and discuss how they apply to situations where:
- both variables are categorical with at least one variable with more than two levels (Chi-Square Test of Independence)
- both variables are quantitative (Linear Regression)
- the explanatory variable is categorical with more than two levels, and the response is quantitative (Analysis of Variance or ANOVA)

In this lesson, we will examine relationships where both variables are categorical using the Chi-Square Test of Independence. We will also illustrate the connection between the Chi-Square test for independence and the z-test for two independent proportions in the case where each variable has only two levels.

Going forward, keep in mind that this Chi-Square test, when significant, only provides statistical evidence of an association (relationship) between the two categorical variables. Do NOT confuse this result with a correlation, which refers to a linear relationship between two quantitative variables (more on this in the next lesson).

The primary way to summarize and display categorical variables is a contingency table. When we have two categorical measurements on each subject, the contingency table is sometimes referred to as a two-way table.

The name comes from the layout: the summarized table consists of rows and columns (i.e., the data display goes two ways).

The size of a contingency table is defined by the number of rows times the number of columns associated with the levels of the two categorical variables. The size is denoted $r\times c$, where $r$ is the number of rows of the table and $c$ is the number of columns. A cell displays the count for the intersection of a row and a column. Thus, the size of a contingency table also gives the number of cells in that table. For example, a $2\times2$ table has $2(2)=4$ cells.

Note! As we will see, these contingency tables usually include a 'total' row and a 'total' column, which represent the marginal totals, i.e., the total count in each row and the total count in each column. The total row and total column are NOT included in the size of the table. The size refers only to the number of levels of the categorical variables in the study.

Application

Political Affiliation and Opinion

A random sample of 500 U.S. adults is questioned regarding their political affiliation and opinion on a tax reform bill. The results of this survey are summarized in the following contingency table:

	Favor	Indifferent	Opposed	Total
Democrat	138	83	64	285
Republican	64	67	84	215
Total	202	150	148	500

The size of this table is $2\times 3$ and NOT $3\times 4$. There are only two rows of observed data for Party Affiliation and three columns of observed data for their Opinion. We define the Party Affiliation as the explanatory variable and Opinion as the response because it is more natural to analyze how one's opinion is shaped by their party affiliation than the other way around.
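
The bookkeeping above can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the lesson's own software; the variable names (`observed`, `row_totals`, `col_totals`) are assumptions chosen for illustration. It stores only the observed counts, so the 'Total' row and column are computed rather than entered, which makes it clear why they do not count toward the size of the table.

```python
# Observed counts from the survey table (rows: party, columns: opinion).
# The 'Total' row and 'Total' column are deliberately left out of the data.
opinions = ["Favor", "Indifferent", "Opposed"]
observed = {
    "Democrat": [138, 83, 64],
    "Republican": [64, 67, 84],
}

r = len(observed)   # number of rows: levels of Party Affiliation
c = len(opinions)   # number of columns: levels of Opinion
n_cells = r * c     # the size r x c also gives the number of cells

# Marginal totals: sum across each row and down each column.
row_totals = {party: sum(counts) for party, counts in observed.items()}
col_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())

print(f"size: {r} x {c} ({n_cells} cells)")
print("row totals:", row_totals)
print("column totals:", dict(zip(opinions, col_totals)))
print("grand total:", grand_total)
```

The computed margins should match the 'Total' row and column of the table, and the grand total should equal the sample size of 500.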

From here, we would want to determine whether an association (relationship) exists between Political Party Affiliation and Opinion on the Tax Reform Bill. That is, are the two variables dependent? We'll discuss how to approach this in the next section.
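
Before any formal test, a descriptive first look is to compare the distribution of opinion within each party. The sketch below, again in plain Python with illustrative names (`row_proportions` and `conditional` are assumptions, not functions from any statistical package), divides each observed count by its row total. This is only a summary of the sample; it does not by itself tell us whether the differences are statistically significant.

```python
# Observed counts from the survey table (rows: party, columns: opinion).
opinions = ["Favor", "Indifferent", "Opposed"]
observed = {
    "Democrat": [138, 83, 64],
    "Republican": [64, 67, 84],
}

def row_proportions(counts):
    """Return each count as a proportion of its row total."""
    total = sum(counts)
    return [count / total for count in counts]

# Distribution of the response (Opinion) within each level of the
# explanatory variable (Party Affiliation).
conditional = {party: row_proportions(counts) for party, counts in observed.items()}

for party, props in conditional.items():
    summary = ", ".join(f"{op}: {p:.3f}" for op, p in zip(opinions, props))
    print(f"{party:<10} {summary}")
```

In this sample, roughly 48% of Democrats favor the bill compared with roughly 30% of Republicans. If the two variables were independent, we would expect the opinion distributions for the two parties to be similar, apart from sampling variability.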

Objectives

Upon successful completion of this lesson, you should be able to:

- Determine when to use the Chi-Square test for independence.
- Compute expected counts for a table assuming independence.
- Calculate the Chi-Square test statistic given a contingency table by hand and with technology.
- Conduct the Chi-Square test for independence.
- Explain how the Chi-Square test for independence is related to the hypothesis test for two independent proportions.
- Calculate and interpret risk and relative risk.