Lesson 8: Chi-Square Test for Independence
Overview
Let's start by recapping what we have discussed thus far in the course and mention what remains:
 The fundamentals of the sampling distributions for the sample mean and the sample proportion.
 We illustrated how these sampling distributions form the basis for estimation (confidence intervals) and testing for one mean or one proportion.
 Then we extended the discussion to analyzing situations with two variables: one a response and the other explanatory. When both variables were categorical, we compared two proportions; when the explanatory variable was categorical and the response was quantitative, we compared two means.
 Next, we will take a look at other methods and discuss how they apply to situations where:
 both variables are categorical, with at least one variable having more than two levels (Chi-square Test of Independence)
 both variables are quantitative (Linear Regression)
 the explanatory variable is categorical with more than two levels, and the response is quantitative (Analysis of Variance or ANOVA)
In this lesson, we will examine relationships where both variables are categorical using the Chi-square Test of Independence. We will illustrate the connection between the Chi-square test for independence and the z-test for two independent proportions in the case where each variable has only two levels.
Going forward, keep in mind that this Chi-square test, when significant, only provides statistical evidence of an association or relationship between the two categorical variables. Do NOT confuse this result with correlation, which refers to a linear relationship between two quantitative variables (more on this in the next lesson).
The primary method for displaying the summarization of categorical variables is called a contingency table. When we have two measurements on our subjects that are both categorical, the contingency table is sometimes referred to as a two-way table.
The name derives from the fact that the summarized table consists of rows and columns (i.e., the data display goes two ways).
The size of a contingency table is defined by the number of rows times the number of columns associated with the levels of the two categorical variables. The size is notated \(r\times c\), where \(r\) is the number of rows of the table and \(c\) is the number of columns. A cell displays the count for the intersection of a row and column. Thus the size of a contingency table also gives the number of cells for that table. For example, if we have a \(2\times2\) table, then we have \(2(2)=4\) cells.
Note! As we will see, these contingency tables usually include a 'total' row and a 'total' column which represent the marginal totals, i.e., the total count in each row and the total count in each column. This total row and total column are NOT included in the size of the table. The size refers to the number of levels of the actual categorical variables in the study.
Application
Political Affiliation and Opinion
A random sample of 500 U.S. adults is questioned regarding their political affiliation and opinion on a tax reform bill. The results of this survey are summarized in the following contingency table:
Favor  Indifferent  Opposed  Total  

Democrat  138  83  64  285 
Republican  64  67  84  215 
Total  202  150  148  500 
The size of this table is \(2\times 3\) and NOT \(3\times 4\). There are only two rows of observed data for Party Affiliation and three columns of observed data for their Opinion. We define Party Affiliation as the explanatory variable and Opinion as the response because it is more natural to analyze how one's opinion is shaped by their party affiliation than the other way around.
From here, we would want to determine if an association (relationship) exists between Political Party Affiliation and Opinion on the Tax Reform Bill. That is, are the two variables dependent? We discuss how to approach this in the next section.
Objectives
 Determine when to use the Chi-Square test for independence.
 Compute expected counts for a table assuming independence.
 Calculate the Chi-Square test statistic given a contingency table by hand and with technology.
 Conduct the Chi-Square test for independence.
 Explain how the Chi-Square test for independence is related to the hypothesis test for two independent proportions.
 Calculate and interpret risk and relative risk.
8.1 - The Chi-Square Test for Independence
How do we test the independence of two categorical variables? It will be done using the Chi-square test of independence.
As with all prior statistical tests, we need to define null and alternative hypotheses. Also, as we have learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In this lesson, we are interested in researching if two categorical variables are related or associated (i.e., dependent). Therefore, until we have evidence to suggest that they are, we must assume that they are not. This is the motivation behind the hypotheses for the Chi-square Test of Independence:
 \(H_0\): In the population, the two categorical variables are independent.
 \(H_a\): In the population, the two categorical variables are dependent.
Note! There are several ways to phrase these hypotheses. Instead of using the words "independent" and "dependent" one could say "there is no relationship between the two categorical variables" versus "there is a relationship between the two categorical variables." Or "there is no association between the two categorical variables" versus "there is an association between the two variables." The important part is that the null hypothesis refers to the two categorical variables not being related while the alternative is trying to show that they are related.
Once we have gathered our data, we summarize the data in the two-way contingency table. This table represents the observed counts and is called the Observed Counts Table or simply the Observed Table. The contingency table on the introduction page to this lesson represented the observed counts of the party affiliation and opinion for those surveyed.
The question becomes, "How would this table look if the two variables were not related?" That is, under the null hypothesis that the two variables are independent, what would we expect our data to look like?
Consider the following table:
Success  Failure  Total  

Group 1  A  B  A+B 
Group 2  C  D  C+D 
Total  A+C  B+D  A+B+C+D 
The total count is \(A+B+C+D\). Let's focus on one cell, say Group 1 and Success with observed count A. If we go back to our probability lesson, let \(G_1\) denote the event 'Group 1' and \(S\) denote the event 'Success.' Then,
\(P(G_1)=\dfrac{A+B}{A+B+C+D}\) and \(P(S)=\dfrac{A+C}{A+B+C+D}\).
Recall that if two events are independent, then their intersection is the product of their respective probabilities. In other words, if \(G_1\) and \(S\) are independent, then...
\begin{align} P(G_1\cap S)&=P(G_1)P(S)\\&=\left(\dfrac{A+B}{A+B+C+D}\right)\left(\dfrac{A+C}{A+B+C+D}\right)\\[10pt] &=\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\end{align}
If we considered counts instead of probabilities, then we get the count by multiplying the probability by the total count. In other words...
\begin{align} \text{Expected count for cell with A} &=P(G_1)P(S)\times(\text{total count}) \\ &= \left(\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\right)(A+B+C+D)\\[10pt]&=\mathbf{\dfrac{(A+B)(A+C)}{A+B+C+D}} \end{align}
This is the count we would expect to see if the two variables were independent (i.e. assuming the null hypothesis is true).
 Expected Cell Count

The expected count for each cell under the null hypothesis is:
\(E=\dfrac{\text{(row total)}(\text{column total})}{\text{total sample size}}\)
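This formula can be applied to a whole table at once. Below is a minimal Python sketch (standard library only; the function name `expected_counts` is our own, not from the lesson) that builds the expected counts table from an observed table:

```python
# Sketch: expected count for each cell under independence,
# E = (row total)(column total) / (total sample size).

def expected_counts(observed):
    """observed: list of rows of observed counts (no total row/column)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total sample size
    return [[r * c / n for c in col_totals] for r in row_totals]

# Party affiliation (rows) by opinion (columns), from this lesson:
observed = [[138, 83, 64],   # democrat
            [64, 67, 84]]    # republican
expected = expected_counts(observed)
# expected[0] is [115.14, 85.5, 84.36], matching the table in Example 8-1
```

The row and column totals are computed from the observed table itself, so the expected table automatically has the same marginal totals.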
Example 8-1: Political Affiliation and Opinion
To demonstrate, we will use the Party Affiliation and Opinion on Tax Reform example.
Observed Table:
favor  indifferent  opposed  total  

democrat  138  83  64  285 
republican  64  67  84  215 
total  202  150  148  500 
Find the expected counts for all of the cells.
We need to find what is called the Expected Counts Table or simply the Expected Table. This table displays what the counts would be for our sample data if there were no association between the variables.
Calculating Expected Counts from Observed Counts
favor  indifferent  opposed  total  

democrat  \(\frac{285(202)}{500}=115.14\)  \(\frac{285(150)}{500}=85.5\)  \(\frac{285(148)}{500}=84.36\)  285 
republican  \(\frac{215(202)}{500}=86.86\)  \(\frac{215(150)}{500}=64.5\)  \(\frac{215(148)}{500}=63.64\)  215 
total  202  150  148  500 
Chi-Square Test Statistic
To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample data counts would be if the two variables were independent. Taking what we know of independent events, we would be saying that the sample counts should show a similarity in opinions of tax reform between democrats and republicans. If you find the proportion for each cell by taking a cell's expected count divided by its row total, you will discover that in the expected table each opinion proportion is the same for democrats and republicans. That is, from the expected counts, 0.404 of the democrats and 0.404 of the republicans favor the bill; 0.3 of the democrats and 0.3 of the republicans are indifferent; and 0.296 of the democrats and 0.296 of the republicans are opposed.
The statistical question becomes, "Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?" To conduct this test we compute a Chi-square test statistic where we compare each cell's observed count to its respective expected count.
In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, …, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, …, E_{rc}\) denote the respective expected counts for each cell.
 Chi-Square Test Statistic

The Chi-square test statistic is calculated as follows:
\(\chi^{2*}=\sum\limits_{i=1}^{rc} \frac{(O_i-E_i)^2}{E_i}\)
Under the null hypothesis and certain conditions (discussed below), the test statistic follows a Chi-square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We omit the mathematical details of why this test statistic is used and why it follows a Chi-square distribution.
As we have done with other statistical tests, we make our decision by either comparing the value of the test statistic to a critical value (rejection region approach) or by finding the probability of getting this test statistic value or one more extreme (p-value approach).
The critical value for our Chi-square test is \(\chi^2_{\alpha}\) with degrees of freedom \((r-1)(c-1)\), while the p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom \((r-1)(c-1)\).
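As a minimal Python sketch (standard library only, not from the lesson), the statistic can be computed directly from the observed and expected tables. For \(df=2\), as in the example below, the Chi-square upper-tail probability simplifies to \(e^{-x/2}\), so no statistics library is needed; in general one would use a table or software (e.g., Minitab, or a function such as `scipy.stats.chi2.sf`) for the p-value:

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of the table."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Observed and expected counts from the party affiliation example:
observed = [[138, 83, 64], [64, 67, 84]]
expected = [[115.14, 85.5, 84.36], [86.86, 64.5, 63.64]]

stat = chi_square_statistic(observed, expected)  # 22.152 (to 3 decimals)
df = (2 - 1) * (3 - 1)                           # (r-1)(c-1) = 2

# For df = 2, the chi-square survival function has the closed form e^(-x/2).
p_value = math.exp(-stat / 2)                    # about 1.5e-5, reported as 0.000
```

Since the p-value is far below 0.05, this reproduces the rejection of the null hypothesis reached in the example.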
Example 8-1 Cont'd: Chi-Square
Let's apply the Chi-square Test of Independence to our example where we have a random sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on a tax reform bill. We will test if political affiliation and opinion on the tax reform bill are dependent at a 5% level of significance. Calculate the test statistic.
The contingency table (political_affiliation.txt) is given below. Each cell contains the observed count and the expected count in parentheses. For example, there were 138 democrats who favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is displayed as 138 (115.14).
favor  indifferent  opposed  total  

democrat  138 (115.14)  83 (85.5)  64 (84.36)  285 
republican  64 (86.86)  67 (64.50)  84 (63.64)  215 
total  202  150  148  500 
Calculating the test statistic by hand:
\begin{multline} \chi^{2*}=\dfrac{(138−115.14)^2}{115.14}+\dfrac{(83−85.50)^2}{85.50}+\dfrac{(64−84.36)^2}{84.36}+\\ \dfrac{(64−86.86)^2}{86.86}+\dfrac{(67−64.50)^2}{64.50}+\dfrac{(84−63.64)^2}{63.64}=22.152\end{multline}
...with degrees of freedom equal to \((2-1)(3-1) = 2\).
Minitab: Chi-Square Test of Independence
To perform the Chi-Square test in Minitab...
 Choose Stat > Tables > Chi-Square Test for Association
 If you have summarized data (i.e., observed counts), choose 'Summarized data in a two-way table' from the dropdown box, then select and enter the columns that contain the observed counts; otherwise, if you have the raw data, use 'Raw data (categorical variables).' Note that if using the raw data, your data will need to consist of two columns: one with the explanatory variable data (goes in the 'Rows' field) and one with the response variable data (goes in the 'Columns' field).
 Labeling (Optional) When using the summarized data you can label the rows and columns if you have the variable labels in columns of the worksheet. For example, if we have a column with the two political party affiliations and a column with the three opinion choices we could use these columns to label the output.
 Click the Statistics tab. Keep checked the four boxes already checked, but also check the box for 'Each cell's contribution to the chi-square.' Click OK.
 Click OK.
Note! If you have the observed counts in a table, you can copy/paste them into Minitab. For instance, you can copy the entire observed counts table (excluding the totals!) for our example and paste these into Minitab starting with the first empty cell of a column.
The following is the Minitab output for this example.
Cell Contents: Count, Expected count, Contribution to Chi-square

        favor                indiffer            opposed             All
1       138 115.14 4.5387    83 85.50 0.0731     64 84.36 4.9138     285
2       64 86.86 6.0163      67 64.50 0.0969     84 63.64 6.5137     215
All     202                  150                 148                 500

Pearson Chi-Sq = 4.5387 + 0.0731 + 4.9138 + 6.0163 + 0.0969 + 6.5137 = 22.152, DF = 2, P-Value = 0.000
Likelihood Ratio Chi-Square

(Ignore the likelihood ratio p-value! The p-value highlighted above is calculated using the methods we learned in this lesson. More specifically, the chi-square we learned is referred to as the Pearson Chi-square. The likelihood ratio test uses a different method than what we explained in this lesson to calculate a test statistic and p-value; it incorporates a log of the ratio of observed to expected values. It's a different technique that is more complicated to do by hand. Minitab automatically includes both results in its output.)

The Chi-square test statistic is 22.152, calculated by summing all the individual cells' Chi-square contributions:
\(4.539 + 0.073 + 4.914 + 6.016 + 0.097 + 6.514 = 22.152\)
The p-value is found by \(P(\chi^2>22.152)\) with degrees of freedom \((2-1)(3-1) = 2\).
Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Since this p-value of 0.000 is less than the alpha of 0.05, we reject the null hypothesis that political affiliation and opinion on a tax reform bill are independent. We conclude that there is evidence that the two variables are dependent (i.e., that there is an association between the two variables).
Conditions for Using the Chi-Square Test
Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.
Example 8-2: Tire Quality
The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of work among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are shift and condition of the tire produced. The data (shift_quality.txt) can be summarized by the accompanying two-way table. Do the data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?
Perfect  Satisfactory  Defective  Total  

Shift 1  106  124  1  231 
Shift 2  67  85  1  153 
Shift 3  37  72  3  112 
Total  210  281  5  496 
Chi-Square Test

Cell Contents: Count, Expected count

        C1          C2           C3        Total
1       106 97.80   124 130.87   1 2.33    231
2       67 64.78    85 86.68     1 1.54    153
3       37 47.42    72 63.45     3 1.13    112
Total   210         281          5         496

Chi-Sq = 8.647, DF = 4, P-Value = 0.071
Note that there are 3 cells with expected counts less than 5.0.
In the above example, we don't have a significant result at the 5% significance level since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust it, because there are 3 (33.3% of) cells with expected counts less than 5.0.
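The 20%-of-cells rule of thumb can be checked mechanically. Below is a minimal Python sketch (our own helper, not a Minitab feature) applied to the expected counts from the tire-quality example:

```python
def check_expected_counts(expected, threshold=5.0, max_fraction=0.20):
    """Flag the chi-square test if more than 20% of cells have E < 5."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < threshold)
    fraction = small / len(cells)
    return small, fraction, fraction <= max_fraction

# Expected counts from the tire-quality Minitab output:
expected = [[97.80, 130.87, 2.33],
            [64.78, 86.68, 1.54],
            [47.42, 63.45, 1.13]]
small, frac, ok = check_expected_counts(expected)
# small == 3 and frac == 1/3: more than 20% of cells, so the test is suspect
```

All three small cells come from the rare 'Defective' outcome, which is typical: sparse categories, not small samples overall, usually trigger this condition.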
Caution!
Sometimes researchers will categorize quantitative data (e.g., take height measurements and categorize them as 'below average,' 'average,' and 'above average'). Doing so results in a loss of information: one cannot take the categories and reproduce the raw quantitative measurements. Instead of categorizing, the data should be analyzed using quantitative methods.
Try it!
A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis with a significance level of 10%.
Condiment  

Gender  Ketchup  Mustard  Relish  Total  
Male  15  23  10  48  
Female  25  19  8  52  
Total  40  42  18  100 
The hypotheses are:
 \(H_0\): Gender and condiments are independent
 \(H_a\): Gender and condiments are not independent
We need the expected counts table:
Condiment  

Gender  Ketchup  Mustard  Relish  Total  
Male  15 (19.2)  23 (20.16)  10 (8.64)  48  
Female  25 (20.8)  19 (21.84)  8 (9.36)  52  
Total  40  42  18  100 
None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-square test.
The test statistic is:
\begin{multline} \chi^{2*}=\dfrac{(15-19.2)^2}{19.2}+\dfrac{(23-20.16)^2}{20.16}+\dfrac{(10-8.64)^2}{8.64}+\\\dfrac{(25-20.8)^2}{20.8}+\dfrac{(19-21.84)^2}{21.84}+\dfrac{(8-9.36)^2}{9.36}=2.95\end{multline}
The p-value is found by \(P(\chi^2>\chi^{2*})=P(\chi^2>2.95)\) with \((2-1)(3-1)=2\) degrees of freedom. Using a table or software, we find the p-value to be 0.2288.
With a p-value greater than 10%, we conclude that there is not enough evidence in the data to suggest that gender and preferred condiment are related.
8.2 - The 2x2 Table: Test of 2 Independent Proportions
Say we have a study of two categorical variables, each with only two levels. One of the response levels is considered the "success" response and the other the "failure" response. A general 2 × 2 table of the observed counts would be as follows:
Success  Failure  Total  

Group 1  A  B  A + B 
Group 2  C  D  C + D 
The observed counts in this table represent the following proportions:
Success  Failure  Total  

Group 1  \(\hat{p}_1=\frac{A}{A+B}\)  \(1-\hat{p}_1\)  A + B 
Group 2  \(\hat{p}_2=\frac{C}{C+D}\)  \(1-\hat{p}_2\)  C + D 
Recall from our Z-test of two proportions that our null hypothesis is that the two population proportions, \(p_1\) and \(p_2\), are assumed equal, while the two-sided alternative hypothesis is that they are not equal.
This null hypothesis would be analogous to the two groups being independent.
Also, if the two success proportions are equal, then the two failure proportions would also be equal. Note as well that with our Z-test the conditions were that the number of successes and failures for each group was at least 5. That equates to the Chi-square condition that all expected cells in a 2 × 2 table be at least 5. (Remember, at least 80% of all cells need an expected count of at least 5. With 80% of 4 equal to 3.2, this means all four cells must satisfy the condition.)
When we run a Chi-square test of independence on a 2 × 2 table, the resulting Chi-square test statistic is equal to the square of the Z-test statistic (i.e., \((Z^*)^2\)) from the Z-test of two independent proportions.
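This relationship can be verified numerically. Below is a minimal Python sketch (standard library only, not part of the lesson) using the favor/opposed counts from the political affiliation example; all variable names are our own:

```python
import math

# On a 2x2 table, the chi-square statistic equals the square of the
# pooled two-proportion z statistic.
table = [[138, 64],   # democrat:   favor, opposed
         [64, 84]]    # republican: favor, opposed
(a, b), (c, d) = table
n1, n2 = a + b, c + d          # row totals: 202 and 148
n = n1 + n2                    # 350

# Pooled two-proportion z statistic
p1, p2 = a / n1, c / n2        # 0.683 and 0.432
p_pool = (a + c) / n
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Chi-square statistic, with expected counts E = (row total)(col total)/n
col_totals = [a + c, b + d]
chi_sq = sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
             for rt, row in zip((n1, n2), table)
             for ct, obs in zip(col_totals, row))

# z is about 4.69 and chi_sq about 22.0, with chi_sq equal to z**2
```

The equality \(\chi^{2*}=(Z^*)^2\) is an algebraic identity for 2 × 2 tables, so the two values agree to floating-point precision, not just to rounding.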
Application
Political Affiliation and Opinion
Consider the following example where we form a 2 × 2 table for Political Party and Opinion by only considering the Favor and Opposed responses:
favor  opposed  Total  

democrat  138  64  202 
republican  64  84  148 
Total  202  148  350 
The Chi-square test produces a test statistic of 22.00 with a p-value of 0.000.
The Z-test comparing the two sample proportions \(\hat{p}_d=\frac{138}{202}=0.683\) and \(\hat{p}_r=\frac{64}{148}=0.432\) results in a Z-test statistic of \(4.69\) with a p-value of \(0.000\).
If we square the Z-test statistic, we get \(4.69^2 = 21.99\), or \(22.00\) with rounding error.
Try it!
The condiments and gender data were condensed to consider gender and either mustard or ketchup. The manager wants to know if the proportion of males that prefer ketchup is the same as the proportion of females that prefer ketchup. Test the hypothesis two ways: (1) using the Chi-square test and (2) using the z-test for two proportions, with a significance level of 10%. Show how the two test statistics are related and compare the p-values.
Condiment  

Gender  Ketchup  Mustard  Total  
Male  15  23  38  
Female  25  19  44  
Total  40  42  82 
Z-test for two proportions
The hypotheses are:
\(H_0\colon p_1-p_2=0\)
\(H_a\colon p_1-p_2\ne 0\)
Let males be denoted as sample one and females as sample two. Using the table, we have:
\(n_1=38\) and \(\hat{p}_1=\frac{15}{38}=0.395\)
\(n_2=44\) and \(\hat{p}_2=\frac{25}{44}=0.568\)
The conditions are satisfied for this test (verify for extra practice).
To calculate the test statistic, we need:
\(p^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{15+25}{38+44}=\dfrac{40}{82}=0.4878\)
The test statistic is:
\begin{align} z^*&=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\\&=\dfrac{0.395-0.568}{\sqrt{0.4878(1-0.4878)\left(\frac{1}{38}+\frac{1}{44}\right)}}\\&=-1.567\end{align}
The p-value is \(2P(Z<-1.567)=0.1172\).
The p-value is greater than our significance level. Therefore, there is not enough evidence in the data to suggest that the proportion of males that prefer ketchup is different from the proportion of females that prefer ketchup.
Chi-square Test for Independence
The expected count table is:
Condiment  

Gender  Ketchup  Mustard  Total  
Male  15 (18.537)  23 (19.463)  38  
Female  25 (21.463)  19 (22.537)  44  
Total  40  42  82 
There are no expected counts less than 5. The test statistic is:
\(\chi^{2*}=\dfrac{(15-18.537)^2}{18.537}+\dfrac{(23-19.463)^2}{19.463}+\dfrac{(25-21.463)^2}{21.463}+\dfrac{(19-22.537)^2}{22.537}=2.46 \)
With 1 degree of freedom, the p-value is 0.1168. The p-value is greater than our significance level. Therefore, there is not enough evidence to suggest that gender and condiments (ketchup or mustard) are related.
Comparison
Without rounding errors, the p-values would be the same (0.1172 vs. 0.1168). The z-statistic is -1.567, and its square, 2.455, matches (up to rounding) the chi-square statistic of 2.46. The conclusions are the same.
8.3 - Risk, Relative Risk and Odds
Risk
In this section, we will introduce some other measures we can find using a contingency table. One of the most straightforward measures to find is the risk of any given event.
 Risk
 The probability that an event will occur.
In simple terms, a risk for a group is the same as the proportion of "success" for a particular group.
Relative Risk
Have you ever heard a doctor tell you or a family member something similar to the following: "If you do not lose weight or get your cholesterol under control you are about five times more likely to suffer a heart attack than if you had these numbers in the normal range." If so, how alarmed should one be? "Five times" sounds alarming!
First off, this "five times" represents what is called relative risk.
 Relative risk
 Relative risk is a ratio of the risks of two groups.
In the example described above, it would be the risk of heart attack for a person in their current condition compared to the risk of heart attack if that person were in the normal ranges. However, to truly interpret the severity of a relative risk we have to know the baseline risk.
 Baseline Risk
 The baseline risk is the denominator of relative risk, i.e., the risk of the group being compared to.
In our example, this would be the risk of heart attack for the normal range. If this baseline risk is high, then a relative risk of 5 would be alarming; if the baseline risk is small, then a relative risk of 5 may not be too serious.
For instance, if the risk of a heart attack for someone in the normal range was 1 out of 10, then the risk for a person with the above-average numbers would be five times this, or 5 out of 10. That is, the person would have roughly a 50/50 chance of suffering a heart attack if they didn't get their weight and cholesterol in check. However, if the risk of a heart attack for the normal-range group was 1 out of 500, then the risk for a person with above-average numbers would be 5 out of 500, or 0.01. The person would have about a 1% chance of a heart attack if they didn't improve their health. In both cases the relative risk is 5, but with entirely different levels of impact. Please note: this example is not meant to suggest that taking care of your health is unimportant!
Another measure we can find is odds.
 Odds
 Odds is the ratio of the number of successes to the number of failures. It can be reported as a fraction or as "number of successes : number of failures."
Example 8-1 Cont'd: Risk and Relative Risk
If we return to our Political Party and Opinion survey data, find the risk for either party favoring the tax bill and use these risks to find and interpret a relative risk. Also, find the odds of a democrat favoring the bill.
favor  indifferent  opposed  total  

democrat  138  83  64  285 
republican  64  67  84  215 
total  202  150  148  500 
From the table, the risk of democrats favoring the bill: \(\dfrac{138}{285}=0.484\)
The risk of republicans favoring the bill: \(\dfrac{64}{215}=0.298\)
The relative risk that democrats favor the bill compared to republicans: \(\dfrac{0.484}{0.298}=1.62\)
We would interpret this relative risk as, "Democrats are about 1.6 times more likely than Republicans to favor the tax reform bill."
The odds of a democrat favoring the tax bill is \(\frac{138}{147}\) or \(138:147\).
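These calculations can be sketched in a few lines of Python (standard library only; the helper `risk` is our own name, not from the lesson):

```python
# Sketch: risk, relative risk, and odds from the party/opinion table.

def risk(successes, total):
    """Risk = proportion of 'successes' for the group."""
    return successes / total

dem_risk = risk(138, 285)            # democrats favoring the bill, ~0.484
rep_risk = risk(64, 215)             # republicans favoring the bill, ~0.298
relative_risk = dem_risk / rep_risk  # ~1.63 (1.62 when the rounded risks are used)

# Odds of a democrat favoring: successes to failures = 138 : (83 + 64)
dem_odds = 138 / (83 + 64)           # 138/147
```

Note the small difference between 1.62 and 1.63: dividing the rounded risks gives the former, while carrying full precision gives the latter.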
Try it!
Consider again our previous example comparing gender and preferred condiments. The summary table is shown below for convenience.
Condiment  

Gender  Ketchup  Mustard  Total  
Male  15  23  38  
Female  25  19  44  
Total  40  42  82 
Find the risk of either gender preferring ketchup and use those risks to find and interpret the relative risk.
The risk of males preferring ketchup is \(\frac{15}{38}=0.395\).
The risk of females preferring ketchup is \(\frac{25}{44}=0.568\).
The relative risk that females prefer ketchup compared to males is: \(\frac{0.568}{0.395}=1.44\)
We can interpret the relative risk as...
"Females are about 1.44 times more likely to prefer ketchup on hot dogs than males."
8.4 - Lesson 8 Summary
In this Lesson, we learned how to calculate expected counts under the assumption that the two categorical variables are independent. We then used these expected counts to test the hypotheses:
 \(H_0\colon\)The two variables are independent
 \(H_a\colon\) The two variables are not independent
We demonstrated how this test relates to our test for two proportions when the alternative is two-sided.
We also introduced the terms risk and relative risk, and discussed both their calculation and interpretation.
In the next Lesson, we will consider the case where there are two quantitative variables (a quantitative response and a quantitative explanatory variable). We will explore how to determine if the variables have a significant linear relationship.