11.3.2 - Minitab: Test of Independence

Raw vs Summarized Data

If you have a data file with the responses for individual cases, then you have "raw data" and can follow the directions below. If you have a two-way table of counts, then you have "summarized data." There is an example of conducting a chi-square test of independence using summarized data on a later page. After the data are entered, the rest of the procedure is essentially the same for both formats.
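To make the raw vs. summarized distinction concrete, here is a minimal Python sketch (not part of the Minitab workflow) of how raw data collapse into summarized data. The six records and the `cross_tabulate` helper are hypothetical illustrations, not taken from class_survey.mpx.

```python
from collections import Counter

# Hypothetical raw data: one (Seating, Ever_Cheat) pair per student.
# These six records are made up for illustration; they are NOT from class_survey.mpx.
raw_cases = [
    ("Front", "No"), ("Back", "Yes"), ("Middle", "No"),
    ("Front", "No"), ("Middle", "Yes"), ("Back", "No"),
]

def cross_tabulate(cases):
    """Collapse raw (row, column) responses into a two-way table of counts."""
    counts = Counter(cases)                  # number of cases per (row, column) pair
    rows = sorted({r for r, _ in cases})     # row categories, e.g. seat locations
    cols = sorted({c for _, c in cases})     # column categories, e.g. No / Yes
    # table[i][j] = count for row category i and column category j (0 if none)
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_tabulate(raw_cases)
for label, row_counts in zip(rows, table):
    print(label, row_counts)
```

With this toy input it prints `Back [1, 1]`, `Front [2, 0]`, and `Middle [1, 1]`. A table like this is what you would type into the worksheet if you only had summarized data.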

Minitab®  – Chi-square Test Using Raw Data

Research question: Is there a relationship between where a student sits in class and whether they have ever cheated?

  • Null hypothesis: Seat location and cheating are not related in the population. 
  • Alternative hypothesis: Seat location and cheating are related in the population.

To perform a chi-square test of independence in Minitab using raw data:

  1. Open Minitab file: class_survey.mpx
  2. Select Stat > Tables > Chi-Square Test for Association
  3. Select Raw data (categorical variables) from the dropdown.
  4. Choose the variable Seating to insert it into the Rows box
  5. Choose the variable Ever_Cheat to insert it into the Columns box
  6. Click the Statistics button and check the boxes Chi-square test for association and Expected cell counts
  7. Click OK and OK

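This should result in the following output. In each row of the table, the first line gives the observed counts and the second line gives the expected counts.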

Rows: Seating   Columns: Ever_Cheat

          No        Yes       All
Back      24        8         32
          24.21     7.79
Front     38        8         46
          34.81     11.19
Middle    109       39        148
          111.98    36.02
All       171       55        226

Chi-Square Test

                    Chi-Square   DF   P-Value
Pearson             1.539        2    0.463
Likelihood Ratio    1.626        2    0.443
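The expected counts in this output are the counts we would expect, on average, if seat location and cheating were independent. Using the standard formulas (sketched here; the source does not show Minitab's internal computation), the Back row and the degrees of freedom work out as follows, with \(r = 3\) seating rows and \(c = 2\) cheating columns:

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}

E_{\text{Back, No}} = \frac{32 \times 171}{226} \approx 24.21,
\qquad
E_{\text{Back, Yes}} = \frac{32 \times 55}{226} \approx 7.79

df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
```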

Interpret

All expected counts are at least 5 (the smallest is 7.79), so we can use the Pearson chi-square test statistic. Our results are \(\chi^2(2) = 1.539\), \(p = 0.463\). Because our \(p\) value is greater than the standard alpha level of 0.05, we fail to reject the null hypothesis. There is not enough evidence of a relationship in the population between seat location and whether a student has cheated.
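As a cross-check on the Minitab output, the Pearson statistic and \(p\) value can be reproduced in plain Python. This is a sketch assuming the observed counts shown above; the helper names `chi2_sf` and `pearson_chi_square` are our own, and the \(p\) value uses the closed-form chi-square upper tail for integer degrees of freedom rather than any Minitab routine.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, erfc(sqrt(x/2)), then add the remaining terms
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def pearson_chi_square(table):
    """Pearson chi-square test of independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected

# Observed counts from the output (rows: Back, Front, Middle; columns: No, Yes)
seating = [[24, 8], [38, 8], [109, 39]]
stat, df, p, expected = pearson_chi_square(seating)
assumption_met = min(min(row) for row in expected) >= 5
print(f"chi2({df}) = {stat:.3f}, p = {p:.3f}, all expected >= 5: {assumption_met}")
print("reject H0" if p <= 0.05 else "fail to reject H0")
```

Running it should print `chi2(2) = 1.539, p = 0.463, all expected >= 5: True` followed by `fail to reject H0`, matching the Pearson row of the output.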


11.3.2.1 - Example: Raw Data

Example: Dog & Cat Ownership

Is there a relationship between dog and cat ownership in the population of all World Campus STAT 200 students? Let's conduct a hypothesis test using the dataset fall2016stdata.mpx.

1. Check any necessary assumptions and write null and alternative hypotheses.

\(H_0:\) There is not a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students
\(H_a:\) There is a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students

Assumption: All expected counts are at least 5. The expected counts here are 176.02, 75.98, 189.98, and 82.02, so this assumption has been met.
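The expected counts quoted above come from the row and column totals in the Minitab output for this example. A minimal Python sketch of that check follows, assuming the observed counts shown in the output. As in Minitab's "All" totals, the one case with a missing Dog response is left out.

```python
# Observed counts from the Minitab output (rows: Dog No/Yes; columns: Cat No/Yes).
# The single case with a missing Dog response is excluded, matching the "All" totals.
dog_cat = [[183, 69], [183, 89]]

row_totals = [sum(row) for row in dog_cat]          # [252, 272]
col_totals = [sum(col) for col in zip(*dog_cat)]    # [366, 158]
n = sum(row_totals)                                 # 524

# Expected count for each cell if dog and cat ownership were independent
expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print([round(e, 2) for e in row])

print("assumption met:", all(e >= 5 for row in expected for e in row))
```

It prints `[176.02, 75.98]` and `[189.98, 82.02]`, the same expected counts Minitab reports, and confirms that every one is at least 5.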

2. Calculate an appropriate test statistic.

Let's use Minitab to calculate the test statistic and p-value.

  1. After opening fall2016stdata.mpx, select Stat > Tables > Cross Tabulation and Chi-Square
  2. Enter Dog in the Rows box
  3. Enter Cat in the Columns box
  4. Select the Chi-Square button and in the new window check the box for the Chi-square test and Expected cell counts
  5. Click OK and OK

Rows: Dog   Columns: Cat

           No        Yes      All
No         183       69       252
           176.02    75.98
Yes        183       89       272
           189.98    82.02
Missing    1         0
All        366       158      524

Chi-Square Test

                    Chi-Square   DF   P-Value
Pearson             1.771        1    0.183
Likelihood Ratio    1.775        1    0.183

Since the assumption was met in step 1, we can use the Pearson chi-square test statistic.

\(\text{Pearson } \chi^2 = 1.771\)

3. Determine a p value associated with the test statistic.

\(p = 0.183\)
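For a table with 1 degree of freedom, the \(p\) value has a simple closed form, because a chi-square variable with df = 1 is the square of a standard normal variable. The sketch below uses only Python's standard library; the function name `p_value_df1` is our own, and it is a cross-check rather than how Minitab computes the value.

```python
import math

def p_value_df1(chi2_stat):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic behaves like Z**2 for a standard normal Z, so
    P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_stat / 2))

alpha = 0.05
p = p_value_df1(1.771)  # Pearson statistic from the Dog/Cat output
print(f"p = {p:.3f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

It prints `p = 0.183` and `fail to reject H0`. Because 1.771 is itself rounded, the result agrees with Minitab only to about three decimals. The same function applies to any 2 × 2 table, since those always have (2 − 1)(2 − 1) = 1 degree of freedom.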

4. Decide between the null and alternative hypotheses.

Our p value is greater than the standard 0.05 alpha level, so we fail to reject the null hypothesis.

5. State a "real world" conclusion.

There is not enough evidence of a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students.


11.3.2.2 - Example: Summarized Data

Example: Coffee and Tea Preference

Is there a relationship between liking tea and liking coffee?

The following table shows data collected from a random sample of 100 adults. Each was asked whether they liked coffee (yes or no) and whether they liked tea (yes or no).

                   Likes Coffee
                   Yes     No
Likes Tea   Yes    30      25
            No     10      35

Let's use the 5-step hypothesis testing procedure to address this research question.

1. Check any necessary assumptions and write null and alternative hypotheses.

\(H_0:\) Liking coffee and liking tea are not related (i.e., independent) in the population
\(H_a:\) Liking coffee and liking tea are related (i.e., dependent) in the population

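Assumption: All expected counts are at least 5. The expected counts here are 22, 33, 18, and 27 (shown in the Minitab output in step 2), so this assumption has been met.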
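As a hand check, each expected count is (row total × column total) / n, using the row totals 55 = 30 + 25 and 45 = 10 + 35, the column totals 40 = 30 + 10 and 60 = 25 + 35, and n = 100. Subscripts below are (Likes Tea, Likes Coffee):

```latex
E_{\text{Yes, Yes}} = \frac{55 \times 40}{100} = 22
\qquad
E_{\text{Yes, No}} = \frac{55 \times 60}{100} = 33

E_{\text{No, Yes}} = \frac{45 \times 40}{100} = 18
\qquad
E_{\text{No, No}} = \frac{45 \times 60}{100} = 27
```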

2. Calculate an appropriate test statistic.

Let's use Minitab to calculate the test statistic and p-value.

  1. Enter the table into a Minitab worksheet as shown below:
           C1          C2                  C3
           Likes Tea   Likes Coffee-Yes    Likes Coffee-No
      1    Yes         30                  25
      2    No          10                  35
  2. Select Stat > Tables > Chi-Square Test for Association
  3. Select Summarized data in a two-way table from the dropdown
  4. Enter the columns Likes Coffee-Yes and Likes Coffee-No in the Columns containing the table box
  5. For the row labels enter Likes Tea (leave the column labels blank)
  6. Click the Statistics button and check the boxes Chi-square test for association and Expected cell counts
  7. Click OK and OK

Output

Rows: Likes Tea   Columns: Worksheet columns

       Likes Coffee-Yes   Likes Coffee-No   All
Yes    30                 25                55
       22                 33
No     10                 35                45
       18                 27
All    40                 60                100

Chi-Square Test

                    Chi-Square   DF   P-Value
Pearson             10.774       1    0.001
Likelihood Ratio    11.138       1    0.001

Since the assumption was met in step 1, we can use the Pearson chi-square test statistic.

\(\text{Pearson } \chi^2 = 10.774\)
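Because the expected counts here are whole numbers, the Pearson statistic is easy to reproduce by hand. Sum \((O - E)^2 / E\) over the four cells, using the observed and expected counts from the output:

```latex
\chi^2 = \frac{(30 - 22)^2}{22} + \frac{(25 - 33)^2}{33}
       + \frac{(10 - 18)^2}{18} + \frac{(35 - 27)^2}{27}
       = 2.909 + 1.939 + 3.556 + 2.370 \approx 10.774,
\qquad df = (2 - 1)(2 - 1) = 1
```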

3. Determine a p value associated with the test statistic.

\(p = 0.001\)

4. Decide between the null and alternative hypotheses.

Our p value is less than the standard 0.05 alpha level, so we reject the null hypothesis.

5. State a "real world" conclusion.

There is evidence of a relationship between liking coffee and liking tea in the population.

