2.1.2 - Two Categorical Variables

Data concerning two categorical (i.e., nominal- or ordinal-level) variables can be displayed in a two-way contingency table, clustered bar chart, or stacked bar chart. Here, we'll look at an example of each. At the end of this lesson, you will learn how Minitab Express can be used to make two-way contingency tables and clustered bar charts. Minitab Express cannot be used to make stacked bar charts.

Two-Way Contingency Table

A two-way contingency table, also known as a two-way table or just a contingency table, displays data from two categorical variables. This is similar to the frequency tables we saw in the last lesson, but with two dimensions. One variable is represented in the rows and a second variable is represented in the columns. Later in this lesson we'll see how a two-way table can be used to compute a variety of different proportions.

The example below displays the counts of Penn State undergraduate and graduate students who are Pennsylvania residents and not Pennsylvania residents.

Two-Way Table of Penn State Enrollment by Academic Level & State Residency
                PA Resident   Non-PA Resident   Total
Undergraduate   54,239        26,841            81,080
Graduate        5,596         9,732             15,328
Total           59,835        36,573            96,408
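
The Total row and Total column in a two-way table are simply sums of the cell counts. As a minimal sketch (Python is used here only for illustration; it is not part of the Minitab Express workflow, and the variable names are our own), the margins of the enrollment table can be recomputed from its four cells:

```python
# Cell counts copied from the two-way table above.
counts = {
    "Undergraduate": {"PA Resident": 54239, "Non-PA Resident": 26841},
    "Graduate": {"PA Resident": 5596, "Non-PA Resident": 9732},
}
columns = ["PA Resident", "Non-PA Resident"]

# Row totals: add across the columns within each academic level.
row_totals = {level: sum(cells.values()) for level, cells in counts.items()}

# Column totals: add down the rows within each residency category.
column_totals = {col: sum(counts[level][col] for level in counts) for col in columns}

# Grand total: every student falls in exactly one cell.
grand_total = sum(row_totals.values())

print(row_totals)     # {'Undergraduate': 81080, 'Graduate': 15328}
print(column_totals)  # {'PA Resident': 59835, 'Non-PA Resident': 36573}
print(grand_total)    # 96408
```

The results match the Total row and Total column shown in the table.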

Stacked Bar Chart

A stacked bar chart is also known as a segmented bar chart. One categorical variable is represented on the x-axis and the second categorical variable is displayed as different parts (i.e., segments) of each bar. Minitab Express cannot be used to construct stacked bar charts; however, many other software programs can. The stacked bar chart below was constructed using the statistical software program R.

Stacked bar chart of Penn State enrollment by academic level and state residency

On this stacked bar chart, the bar on the left represents the number of students who are Pennsylvania residents. The bar on the right represents the number of students who are not Pennsylvania residents. The bottom of each bar, which is light green, represents the number of students who are enrolled at the undergraduate level. The top of each bar, which is blue, represents the number of students who are enrolled at the graduate level.

From this bar chart, we can see that overall there are more students who are Pennsylvania residents than non-Pennsylvania residents because the bar on the left is taller than the bar on the right. In both bars, the light green section is much bigger than the blue section, which tells us that there are more undergraduate students than graduate students in both groups.

The light green section is bigger in the left bar than in the right bar, which tells us that undergraduate students are more likely to be Pennsylvania residents than non-Pennsylvania residents. The blue section is bigger in the right bar than in the left bar, which tells us that graduate students are more likely to be non-Pennsylvania residents.
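
The comparison read off the chart can be checked against the counts in the two-way table. The sketch below is an illustration only (the helper name `share_pa` is our own); it computes, within each academic level, the proportion of students who are Pennsylvania residents:

```python
# Cell counts copied from the two-way table of Penn State enrollment.
counts = {
    "Undergraduate": {"PA Resident": 54239, "Non-PA Resident": 26841},
    "Graduate": {"PA Resident": 5596, "Non-PA Resident": 9732},
}

def share_pa(level):
    """Proportion of students at the given academic level who are PA residents."""
    cells = counts[level]
    return cells["PA Resident"] / sum(cells.values())

for level in counts:
    print(f"{level}: {share_pa(level):.3f}")
# Undergraduate: 0.669
# Graduate: 0.365
```

About two-thirds of undergraduate students are Pennsylvania residents, compared to just over one-third of graduate students, which agrees with what the segment sizes suggest.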

Clustered Bar Chart

In a clustered bar chart, each bar represents one combination of the two categorical variables. If you compare this to the two-way contingency table above, each bar represents the value in one cell (not including the totals). This is also known as a side-by-side bar chart. The clustered bar chart below was made using Minitab Express.

Clustered bar chart of Penn State enrollment by academic level and state residency
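
To make the "one bar per cell" idea concrete, the sketch below lists the bars that a clustered bar chart of the enrollment data would contain. It is an illustration in Python rather than Minitab Express, and the order of the bars here is an assumption, not necessarily the order used in the chart above.

```python
# Cell counts copied from the two-way table of Penn State enrollment.
counts = {
    "Undergraduate": {"PA Resident": 54239, "Non-PA Resident": 26841},
    "Graduate": {"PA Resident": 5596, "Non-PA Resident": 9732},
}

# One bar per cell: (academic level, residency, bar height).
bars = [
    (level, residency, n)
    for level, cells in counts.items()
    for residency, n in cells.items()
]

# The tallest and shortest bars are the largest and smallest groups.
largest = max(bars, key=lambda bar: bar[2])
smallest = min(bars, key=lambda bar: bar[2])

for bar in bars:
    print(bar)
print("Largest:", largest)    # ('Undergraduate', 'PA Resident', 54239)
print("Smallest:", smallest)  # ('Graduate', 'PA Resident', 5596)
```

Two variables with two categories each give four bars, one for each cell of the table.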

Choosing the Best Visual Display

The two-way contingency table, stacked bar chart, and clustered bar chart shown above were all made using the same data concerning Penn State enrollments by academic level and state residency. The best visual display depends on the scenario. For example, if our primary goal was to compare the number of students who are Pennsylvania residents and non-Pennsylvania residents, and academic level was a secondary variable of interest, the stacked bar chart may be preferred. If we wanted to compare the number of students in each combination of academic level and state residency to see which groups were largest and smallest, the clustered bar chart may be preferred. Often, more than one of these graphs may be appropriate. 


2.1.2.1 - Minitab Express: Two-Way Table

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a Minitab Express Project or as a CSV file:

WCStudentData.mpjx

WCStudentData.csv

To create a two-way table of the Work Status and Primary Campus variables in Minitab Express:

  1. Open the data set in Minitab Express
  2. On a PC: Select STATISTICS > Cross Tabulation and Chi-square
    On a Mac: Select Statistics > Tables > Cross Tabulation and Chi-Square
  3. We have a data file where each row represents one case, so we will keep the default data entry method of Raw data (categorical variables) in the drop down menu
  4. Double click the variable Work Status in the box on the left to insert it into the Rows box on the right
  5. Double click the variable Primary Campus in the box on the left to insert it into the Columns box on the right
  6. Click OK

This should result in the two-way table below:

Tabulated Statistics: Work Status, Primary Campus
Rows: Work Status | Columns: Primary Campus
              Commonwealth Campus   University Park   World Campus   All
Full-time     0                     26                78             104
Not working   1                     99                25             125
Missing       0                     2                 0
All           5                     221               115            341
Cell Contents: Count
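
For readers who want to see what the cross tabulation is doing, here is a minimal sketch in Python of counting raw data into a two-way table. The few rows below are made up for illustration and are not taken from WCStudentData.csv; we are also assuming that the CSV column headers match the variable names Work Status and Primary Campus. The function name `cross_tabulate` is our own.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data: one row per case. These values are invented
# for illustration and are NOT rows from WCStudentData.csv.
raw_csv = io.StringIO(
    "Work Status,Primary Campus\n"
    "Full-time,World Campus\n"
    "Not working,University Park\n"
    "Full-time,World Campus\n"
    "Not working,World Campus\n"
    "Full-time,University Park\n"
)

def cross_tabulate(rows, row_var, col_var):
    """Count the cases in each (row category, column category) pair."""
    return Counter((row[row_var], row[col_var]) for row in rows)

table = cross_tabulate(csv.DictReader(raw_csv), "Work Status", "Primary Campus")

# Print one line per row category; a Counter returns 0 for empty cells.
row_labels = sorted({r for r, _ in table})
col_labels = sorted({c for _, c in table})
print(col_labels)
for r in row_labels:
    print(r, [table[(r, c)] for c in col_labels])
```

To try this with the real file, the `io.StringIO` object could be replaced with the opened CSV file, provided the headers match.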


2.1.2.2 - Minitab Express: Clustered Bar Chart

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a Minitab Express Project or as a CSV file:

WCStudentData.mpjx

WCStudentData.csv

To create a clustered bar chart of the Work Status and Primary Campus variables in Minitab Express:

  1. Open the data set in Minitab Express
  2. On a PC: Select GRAPHS > Bar Chart > Counts of Unique Values > Clustered
    On a Mac: Select Graphs > Bar Chart > Counts of unique values in a categorical variable > Clustered
  3. Double click the variables Work Status and Primary Campus in the box on the left to insert them both into the Categorical variables box on the right
  4. Click OK

This should result in the clustered bar chart below:

Clustered bar chart for work status and primary campus

 

Note: The order in which the variables are entered into the Categorical variables box in Minitab Express determines how the bars will be clustered. For example, if we entered Primary Campus and then Work Status, the result would be the following clustered bar chart:

Clustered bar chart for primary campus and work status
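
The two charts contain the same bars; only the grouping changes. The sketch below illustrates this in Python using the counts from the Full-time and Not working rows of the two-way table in 2.1.2.1 (only those two rows are used here). The function name `cluster` is our own, and which entry order in Minitab Express corresponds to which grouping is not something this sketch establishes.

```python
# Counts from the Full-time and Not working rows of the two-way table
# in 2.1.2.1, keyed by (Work Status, Primary Campus).
cells = {
    ("Full-time", "Commonwealth Campus"): 0,
    ("Full-time", "University Park"): 26,
    ("Full-time", "World Campus"): 78,
    ("Not working", "Commonwealth Campus"): 1,
    ("Not working", "University Park"): 99,
    ("Not working", "World Campus"): 25,
}

def cluster(cells, outer):
    """Group the bars by one variable: outer=0 is Work Status, outer=1 is Primary Campus."""
    inner = 1 - outer
    clusters = {}
    for key, n in cells.items():
        clusters.setdefault(key[outer], {})[key[inner]] = n
    return clusters

by_work = cluster(cells, 0)    # one cluster per work status
by_campus = cluster(cells, 1)  # one cluster per campus

print(by_work)
print(by_campus)
```

Grouping by Work Status gives two clusters of three bars, while grouping by Primary Campus gives three clusters of two bars. The six bar heights are identical in both cases.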

 

Summarized Data

In the example above, raw data were used. In other words, our Minitab Express file contained one row for each case. It is also possible to use Minitab Express to construct a clustered bar chart with summarized data, for example, if you have data in a two-way table of counts. To do this, on a PC select GRAPHS > Bar Chart > Summarized Data > Data in a Two-Way Table > Clustered. On a Mac, select Graphs > Bar Chart > Summarized values for each category in a table > Two-way table > Clustered. For an example, see the Minitab Express Support page.
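
Raw and summarized data carry the same information for this purpose. As a rough illustration in Python (not Minitab Express), the sketch below takes the enrollment counts from the two-way table in 2.1.2 as summarized data, expands them into one record per case, and then counts the records again:

```python
from collections import Counter

# Summarized data: one count per cell of the enrollment table in 2.1.2.
summarized = Counter({
    ("Undergraduate", "PA Resident"): 54239,
    ("Undergraduate", "Non-PA Resident"): 26841,
    ("Graduate", "PA Resident"): 5596,
    ("Graduate", "Non-PA Resident"): 9732,
})

# Raw data: one record per case, which is how the earlier examples were stored.
raw_cases = list(summarized.elements())

# Counting the raw cases recovers the summarized table.
recounted = Counter(raw_cases)

print(len(raw_cases))           # 96408
print(recounted == summarized)  # True
```

Either form leads to the same bar heights; the menu path only differs in how the software is told to read the worksheet.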

