6: Categorical Data Comparisons

 Case Study: Entrepreneurialism

As a town planner, Donna is always thinking about ways in which the economy of her town might grow. She starts thinking about her town in the Northeast and how enthusiastic its residents are about starting and supporting small businesses. This is called entrepreneurialism. She begins to wonder whether the increase in support for small-business start-ups (entrepreneurialism) is a growing trend across the country or something unique to her own town. She decides to compare levels of entrepreneurialism between her town and a town in the Midwest to see if location makes a difference. Her measure of entrepreneurialism categorizes respondents as “high” or “low”. Donna recognizes that her data are categorical but is not quite sure how to proceed from there.

When we need to represent two categorical variables, such as location and level of entrepreneurialism, a table called a contingency table is typically the best format.

Location     Low Entrepreneurialism     High Entrepreneurialism

This table is referred to as a “2X2” contingency table because each variable has two categories. If Donna added a third region, say the South, she would have a 3X2 table (three locations by two levels of entrepreneurialism). Donna can add her data to the table:

Location     Low Entrepreneurialism     High Entrepreneurialism
Northeast    300                        460
Midwest      249                        95

Now Donna can see her data in table form. Let’s take a look at some of the ways Donna can begin to describe and analyze her data.
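A natural first step is to compute the row and column totals and the row percentages. The following is a minimal sketch in Python, assuming the counts from Donna's table are stored in a plain dictionary. The variable names (counts, row_totals, and so on) are illustrative and not part of the lesson.

```python
# Counts copied from Donna's 2X2 table; the names here are illustrative only.
counts = {
    "Northeast": {"Low": 300, "High": 460},
    "Midwest": {"Low": 249, "High": 95},
}

# Row totals: number of respondents surveyed in each location.
row_totals = {loc: sum(levels.values()) for loc, levels in counts.items()}

# Column totals: number of respondents at each level of entrepreneurialism.
col_totals = {
    level: sum(counts[loc][level] for loc in counts)
    for level in ("Low", "High")
}

# Grand total: all respondents in the table.
grand_total = sum(row_totals.values())

# Row percentages: within each location, the share of respondents at each level.
row_percent = {
    loc: {level: 100 * n / row_totals[loc] for level, n in levels.items()}
    for loc, levels in counts.items()
}

for loc in counts:
    print(f"{loc}: n = {row_totals[loc]}, "
          f"high = {row_percent[loc]['High']:.1f}%")
```

With these counts, about 60.5% of the 760 Northeast respondents fall in the high category, compared with about 27.6% of the 344 Midwest respondents. Row percentages are used here because the two locations have different sample sizes, so the raw counts are not directly comparable.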


Upon completion of this lesson, you should be able to:

  • Recognize applications dealing with multiple categorical variables 
  • Construct a 2X2 contingency table 
  • Interpret row and column totals and percentages from a contingency table 
  • State the null and alternative hypotheses for a chi square test 
  • Identify the elements of the formula for a chi square test 
  • Identify the similarity between a chi square test and a test of two proportions