2.1.1 - One Categorical Variable

Data concerning one categorical variable can be summarized using a proportion.

Proportion
\(Proportion=\dfrac{Number\;in\;the\;category}{Total\;number}\)

The symbol for a sample proportion is \(\widehat{p}\) and is read as "p-hat." The symbol for a population proportion is \(p\). 

The formula for a sample proportion may also be written as \(\widehat p = \frac{x}{n}\) where \(x\) is the number in the sample with the trait of interest and \(n\) is the sample size.

A proportion must be between 0 and 1, inclusive.

Example: Black Cards

A standard 52-card deck contains \(26\) red cards and \(26\) black cards. What proportion of cards are black?

\(p=\dfrac{26}{52}=0.50\)

The symbol \(p\) was used because this is the proportion of all cards (i.e., the population) that are black.

Example: World Campus Undergraduate Students

In the Fall 2014 semester, there were \(82,382\) undergraduate students enrolled at Penn State. Of those, \(6,245\) were World Campus students. What proportion of all Penn State undergraduate students were World Campus students?

\(p=\dfrac{6245}{82382}=0.076\)

The symbol \(p\) was used because this is the proportion of all Penn State undergraduate students (i.e., the population) that are World Campus students.

Example: Broken Cookies

In a sample of \(30\) randomly selected packages of chocolate chip cookies, \(18\) contained broken cookies. What proportion of these selected packages had broken cookies?

\(\widehat{p}=\dfrac{18}{30}=0.60\)

These data were collected from a sample, so the symbol \(\widehat{p}\) was used to denote a sample proportion.
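If you would like to check these calculations outside of Minitab, the formula \(\widehat p = \frac{x}{n}\) can be sketched in Python. This is a minimal illustration, not course software, and the helper name `proportion` is hypothetical. The same division produces either \(p\) or \(\widehat{p}\); which symbol you report depends on whether the counts describe a population or a sample.

```python
def proportion(x: int, n: int) -> float:
    """Return x / n: the share of n cases that have the trait of interest."""
    if n <= 0:
        raise ValueError("n must be a positive count")
    if not 0 <= x <= n:
        # x cannot be negative or exceed n, which keeps the result between 0 and 1
        raise ValueError("x must be between 0 and n")
    return x / n


# Counts from the three examples above
black_cards = proportion(26, 52)        # population proportion, p
world_campus = proportion(6245, 82382)  # population proportion, p
broken_cookies = proportion(18, 30)     # sample proportion, p-hat

print(f"{black_cards:.2f}")     # 0.50
print(f"{world_campus:.3f}")    # 0.076
print(f"{broken_cookies:.2f}")  # 0.60
```

The checks on `x` and `n` mirror the rule that a proportion must be between 0 and 1.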


2.1.1.1 - Risk and Odds

You may have heard the terms risk and odds before. They are both ways to communicate the likelihood of an event.

Risk and odds are often confused with one another. The formulas for computing risk and odds are different and their interpretations are different.

Risk

In statistics, the word risk communicates the likelihood of an event occurring. This is synonymous with probability or proportion (i.e., the formulas are the same).

Risk
The probability that an event will occur. It may be written as a decimal, a fraction, or a percent.
Risk
\(Risk= \dfrac{number \;with \;the\; outcome}{total\;number\;of\;outcomes}\)

Example: Asthma Risk

\(60\) out of \(1000\) teens have asthma.

\(risk=\dfrac{60}{1000}=0.06\)

This means that \(6\%\) of teens experience asthma.

Example: Flu Risk

\(45\) out of \(100\) children get the flu each year.

\(risk=\dfrac{45}{100}=0.45\) or \(45\%\)

Odds

Odds
A way of expressing risk that compares the likelihood of an event happening to the likelihood that it does not happen.
Odds

\(odds = \dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}\)

OR

\(odds=\dfrac{risk}{1-risk}\)

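We often interpret odds in relation to the value of 1. Odds greater than 1 mean the event is more likely to happen than not, odds less than 1 mean it is less likely to happen than not, and odds equal to 1 mean the two are equally likely. For example, if the odds of a game are 2 to 1 in favor of the house, then for every 2 games the house wins, it loses 1.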

Example: Passing Odds

In one large class, 850 students passed an exam while 150 students failed. Because we have the raw counts, we can use the first odds formula.

\(odds=\dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}=\dfrac{850}{150}=5.667\)

The odds of passing were 5.667 to 1. In other words, for every 5.667 students who passed the exam there was 1 who failed.

Example: Flu Odds

The risk of a child getting the flu is \(45\%\), which can also be written as \(0.45\). Because we have the risk, we can use the second odds formula.

\(odds=\dfrac{risk}{1-risk}=\dfrac{0.45}{1-0.45}=\dfrac{0.45}{0.55}=0.818\)

The odds of a child getting the flu are \(0.818\) to \(1\). In other words, for every 0.818 children who get the flu, there is 1 child who does not.
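The two odds formulas should always agree, and a short Python sketch can confirm it. The helper names `odds_from_counts` and `odds_from_risk` are hypothetical. In the passing example, the risk of passing is \(850/1000=0.85\), so the second formula should reproduce the odds that the first formula gives from the raw counts.

```python
def odds_from_counts(with_outcome: int, without_outcome: int) -> float:
    """Odds = number with the outcome / number without the outcome."""
    if without_outcome <= 0:
        raise ValueError("odds are undefined when no cases lack the outcome")
    return with_outcome / without_outcome


def odds_from_risk(risk: float) -> float:
    """Odds = risk / (1 - risk), for a risk of at least 0 and less than 1."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be at least 0 and less than 1")
    return risk / (1 - risk)


# Passing example: both formulas give the same odds
passed, failed = 850, 150
risk_pass = passed / (passed + failed)            # 0.85
print(f"{odds_from_counts(passed, failed):.3f}")  # 5.667
print(f"{odds_from_risk(risk_pass):.3f}")         # 5.667

# Flu example: only the risk is known, so use the second formula
print(f"{odds_from_risk(0.45):.3f}")              # 0.818
```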


2.1.1.2 - Visual Representations

Frequency tables, pie charts, and bar charts can all be used to display data concerning one categorical (i.e., nominal- or ordinal-level) variable. Below are descriptions for each along with some examples. At the end of this lesson you will learn how to construct each of these using Minitab.

Frequency Tables

A frequency table contains the counts of how often each value occurs in the dataset. Some statistical software, such as Minitab, uses the term tally to describe a frequency table. Frequency tables are most commonly used with nominal- and ordinal-level variables, though they may also be used with interval- or ratio-level variables if there are a limited number of possible outcomes.

In addition to containing counts, some frequency tables may also include the percent of the dataset that falls into each category, and some may include cumulative values. A cumulative count is the number of cases in that category and all previous categories. A cumulative percent is the percent in that category and all previous categories. Cumulative counts and cumulative percentages should only be presented when the data are at least ordinal-level. 

The first example is a frequency table displaying the counts and percentages for Penn State undergraduate student enrollment by campus. Because this is a nominal-level variable, cumulative values were not included.

Frequencies of Campus
Campus                      Count    Percent
University Park            40,639      50.1%
Commonwealth Campuses      27,100      33.4%
PA College of Technology    4,981       6.1%
World Campus                8,360      10.3%
Total                      81,080       100%

Penn State Fall 2019 Undergraduate Enrollments
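The Percent column is each count divided by the total, multiplied by 100. A minimal Python sketch of that calculation, using the Fall 2019 counts from the table above (the helper name `percent_column` is hypothetical):

```python
# Fall 2019 undergraduate enrollment counts from the table above
enrollment = {
    "University Park": 40639,
    "Commonwealth Campuses": 27100,
    "PA College of Technology": 4981,
    "World Campus": 8360,
}


def percent_column(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's count as a percent of the total count."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


percents = percent_column(enrollment)
for campus, count in enrollment.items():
    print(f"{campus:<26}{count:>8,}{percents[campus]:>7.1f}%")
print(f"{'Total':<26}{sum(enrollment.values()):>8,}")
```

Rounded to one decimal place, the percents come out to 50.1, 33.4, 6.1, and 10.3, matching the table.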

 

The next example is a frequency table for an ordinal-level variable: class standing. Because ordinal-level variables have a meaningful order, we sometimes want to look at the cumulative counts or cumulative percents, which tell us the number or percent of cases at or below that level.

As an example, let's interpret the values in the "Sophomore" row. There are 22 sophomore students in this sample. There are 27 students who are sophomore or below (i.e., first-year or sophomore). In terms of percentages, 34.4% of students are sophomores and 42.2% of students are sophomores or below.

Frequencies of Class Standing
Class Standing   Count   Cumulative Count   Percent   Cumulative Percent
First-Year           5                  5      7.8%                 7.8%
Sophomore           22                 27     34.4%                42.2%
Junior              17                 44     26.6%                68.8%
Senior              20                 64     31.3%               100.0%
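Cumulative counts are running totals down the ordered categories, and each cumulative percent is the cumulative count divided by the sample size. A minimal Python sketch using the class standing counts above; it relies on `itertools.accumulate` from the standard library to form the running totals:

```python
from itertools import accumulate

# Ordinal categories listed in their natural order, with counts from the table above
standings = ["First-Year", "Sophomore", "Junior", "Senior"]
counts = [5, 22, 17, 20]

n = sum(counts)                               # 64 students
cumulative_counts = list(accumulate(counts))  # running totals: [5, 27, 44, 64]
# Computed from the cumulative counts, not by adding already-rounded percents
cumulative_percents = [100 * c / n for c in cumulative_counts]

for level, count, cum_count, cum_pct in zip(standings, counts, cumulative_counts, cumulative_percents):
    pct = 100 * count / n
    print(f"{level:<12}{count:>4}{cum_count:>6}{pct:>8.1f}%{cum_pct:>8.1f}%")
```

One caution: Python's formatting rounds an exact tie such as 31.25 to the even digit, so the Senior row may print 31.2% where the table shows 31.3%.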

Pie Charts

A pie chart displays data concerning one categorical variable by partitioning a circle into "slices" that represent the proportion in each category. When constructing a pie chart, pay special attention to the colors being used to ensure that it is accessible to individuals with different types of colorblindness. 

Pie Chart of Campus: University Park (48.5%), Commonwealth Campuses (34.9%), PA College of Technology (6.5%), World Campus (10.1%)
Penn State Fall 2017 Undergraduate Enrollments
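Because a full circle is 360 degrees, each slice's central angle is its proportion multiplied by 360. A minimal Python sketch using the percentages in the legend above; the helper name `slice_angles` is hypothetical, and the code only illustrates the arithmetic behind a pie chart, not how any particular software draws one.

```python
# Percentages from the Fall 2017 pie chart legend above
shares = {
    "University Park": 48.5,
    "Commonwealth Campuses": 34.9,
    "PA College of Technology": 6.5,
    "World Campus": 10.1,
}


def slice_angles(percents: dict[str, float]) -> dict[str, float]:
    """Convert each category's percent into its slice's central angle in degrees."""
    total = sum(percents.values())  # about 100; dividing by it keeps the angles summing to 360
    return {name: 360 * pct / total for name, pct in percents.items()}


for name, angle in slice_angles(shares).items():
    print(f"{name:<26}{angle:6.1f} degrees")
```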

Bar Charts

A bar chart is a graph that can be used to display data concerning one nominal- or ordinal-level variable. The bars, which may be vertical or horizontal, represent the number of cases in each category. Note that the bars on a bar chart are separated by spaces; this communicates that the variable is categorical.

The first example below is a bar chart with vertical bars. The second example is a bar chart with horizontal bars. Both examples display the same data. On both charts, the length of each bar represents the number of cases in that category.

Bar Chart of Undergraduate Enrollment (vertical bars): Count by Campus for University Park, Commonwealth Campuses, PA College of Technology, and World Campus

Penn State Fall 2019 Undergraduate Enrollments

Bar Chart of Undergraduate Enrollment (horizontal bars): Count by Campus for University Park, Commonwealth Campuses, PA College of Technology, and World Campus

Penn State Fall 2019 Undergraduate Enrollments
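To see how bar length encodes the count, here is a minimal text-only sketch in Python that draws horizontal bars scaled to the largest category. It is only an illustration; `text_bar_chart` and its `width` parameter are hypothetical and are not Minitab options.

```python
# Fall 2019 undergraduate enrollment counts
enrollment = {
    "University Park": 40639,
    "Commonwealth Campuses": 27100,
    "PA College of Technology": 4981,
    "World Campus": 8360,
}


def text_bar_chart(counts: dict[str, int], width: int = 40) -> list[str]:
    """Return one line per category, with bar length proportional to the count."""
    largest = max(counts.values())
    lines = []
    for category, count in counts.items():
        # The largest category gets `width` characters; the others are scaled to match
        bar = "#" * round(width * count / largest)
        lines.append(f"{category:<26}|{bar} {count:,}")
    return lines


print("\n".join(text_bar_chart(enrollment)))
```

Each category is printed on its own line, so the bars stay separated, just as the bars on a bar chart are separated by spaces.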

Considerations

Pie charts tend to work best when there are only a few categories. If a variable has many categories, a pie chart may be difficult to read. In those cases, a frequency table or bar chart may be more appropriate. Each visual display has its own strengths and weaknesses. When first starting out, you may need to make a few different types of displays to determine which most clearly communicates your data.


2.1.1.2.1 - Minitab: Frequency Tables

Minitab®  – Frequency Table

This example will use data collected from a sample of STAT 200 students. These data can be downloaded using:

WCStudentData.xlsx

To create a frequency table of the primary campus variable in Minitab:

  1. Open the data file in Minitab
  2. From the tool bar, select Stat > Tables > Tally Individual Variables
  3. Double click the variable Primary Campus in the box on the left to insert it into the Variable box on the right
  4. Under Statistics, check Counts and Percents
  5. Click OK

This should result in the following frequency table:

Tally
Primary Campus        Count   Percent
Commonwealth Campus       5      1.46
University Park         223     65.01
World Campus            115     33.53
N= 343
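Minitab's Tally counts how many rows fall into each category and converts each count to a percent of N. A rough Python equivalent can be sketched with `collections.Counter`. Because the data file is not reproduced here, the list below is hypothetical raw data rebuilt from the counts in the output above, and the helper name `tally` is also hypothetical.

```python
from collections import Counter


def tally(values: list[str]) -> dict[str, tuple[int, float]]:
    """Return {category: (count, percent)}, with categories in alphabetical order."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n) for category, count in sorted(counts.items())}


# Hypothetical raw data: one entry per student, rebuilt from the Tally output above
primary_campus = (
    ["Commonwealth Campus"] * 5
    + ["University Park"] * 223
    + ["World Campus"] * 115
)

for campus, (count, percent) in tally(primary_campus).items():
    print(f"{campus:<22}{count:>6}{percent:>8.2f}")
print(f"N= {len(primary_campus)}")
```

Rounded to two decimals, the percents come out to 1.46, 65.01, and 33.53, the same values Minitab reports. With a CSV export of the actual data, and assuming its column header is exactly `Primary Campus`, the list could instead be built with `[row["Primary Campus"] for row in csv.DictReader(f)]` on the opened file.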


2.1.1.2.2 - Minitab: Pie Charts

Minitab®  – Pie Chart (Raw Data)

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a CSV file:

WCStudentData.csv

To create a pie chart using raw data:

  1. Open the data file in Minitab 
  2. From the tool bar, select Graph > Pie Chart...
  3. Select Counts of Unique Values
  4. Click OK
  5. Double click the variable Primary Campus in the box on the left to insert it into the Categorical variables box on the right
  6. Click OK

This should result in the pie chart below:

Pie chart of primary campus made in Minitab

Minitab®  – Pie Chart (Summarized Data)

In the example above, raw data were used. In other words, the data file contained one row for each case. It is also possible to use Minitab to construct a pie chart with summarized data, for example, if you have your counts in a frequency table. If this is the case, follow the steps below. This example uses the following data concerning Penn State undergraduate enrollment:

Enrollment by Campus
Campus                      Count
University Park            40,639
Commonwealth Campuses      27,100
PA College of Technology    4,981
World Campus                8,360

Penn State Fall 2019 Undergraduate Enrollments

To create a pie chart using summarized data:

  1. Enter the data into a blank Minitab worksheet with one column containing the Campus names and a second column containing the Count for each campus
  2. From the tool bar, select Graph > Pie Chart...
  3. Select Summarized Data in a Table
  4. Click OK
  5. Double click Campus in the box on the left to insert it into the Categorical variable box on the right
  6. Double click Count in the box on the left to insert it into the Summary variables box on the right
  7. Click OK

This should result in the pie chart below:

Pie chart of primary campus made in Minitab using summarized data in a table


2.1.1.2.3 - Minitab: Bar Charts

Minitab®  – Bar Chart (Raw Data)

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a CSV file:

WCStudentData.csv

To create a bar graph of the primary campus variable in Minitab:

  1. Open the data file in Minitab
  2. From the tool bar, select Graph > Bar Chart > Counts of Unique Values...
  3. Select One Variable
  4. Click OK
  5. Double click the variable Primary Campus in the box on the left to insert it into the Categorical variable box on the right
  6. Click OK

This should result in the bar graph below:

Bar chart of primary campus made using Minitab

Minitab®  – Bar Chart (Summarized Data)

In the example above, raw data were used. In other words, the data file contained one row for each case. It is also possible to use Minitab to construct a bar chart with summarized data, for example, if you have your counts in a frequency table. If this is the case, follow the steps below. This example uses the following data concerning Penn State undergraduate enrollment:

Enrollment by Campus
Campus                      Count
University Park            40,639
Commonwealth Campuses      27,100
PA College of Technology    4,981
World Campus                8,360

Penn State Fall 2019 Undergraduate Enrollments

To create a bar chart using summarized data:

  1. Enter the data into a blank Minitab worksheet with one column containing the Campus names and a second column containing the Count for each campus
  2. From the tool bar, select Graph > Bar Chart > Summarized Data in a Table...
  3. Under One Column of Values, select Simple
  4. Click OK
  5. Double click Count in the box on the left to insert it into the Y-variable box on the right
  6. Double click Campus in the box on the left to insert it into the Categorical variable box on the right
  7. Click OK

This should result in the bar chart below:

Bar chart of enrollment made using data in a summarized table

