2.1.1 - One Categorical Variable

Data concerning one categorical variable can be summarized using a proportion.

Proportion: \(Proportion=\dfrac{Number\;in\;the\;category}{Total\;number}\)

The symbol for a sample proportion is \(\widehat{p}\) and is read as "p-hat." The symbol for a population proportion is \(p\).

The formula for a sample proportion may also be written as \(\widehat p = \frac{x}{n}\) where \(x\) is the number in the sample with the trait of interest and \(n\) is the sample size.

A proportion is always between 0 and 1, inclusive.

Example: Black Cards

A standard 52-card deck contains \(26\) red cards and \(26\) black cards. What proportion of cards are black?

\(p=\dfrac{26}{52}=0.50\)

The symbol \(p\) was used because this is the proportion of all cards (i.e., the population) that are black.

Example: World Campus Undergraduate Students

In the Fall 2014 semester, there were \(82,382\) undergraduate students enrolled at Penn State. Of those, \(6,245\) were World Campus students. What proportion of all Penn State undergraduate students were World Campus students?

\(p=\dfrac{6245}{82382}=0.076\)

The symbol \(p\) was used because this is the proportion of all Penn State undergraduate students (i.e., the population) that are World Campus students.

Example: Broken Cookies

In a sample of \(30\) randomly selected packages of chocolate chip cookies, \(18\) contained broken cookies. What proportion of these selected packages had broken cookies?

\(\widehat{p}=\dfrac{18}{30}=0.60\)

These data were collected from a sample so the symbol \(\widehat{p}\) was used to denote a sample proportion.
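
The three examples above all apply the same formula, so they can be checked with a few lines of Python. This is only a minimal sketch: the function name `proportion` is a hypothetical helper chosen for illustration and is not part of the course materials.

```python
def proportion(x: int, n: int) -> float:
    """Return x / n: the number in the category divided by the total number."""
    if n <= 0:
        raise ValueError("n must be a positive count")
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    return x / n


# Counts taken from the three examples above.
black_cards = proportion(26, 52)         # population proportion p
world_campus = proportion(6245, 82382)   # population proportion p
broken_cookies = proportion(18, 30)      # sample proportion p-hat

print(round(black_cards, 2), round(world_campus, 3), round(broken_cookies, 2))
```

The input checks reflect the fact that a count in a category can never exceed the total, which is why a proportion cannot fall outside 0 to 1.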

2.1.1.1 - Risk and Odds

You may have heard the terms risk and odds before. They are both ways to communicate the likelihood of an event.

Risk and odds are often confused with one another. The formulas for computing risk and odds are different and their interpretations are different.

Risk

In statistics, the word risk communicates the likelihood of an event occurring. This is synonymous with probability or proportion (i.e., the formulas are the same).

Risk: The probability that an event will occur. It may be written as a decimal, a fraction, or a percent.

Risk: \(Risk= \dfrac{number \;with \;the\; outcome}{total\;number\;of\;outcomes}\)

Example: Asthma Risk

\(60\) out of \(1000\) teens have asthma.

\(risk=\dfrac{60}{1000}=0.06\)

This means that \(6\%\) of teens experience asthma.

Example: Flu Risk

\(45\) out of \(100\) children get the flu each year.

\(risk=\dfrac{45}{100}=0.45\) or \(45\%\)

Odds

Odds: A comparison of the likelihood that an event happens to the likelihood that it does not happen.

Odds

\(odds = \dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}\)

\(odds=\dfrac{risk}{1-risk}\)
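
The two formulas give the same value. As a brief check, let \(x\) be the number with the outcome and \(n\) the total number, so that \(risk=\frac{x}{n}\) and \(n-x\) is the number without the outcome (these symbols are borrowed from the sample proportion formula for illustration). Dividing the numerator and denominator of the first formula by \(n\) gives the second:

\(odds=\dfrac{x}{n-x}=\dfrac{x/n}{(n-x)/n}=\dfrac{x/n}{1-x/n}=\dfrac{risk}{1-risk}\)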

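We often interpret odds in relation to the value of 1: odds greater than 1 mean the event is more likely to happen than not, and odds less than 1 mean it is less likely to happen than not. For example, if the odds of a game are 2 to 1 in favor of the house, then for every 2 games the house wins, it loses 1.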

Example: Passing Odds

In one large class, 850 students passed an exam while 150 students failed. Because we have the raw counts, we can use the first odds formula.

\(odds=\dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}=\dfrac{850}{150}=5.667\)

The odds of passing were 5.667 to 1. In other words, for every 5.667 students who passed the exam there was 1 who failed.

Example: Flu Odds

The risk of a child getting the flu is \(45\%\) which can also be written as \(0.45\). Because we have the risk, we can use the second odds formula.

\(odds=\dfrac{risk}{1-risk}=\dfrac{0.45}{1-0.45}=\dfrac{0.45}{0.55}=0.818\)

The odds of a child getting the flu are \(0.818\) to \(1\).
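
Both odds examples can be reproduced with a short Python sketch. The function names here are hypothetical helpers, not part of the course materials. The third function, which converts odds back to risk, is an addition that follows algebraically from the second odds formula; it is included only to show that risk and odds carry the same information on different scales.

```python
def odds_from_counts(with_outcome: int, without_outcome: int) -> float:
    """First formula: number with the outcome / number without the outcome."""
    if without_outcome <= 0:
        raise ValueError("number without the outcome must be positive")
    return with_outcome / without_outcome


def odds_from_risk(risk: float) -> float:
    """Second formula: risk / (1 - risk)."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be at least 0 and less than 1")
    return risk / (1 - risk)


def risk_from_odds(odds: float) -> float:
    """Inverse of the second formula (an added step, not from the lesson)."""
    return odds / (1 + odds)


print(round(odds_from_counts(850, 150), 3))  # Passing Odds example
print(round(odds_from_risk(0.45), 3))        # Flu Odds example
print(round(risk_from_odds(odds_from_risk(0.45)), 2))
```

Note that a risk of \(0.45\) and odds of \(0.818\) describe the same event; the numbers differ because risk divides by the total while odds divide by the number without the outcome.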

2.1.1.2 - Visual Representations

Frequency tables, pie charts, and bar charts can all be used to display data concerning one categorical (i.e., nominal- or ordinal-level) variable. Below are descriptions for each along with some examples. At the end of this lesson you will learn how to construct each of these using Minitab.

Frequency Tables

A frequency table contains the counts of how often each value occurs in the dataset. Some statistical software, such as Minitab, will use the term tally to describe a frequency table. Frequency tables are most commonly used with nominal- and ordinal-level variables, though they may also be used with interval- or ratio-level variables if there are a limited number of possible outcomes.

In addition to containing counts, some frequency tables may also include the percent of the dataset that falls into each category, and some may include cumulative values. A cumulative count is the number of cases in that category and all previous categories. A cumulative percent is the percent in that category and all previous categories. Cumulative counts and cumulative percentages should only be presented when the data are at least ordinal-level.

The first example is a frequency table displaying the counts and percentages for Penn State undergraduate student enrollment by campus. Because this is a nominal-level variable, cumulative values were not included.

Frequencies of Campus
Campus	Count	Percent
University Park	40,639	50.1%
Commonwealth Campuses	27,100	33.4%
PA College of Technology	4,981	6.1%
World Campus	8,360	10.3%
Total	81,080	100%

Penn State Fall 2019 Undergraduate Enrollments

The next example is a frequency table for an ordinal-level variable: class standing. Because ordinal-level variables have a meaningful order, we sometimes want to look at the cumulative counts or cumulative percents, which tell us the number or percent of cases at or below that level.

As an example, let's interpret the values in the "Sophomore" row. There are 22 sophomore students in this sample. There are 27 students who are sophomore or below (i.e., first-year or sophomore). In terms of percentages, 34.4% of students are sophomores and 42.2% of students are sophomores or below.

Frequencies of Class Standing
Class Standing	Count	Cumulative Count	Percent	Cumulative Percent
First-Year	5	5	7.8%	7.8%
Sophomore	22	27	34.4%	42.2%
Junior	17	44	26.6%	68.8%
Senior	20	64	31.3%	100.0%
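
The cumulative columns in the table above are running totals of the count and percent columns. The sketch below rebuilds the table from the four counts using only the Python standard library. The function name `frequency_table` is hypothetical, and rounding half up to one decimal place is an assumption made so that the output matches the percentages printed in the table.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import accumulate

# Ordered levels and counts from the class standing table above.
levels = ["First-Year", "Sophomore", "Junior", "Senior"]
counts = [5, 22, 17, 20]


def percent(part: int, total: int) -> Decimal:
    """Percent of the total, rounded half up to one decimal place."""
    value = Decimal(100 * part) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def frequency_table(levels, counts):
    """Return rows of (level, count, cumulative count, percent, cumulative percent)."""
    total = sum(counts)
    running = accumulate(counts)  # running totals: this level and all previous levels
    return [
        (level, count, cum, percent(count, total), percent(cum, total))
        for level, count, cum in zip(levels, counts, running)
    ]


for level, count, cum, pct, cum_pct in frequency_table(levels, counts):
    print(f"{level:<12}{count:>4}{cum:>5}{pct:>7}%{cum_pct:>7}%")
```

The running totals only make sense because the levels are listed in their natural order, which is why cumulative values are reserved for data that are at least ordinal-level.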

Pie Charts

A pie chart displays data concerning one categorical variable by partitioning a circle into "slices" that represent the proportion in each category. When constructing a pie chart, pay special attention to the colors being used to ensure that it is accessible to individuals with different types of colorblindness.

Pie Chart of Campus
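
To make the link between proportions and slices concrete, the sketch below converts the Fall 2019 campus counts into slice angles by multiplying each proportion by 360 degrees. Charting software performs an equivalent calculation internally; the function name `slice_angles` is a hypothetical helper used only for illustration.

```python
# Counts from the Frequencies of Campus table.
campus_counts = {
    "University Park": 40639,
    "Commonwealth Campuses": 27100,
    "PA College of Technology": 4981,
    "World Campus": 8360,
}


def slice_angles(counts: dict) -> dict:
    """Map each category to its slice angle: proportion * 360 degrees."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}


angles = slice_angles(campus_counts)
for campus, angle in angles.items():
    print(f"{campus}: {angle:.1f} degrees")
```

Because the proportions sum to 1, the angles sum to a full circle.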

Bar Charts

A bar chart is a graph that can be used to display data concerning one nominal- or ordinal-level variable. The bars, which may be vertical or horizontal, represent the number of cases in each category. Note that the bars on a bar chart are separated by spaces; this communicates that this is a categorical variable.

The first example below is a bar chart with vertical bars. The second example is a bar chart with horizontal bars. Both examples are displaying the same data. On both charts, the size of the bar represents the number of cases in that category.

Penn State Fall 2019 Undergraduate Enrollments

Considerations

Pie charts tend to work best when there are only a few categories. If a variable has many categories, a pie chart may be difficult to read. In those cases, a frequency table or bar chart may be more appropriate. Each visual display has its own strengths and weaknesses. When first starting out, you may need to make a few different types of displays to determine which most clearly communicates your data.

2.1.1.2.1 - Minitab: Frequency Tables

Minitab® – Frequency Table

This example will use data collected from a sample of STAT 200 students. These data can be downloaded using:

WCStudentData.xlsx

To create a frequency table of the primary campus variable in Minitab:

1. Open the data file in Minitab
2. From the tool bar, select Stat > Tables > Tally Individual Variables
3. Double click the variable Primary Campus in the box on the left to insert it into the Variable box on the right
4. Under Statistics, check Counts and Percents
5. Click OK

This should result in the following frequency table:

Tally
Primary Campus	Count	Percent
Commonwealth Campus	5	1.46
University Park	223	65.01
World Campus	115	33.53
N=	343
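
For readers without Minitab, the same tally can be sketched in Python with `collections.Counter`. This is an illustration, not the course's method: the data file is not read here. Instead, a stand-in list for the Primary Campus column is rebuilt from the counts in the table above, and `tally` is a hypothetical helper name.

```python
from collections import Counter

# Stand-in for the Primary Campus column, rebuilt from the counts above
# (an assumption for illustration; the real data file has one row per student).
primary_campus = (
    ["Commonwealth Campus"] * 5
    + ["University Park"] * 223
    + ["World Campus"] * 115
)


def tally(values):
    """Return ({category: (count, percent)}, N) with categories in alphabetical order."""
    counts = Counter(values)
    n = sum(counts.values())
    table = {
        category: (count, round(100 * count / n, 2))
        for category, count in sorted(counts.items())
    }
    return table, n


table, n = tally(primary_campus)
for category, (count, pct) in table.items():
    print(f"{category}\t{count}\t{pct}")
print(f"N=\t{n}")
```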

2.1.1.2.2 - Minitab: Pie Charts

Minitab® – Pie Chart (Raw Data)

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a CSV file:

WCStudentData.csv

To create a pie chart using raw data:

1. Open the data file in Minitab
2. From the tool bar, select Graph > Pie Chart...
3. Select Counts of Unique Values
4. Click OK
5. Double click the variable Primary Campus in the box on the left to insert it into the Categorical variables box on the right
6. Click OK

This should result in the pie chart below:

Pie chart of primary campus made in Minitab

Minitab® – Pie Chart (Summarized Data)

In the example above, raw data were used. In other words, the data file contained one row for each case. It is also possible to use Minitab to construct a pie chart with summarized data, for example, if you have your counts in a frequency table. If this is the case, follow the steps below. This example uses the following data concerning Penn State undergraduate enrollment:

Enrollment by Campus
Campus	Count
University Park	40,639
Commonwealth Campuses	27,100
PA College of Technology	4,981
World Campus	8,360

Penn State Fall 2019 Undergraduate Enrollments

To create a pie chart using summarized data:

1. Enter the data into a blank Minitab worksheet with one column containing the Campus names and a second column containing the Count for each campus
2. From the tool bar, select Graph > Pie Chart...
3. Select Summarized Data in a Table
4. Click OK
5. Double click Campus in the box on the left to insert it into the Categorical variable box on the right
6. Double click Count in the box on the left to insert it into the Summary variables box on the right
7. Click OK

This should result in the pie chart below:

Pie chart of primary campus made in Minitab using summarized data in a table

2.1.1.2.3 - Minitab: Bar Charts

Minitab® – Bar Chart (Raw Data)

This example will use data collected from a sample of students enrolled in online sections of STAT 200 during the Summer 2020 semester. These data can be downloaded as a CSV file:

WCStudentData.csv

To create a bar chart of the primary campus variable in Minitab:

1. Open the data file in Minitab
2. From the tool bar, select Graph > Bar Chart > Counts of Unique Values...
3. Select One Variable
4. Click OK
5. Double click the variable Primary Campus in the box on the left to insert it into the Categorical variable box on the right
6. Click OK

This should result in the bar chart below:

Bar chart of primary campus made using Minitab

Minitab® – Bar Chart (Summarized Data)

In the example above, raw data were used. In other words, the data file contained one row for each case. It is also possible to use Minitab to construct a bar chart with summarized data, for example, if you have your counts in a frequency table. If this is the case, follow the steps below. This example uses the following data concerning Penn State undergraduate enrollment:

Enrollment by Campus
Campus	Count
University Park	40,639
Commonwealth Campuses	27,100
PA College of Technology	4,981
World Campus	8,360

Penn State Fall 2019 Undergraduate Enrollments

To create a bar chart using summarized data:

1. Enter the data into a blank Minitab worksheet with one column containing the Campus names and a second column containing the Count for each campus
2. From the tool bar, select Graph > Bar Chart > Summarized Data in a Table...
3. Under One Column of Values, select Simple
4. Click OK
5. Double click Count in the box on the left to insert it into the Y-variable box on the right
6. Double click Campus in the box on the left to insert it into the Categorical variable box on the right
7. Click OK

This should result in the bar chart below:

Bar chart of enrollment made using data in a summarized table
