2.1 - Categorical Variables


Categorical variables are discussed in Sections 2.1 and P.1 of the Lock5 textbook.

Variables can be classified as categorical or quantitative. In this section of the lesson, we will be focusing on categorical variables. Categorical variables are those that provide groupings that may have no logical order, or a logical order with inconsistent difference between groups (e.g., the difference between 1 and 2 is not equivalent to the difference between 3 and 4).

This course includes many examples and practice problems for you. Many of these will apply the concepts that we learn to experiments involving rolling a die or randomly selecting a card from a standard 52-card deck. If you are unfamiliar with either of these, take a moment here to review.

Die

A standard die has 6 sides: 1, 2, 3, 4, 5, 6

Six-sided die

52-Card Deck

A standard 52-card deck of playing cards has 13 Hearts, 13 Diamonds, 13 Spades, and 13 Clubs. Hearts (♥) and Diamonds (♦) are red suits. Spades (♠) and Clubs (♣) are black suits. For each suit, there is a 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, and Ace. Jacks, Queens, and Kings are "face cards."

Deck of Cards


2.1.1 - One Categorical Variable


Data concerning one categorical variable can be summarized using a proportion.

Proportion
\(Proportion=\dfrac{Number\;in\;the\;category}{Total\;number}\)

The symbol for a sample proportion is \(\widehat{p}\) and is read as "p-hat." The symbol for a population proportion is \(p\). 

The formula for a sample proportion may also be written as \(\widehat p = \frac{x}{n}\) where \(x\) is the number in the sample with the trait of interest and \(n\) is the sample size.

A proportion must be between 0 and 1.00.

Example: Black Cards

A standard 52-card deck contains \(26\) red cards and \(26\) black cards. What proportion of cards are black?

\(p=\dfrac{26}{52}=0.50\)

The symbol \(p\) was used because this is the proportion of all cards (i.e., the population) that are black.

Example: World Campus Undergraduate Students

In the Fall 2014 semester, there were \(82,382\) undergraduate students enrolled in Penn State. Of those, \(6,245\) were World Campus students. What proportion of all Penn State undergraduate students were World Campus students?

\(p=\dfrac{6245}{82382}=0.076\)

The symbol \(p\) was used because this is the proportion of all Penn State undergraduate students (i.e., the population) that are World Campus students.

Example: Broken Cookies

In a sample of \(30\) randomly selected packages of chocolate chip cookies, \(18\) contained broken cookies. What proportion of these selected packages had broken cookies?

\(\widehat{p}=\dfrac{18}{30}=0.60\)

These data were collected from a sample so the symbol \(\widehat{p}\) was used to denote a sample proportion. 
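
The three examples above can be reproduced with a short calculation. The following is a minimal Python sketch; the function name `proportion` and the variable names are our own choices for illustration and do not come from the textbook or from Minitab Express.

```python
def proportion(number_in_category, total_number):
    """Return the number in the category divided by the total number."""
    if total_number <= 0:
        raise ValueError("total_number must be positive")
    if not 0 <= number_in_category <= total_number:
        raise ValueError("number_in_category must be between 0 and total_number")
    return number_in_category / total_number


# Population proportions (p): every card and every enrolled student is counted.
p_black_cards = proportion(26, 52)
p_world_campus = proportion(6245, 82382)

# Sample proportion (p-hat): only the 30 sampled packages were examined.
p_hat_broken = proportion(18, 30)

for label, value in [
    ("Black cards", p_black_cards),
    ("World Campus students", p_world_campus),
    ("Packages with broken cookies", p_hat_broken),
]:
    print(f"{label}: {value:.3f}")
```

The checks inside the function reflect the rule that a proportion must fall between 0 and 1.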


2.1.1.1 - Risk and Odds


You may have heard the terms risk and odds before. They are both ways to communicate the likelihood of an event.

Risk and odds are often confused with one another. The formulas for computing risk and odds are different and their interpretations are different.

Risk

In statistics, the word risk communicates the likelihood of an event occurring. This is synonymous with probability or proportion (i.e., the formulas are the same).

Risk
The probability that an event will occur. It may be written as a decimal, a fraction, or a percent.
Risk
\(Risk= \dfrac{number \;with \;the\; outcome}{total\;number\;of\;outcomes}\)

Example: Asthma Risk

\(60\) out of \(1000\) teens have asthma.

\(risk=\dfrac{60}{1000}=0.06\)

This means that \(6\%\) of teens experience asthma.

Example: Flu Risk

\(45\) out of \(100\) children get the flu each year.

\(risk=\dfrac{45}{100}=0.45\) or \(45\%\)

Odds

Odds
Express risk by comparing the likelihood of an event happening to the likelihood it does not happen.
Odds

\(odds = \dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}\)

OR

\(odds=\dfrac{risk}{1-risk}\)

We often interpret odds in relation to the value of 1. For example, if the odds of a game are in favor of the house 2 to 1, that means for every 2 games the house wins it will lose 1. 

Example: Passing Odds

In one large class, 850 students passed an exam while 150 students failed. Because we have the raw counts, we can use the first odds formula.

\(odds=\dfrac {number \;with \;the\; outcome}{number \;without \;the \;outcome}=\dfrac{850}{150}=5.667\)

The odds of passing were 5.667 to 1. In other words, for every 5.667 students who passed the exam there was 1 who failed.

Example: Flu Odds

The risk of a child getting the flu is \(45\%\) which can also be written as \(0.45\). Because we have the risk, we can use the second odds formula.

\(odds=\dfrac{risk}{1-risk}=\dfrac{0.45}{1-0.45}=\dfrac{0.45}{0.55}=0.818\)

The odds of a child getting the flu are \(0.818\) to \(1\).
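
Because risk and odds are easy to mix up, it can help to compute both from the same numbers. The Python sketch below uses helper names of our own (`risk`, `odds_from_counts`, `odds_from_risk`); it simply applies the formulas given above to the examples in this section.

```python
def risk(number_with_outcome, total_number):
    """Risk = number with the outcome / total number."""
    return number_with_outcome / total_number


def odds_from_counts(number_with_outcome, number_without_outcome):
    """Odds = number with the outcome / number without the outcome."""
    return number_with_outcome / number_without_outcome


def odds_from_risk(risk_value):
    """Odds = risk / (1 - risk); undefined when the risk equals 1."""
    if not 0 <= risk_value < 1:
        raise ValueError("risk_value must be at least 0 and less than 1")
    return risk_value / (1 - risk_value)


asthma_risk = risk(60, 1000)               # 60 of 1000 teens
flu_risk = risk(45, 100)                   # 45 of 100 children
passing_odds = odds_from_counts(850, 150)  # 850 passed, 150 failed
flu_odds = odds_from_risk(flu_risk)

# Both odds formulas agree when they are applied to the same data.
passing_risk = risk(850, 850 + 150)
passing_odds_from_risk = odds_from_risk(passing_risk)

print(f"Asthma risk: {asthma_risk:.2f}")
print(f"Passing odds: {passing_odds:.3f} to 1")
print(f"Flu odds: {flu_odds:.3f} to 1")
```

Note that a risk can never exceed 1, while odds can (as in the passing example).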


2.1.1.2 - Visual Representations


Frequency tables, pie charts, and bar charts are the most appropriate displays for categorical variables. Below are a frequency table, a pie chart, and a bar chart for data concerning Penn State’s undergraduate enrollments by campus in Fall 2017.

Note that in the bar chart, the bars are separated by a space. The spaces between the bars signify that this is a categorical variable. On the following pages you will learn how to make these graphs using Minitab Express.

Frequency Table

A table containing the counts of how often each category occurs.

Tally
Campus                      Count   Percent
University Park             40835     48.5%
Commonwealth Campuses       29388     34.9%
PA College of Technology     5465      6.5%
World Campus                 8513     10.1%
Total                       84201    100.0%

Penn State Fall 2017 Undergraduate Enrollments
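
If you do not have Minitab Express at hand, the Percent column can be recomputed from the counts. The sketch below is a plain Python illustration, not Minitab output; the dictionary name `counts` is our own.

```python
# Counts from the Fall 2017 frequency table above.
counts = {
    "University Park": 40835,
    "Commonwealth Campuses": 29388,
    "PA College of Technology": 5465,
    "World Campus": 8513,
}

total = sum(counts.values())

# Percent = 100 * (number in the category) / (total number)
percents = {campus: 100 * n / total for campus, n in counts.items()}

print(f"{'Campus':<26}{'Count':>7}{'Percent':>9}")
for campus, n in counts.items():
    print(f"{campus:<26}{n:>7}{percents[campus]:>8.1f}%")
print(f"{'Total':<26}{total:>7}{100.0:>8.1f}%")
```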

Pie chart

Graphical representation for categorical data in which a circle is partitioned into “slices” on the basis of the proportions of each category.

Pie Chart of Campus: University Park (48.5%), Commonwealth Campuses (34.9%), PA College of Technology (6.5%), World Campus (10.1%)

Penn State Fall 2017 Undergraduate Enrollments

Bar chart

Graphical representation for categorical data in which vertical (or sometimes horizontal) bars are used to depict the number of experimental units in each category; bars are separated by space.

Minitab Express Bar Chart for Fall 2017 Penn State Undergraduate Enrollments

Penn State Fall 2017 Undergraduate Enrollments

Tips

Pie charts tend to work best when there are only a few categories. If a variable has many categories, a pie chart may be more difficult to read. In those cases, a frequency table or bar chart may be more appropriate.

When selecting a visual display for your data you should first determine how many variables you are going to display and whether they are categorical or quantitative. Then, you should think about what you are trying to communicate. Each visual display has its own strengths and weaknesses. When first starting out, you may need to make a few different types of displays to determine which best communicates your data.


2.1.1.2.1 - Minitab Express: Frequency Tables


The following data set (from College Board) contains the mean SAT scores for each of the 50 states and Washington, DC, as well as the participation rates and geographic region of each state. To get an idea of the pattern of variation of a categorical variable such as region, we can display the information with a frequency table, pie chart, or bar graph.

MinitabExpress  – Frequency Table

To create a frequency table in Minitab Express:

  1. Open the data set:
  2. On a PC: In the menu bar select STATISTICS > Describe > Tally
    On a Mac: In the menu bar select Statistics > Summary Statistics > Tally
  3. Double click the variable Region in the box on the left to insert the variable into the Variable box
  4. Under Statistics, check Counts and Percents
  5. Click OK

This should result in the following frequency table:

Tally
Region Count Percent
ENC 5 9.8039%
ESC 4 7.8431%
MA 3 5.8824%
MTN 8 15.6863%
NE 6 11.7647%
PAC 5 9.8039%
SA 9 17.6471%
WNC 7 13.7255%
WSC 4 7.8431%
N= 51  
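
A tally simply counts how many cases fall in each category. As a rough illustration outside of Minitab Express, the Python sketch below rebuilds a list of 51 region labels from the counts in the table above (the original data file is not reproduced here, so this list is only a stand-in for the Region column) and tallies it with `collections.Counter` from the standard library.

```python
from collections import Counter

# Stand-in for the Region column: one label per case, rebuilt from the counts above.
region_counts = {"ENC": 5, "ESC": 4, "MA": 3, "MTN": 8, "NE": 6,
                 "PAC": 5, "SA": 9, "WNC": 7, "WSC": 4}
regions = [region for region, n in region_counts.items() for _ in range(n)]

tally = Counter(regions)        # raw data -> count for each category
n_cases = sum(tally.values())   # total number of cases

print("Region  Count   Percent")
for region in sorted(tally):
    percent = 100 * tally[region] / n_cases
    print(f"{region:<6}{tally[region]:>7}{percent:>9.4f}%")
print(f"N= {n_cases}")
```

The same idea applies to any raw categorical column: each row contributes one count to its category.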


2.1.1.2.2 - Minitab Express: Pie Charts


The following data set (from College Board) contains the mean SAT scores for each of the 50 states and Washington, DC, as well as the participation rates and geographic region of each state.

MinitabExpress  – Pie Chart (Raw Data)

To create a pie chart in Minitab Express:

  1. Open the data set:
  2. On a PC or Mac: Select Graphs > Pie Chart
  3. Select Counts of Unique Values
  4. Double click the variable Region in the box on the left to insert the variable into the Categorical variable box
  5. Click OK

This should result in the pie chart below:

Pie Charts created using Minitab Express

Summarized Data

In the examples above raw data were used. In other words, the dataset contained one row for each case. It is also possible to use Minitab Express to construct a pie chart given summarized data, for example, if you had your counts in a frequency table. If this were the case, in step 3 you would select Summarized Data and enter the names of the categories in the Category names box and the frequency counts in the Summary values box.


2.1.1.2.3 - Minitab Express: Bar Charts


The following data set (from College Board) contains the mean SAT scores for each of the 50 states and Washington, DC, as well as the participation rates and geographic region of each state.

MinitabExpress  – Bar Chart (Raw Data)

To create a bar graph in Minitab Express:

  1. Open the data set 
  2. On a PC or Mac: Select Graphs > Bar Chart
  3. For Bars represent, keep the default option in the drop-down menu: Counts of unique values in a categorical variable
  4. Select Simple
  5. Double click the variable Region in the box on the left to insert the variable into the Categorical variable box
  6. Click OK

This should result in the bar graph below:

Chart of Region


Summarized Data

In the examples above raw data were used. In other words, the Minitab Express file consisted of one row for each case. We can also use Minitab Express to construct a bar chart with summarized data, for example, if you had data in a frequency table. To do this, in the third step shown above you will change the dropdown of Bars represent to Summarized values for each category in a table. You will still select Simple. The Summary variable will be the numerical values and the Categorical variable will be the names of the categories. 


2.1.2 - Two Categorical Variables


Data concerning two categorical variables may be communicated using a two-way table, also known as a contingency table. Data concerning two categorical variables can be visualized using a segmented bar chart or a clustered bar chart. A clustered bar chart is also known as a side-by-side bar chart.

Two-Way Table
A display of counts for two categorical variables in which the rows represent one variable and the columns represent a second variable. Also known as a contingency table.

Example: World Campus Enrollments by Sex

We will use the two-way table below of Penn State World Campus enrollments by biological sex and level to walk through a few examples of how to read a two-way table. These data are from the Penn State Factbook for Fall 2015.

                Female   Male   Total
Undergraduate     3814   3428    7242
Graduate          2213   2787    5000
Total             6027   6215   12242

 

Proportion Undergraduate

What proportion of the population of World Campus students is undergraduate?

\(p=\frac{7242}{12242}=0.592\)


Proportion of Females who are Undergraduates

What proportion of females in this population are undergraduate students?

\(p=\frac{3814}{6027}=0.633\)

Later in this lesson, you will learn that this is known as a conditional probability.


Proportion of Undergraduates who are Female

What proportion of undergraduate students in this population are female?

\(p=\frac{3814}{7242}=0.527\)
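
The three proportions above differ only in which total is used as the denominator. The Python sketch below stores the four cell counts from the two-way table (the variable names are our own) and recomputes each proportion, so you can see which total is being used in each case.

```python
# Cell counts from the World Campus two-way table (Fall 2015).
table = {
    "Undergraduate": {"Female": 3814, "Male": 3428},
    "Graduate": {"Female": 2213, "Male": 2787},
}

row_totals = {level: sum(cells.values()) for level, cells in table.items()}
column_totals = {
    sex: sum(cells[sex] for cells in table.values())
    for sex in ("Female", "Male")
}
grand_total = sum(row_totals.values())

# Denominator is the grand total: all World Campus students.
p_undergraduate = row_totals["Undergraduate"] / grand_total

# Denominator is the Female column total: only females are considered.
p_undergraduate_among_females = table["Undergraduate"]["Female"] / column_totals["Female"]

# Denominator is the Undergraduate row total: only undergraduates are considered.
p_female_among_undergraduates = table["Undergraduate"]["Female"] / row_totals["Undergraduate"]

print(f"Proportion undergraduate: {p_undergraduate:.3f}")
print(f"Proportion of females who are undergraduates: {p_undergraduate_among_females:.3f}")
print(f"Proportion of undergraduates who are female: {p_female_among_undergraduates:.3f}")
```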

Segmented Bar Chart

A segmented bar chart is also known as a stacked bar chart. One categorical variable is represented on the x-axis while the second categorical variable is denoted within the bars. Minitab Express will not construct a stacked bar chart, but other software will. The segmented bar chart below was constructed using Excel.

World Campus Enrollments, Fall 2015

Clustered Bar Chart

Each bar represents one combination of the two categorical variables (i.e., one cell in a contingency table). This is also known as a side-by-side bar chart. 

Side-by-side Chart of Undergraduate and Graduate Females and Males


2.1.2.1 - Minitab Express: Two-Way Table


This dataset consists of STAT 200 students' responses to a survey. We can construct a two-way table showing the relationship between Smoke Cigarettes (row variable) and Biological Sex (column variable) using Minitab Express.

MinitabExpress  – Two-Way Table

To create a two-way table in Minitab Express:

  1. Open the data set: 
  2. On a PC: Select STATISTICS > Cross Tabulation and Chi-square
    On a Mac: Select Statistics > Tables > Cross Tabulation and Chi-Square
  3. Select Raw data (categorical variable) from the drop down menu
  4. Double click the variable Smoke Cigarettes in the box on the left to insert the variable into the Rows box
  5. Double click the variable Biological Sex in the box on the left to insert the variable into the Columns box
  6. Click OK

This should result in the two-way table below:

Tabulated Statistics: Smoke Cigarettes, Biological Sex
Rows: Smoke Cigarettes | Columns: Biological Sex

        Female   Male   All
No         120     89   209
Yes          7     10    17
All        127     99   226

Cell Contents: Count


2.1.2.1.1 - Video Example: Reading a Two-Way Table


The video example below uses the dataset:


2.1.2.2 - Minitab Express: Clustered Bar Chart


We are going to use the Class Survey data set again in this example:

MinitabExpress  – Clustered Bar Chart

To create a clustered bar chart in Minitab Express:

  1. Open the data set:
  2. On a PC or Mac: Select Graphs > Bar Chart 
  3. In this example we have a datafile with the responses from each case so for Bars represent select Counts of unique values in a categorical variable
  4. Select Clustered
  5. Double click the variables Biological Sex and Smoke Cigarettes in the box on the left to insert the variables into the Categorical variables box
  6. Click OK

This should result in the clustered bar chart below:

Minitab Express Clustered Bar Chart of Biological Sex and Smoking


Note: The order in which the variables are entered into the Categorical variables box in Minitab Express determines how the bars will be clustered. For example, if we entered Smoke Cigarettes and then Biological Sex, the result would be the following clustered bar chart:

Minitab Express Clustered Bar Chart of Smoking and Biological Sex


2.1.3 - Probability Rules


The probability rules covered in this lesson can be found in section P.1 of the Lock5 textbook.

Earlier in this lesson you were introduced to proportions. We used the notation: \(Proportion=\frac{Number\;in\;the\;category}{Total\;number}\).

When we discuss probabilities, we will use the notation below where \(P(A)\) is the probability of event \(A\) occurring. Probabilities are typically written in decimal form but may also be translated to percentages. 

Note that this is the same formula that you learned earlier in Lesson 2.1.1 for a proportion.

Probability of Event A
\(P(A)=\dfrac{Number\;in\;group\;A}{Total\;number}\)

Example: Spades

What is the probability that a randomly selected card from a standard 52-card deck will be a spade? There are 13 spades in the deck of 52.

\(P(spade)=\dfrac{13}{52}=0.25\)

The probability of pulling a spade is 0.25. We could also say that there is a 25% chance of pulling a spade.

Example: Odd Numbers

If you roll a six-sided die, what is the probability of getting an odd number? There are three odd numbers on the die (1, 3, 5).

\(P(odd)=\dfrac{3}{6}=0.50\)

The probability of rolling an odd number is 0.50. We could also say that there is a 50% chance of rolling an odd number.

Example: Raffle

There are a total of 500 raffle tickets and you have purchased 10. What is the probability that one of your tickets will be randomly selected to win the raffle?

\(P(winning)=\dfrac{10}{500}=0.02\)

The probability of you winning is 0.02. We could also say that there is a 2% chance that you will win.
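
When every outcome is equally likely, \(P(A)\) can be found by listing the outcomes and counting those in group A. The Python sketch below does this for the deck and die examples above. The helper name `probability` and the way the deck is represented as (rank, suit) pairs are our own assumptions for illustration; `fractions.Fraction` is used so that the results stay exact.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
SUITS = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = list(product(RANKS, SUITS))   # 52 (rank, suit) pairs
die = [1, 2, 3, 4, 5, 6]


def probability(is_in_group, outcomes):
    """P(A) = number in group A / total number, for equally likely outcomes."""
    number_in_group = sum(1 for outcome in outcomes if is_in_group(outcome))
    return Fraction(number_in_group, len(outcomes))


p_spade = probability(lambda card: card[1] == "Spades", deck)
p_odd = probability(lambda face: face % 2 == 1, die)
p_winning = Fraction(10, 500)        # 10 of the 500 raffle tickets are yours

for label, p in [("spade", p_spade), ("odd", p_odd), ("winning", p_winning)]:
    print(f"P({label}) = {p} = {float(p):.2f}")
```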


2.1.3.1 - Range of Probabilities


The probability of an impossible event is 0 and the probability of a certain event is 1. The range of possible probabilities is: \(0 \leq P(A) \leq 1\). It is not possible to have a probability less than 0 or greater than 1. 

Example: Rolling an 8

It is impossible to roll an eight on a six-sided die.

\(P(rolling\; 8)= \dfrac{0}{6} = 0\)

Example: Blue Cards

In a standard 52-card deck all cards are black or red. There are no blue cards.

\(P(blue)=\dfrac{0}{52}=0\)

Example: Rolling a Value Between 1 and 6

A six-sided die contains the values 1, 2, 3, 4, 5, and 6. All rolls will result in a value between 1 and 6.

\(P(rolling \;1 \;to\; 6)=\dfrac{6}{6}=1.00\)


2.1.3.2 - Combinations of Events


In situations with two or more categorical variables there are a number of different ways that combinations of events can be described: intersections, unions, complements, and conditional probabilities. Each of these combinations of events is covered in your textbook. However, note that your textbook does not use the symbols that are most commonly used when discussing these combinations of events. The symbols that we will be using are in the table below. In this section, you will also learn about disjoint events and independent events.

Combination of Events
Combination    Symbol            Definition
Intersection   \(P(A\cap B)\)    Probability of A and B
Union          \(P(A\cup B)\)    Probability of A or B (Note: This includes the possibility of A and B)
Complement     \(P(A^C)\)        The probability of NOT A
Conditional    \(P(A\mid B)\)    The probability of A given B

2.1.3.2.1 - Disjoint & Independent Events


Note that disjoint events and independent events are different. Events are considered disjoint if they never occur at the same time; these are also known as mutually exclusive events. Events are considered independent if they are unrelated.

Disjoint Events

Two events that do not occur at the same time. These are also known as mutually exclusive events.

In the Venn diagram below event A and event B are disjoint events because the two do not overlap.

Mutually Exclusive

Venn diagram
A visual representation in which the sample space is depicted as a box and events are represented as circles within the sample space.

Independent Events
Unrelated events. The outcome of one event does not impact the outcome of the other event.

Example: Freshmen & Sophomores

Let's consider undergraduate class status. A student can be classified as a freshman, sophomore, junior, or senior.

Being a freshman and being a sophomore are disjoint events because an individual cannot be classified as both at the same time. 

Being a freshman is not independent of being a sophomore. If I know that an individual is a freshman then the probability that they are a sophomore is 0; knowing that the student was a freshman provided information that influenced my prediction of them being a sophomore. 

Example: Class Status & Gender

Assume that there is no relationship between gender and class status. This means that within each class (freshmen, sophomores, juniors, seniors) the proportion of students who are men is consistent. It also means that within each gender the proportion of students who are freshmen, sophomores, juniors, and seniors is consistent.

In this case, we could say that the events of being a man and being a senior are independent events. Knowing that a student is a man does not change the likelihood of them being a senior. Knowing that a student is a senior does not change the likelihood of them being a man.

There are some men who are seniors, so these events are not disjoint. 


2.1.3.2.2 - Intersections

Intersection

The overlap of two or more events is symbolized by the character \(\cap\). 

\(P(A \cap B)\) is read as "the probability of A and B."

Intersection of A and B

Example: Red King

What is the probability of randomly selecting a card from a standard 52-card deck that is a red card and a king?

There are 2 kings that are red cards: the king of hearts and the king of diamonds.

\(P(red \cap king)=\dfrac{2}{52}=.0385\)

Example: Female Undergraduate Students

The two-way table below displays the World Campus enrollment from Fall 2015 in terms of level (undergraduate and graduate) and biological sex. What proportion of World Campus students were female and undergraduate students?

                Female   Male   Total
Undergraduate     3814   3428    7242
Graduate          2213   2787    5000
Total             6027   6215   12242

There are 3814 students who are females and undergraduates out of a total of 12242 students.

\(P(F \cap U)=\dfrac{3814}{12242}=0.312\)
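
An intersection can be thought of as the overlap of two sets. As a sketch under our own assumptions (cards represented as (rank, suit) pairs, and variable names chosen for illustration), the Python code below finds the red kings with the set intersection operator `&` and then repeats the two-way table calculation.

```python
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
SUITS = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = set(product(RANKS, SUITS))

red_cards = {card for card in deck if card[1] in ("Hearts", "Diamonds")}
kings = {card for card in deck if card[0] == "King"}

# Set intersection: cards that are red AND kings.
red_kings = red_cards & kings
p_red_king = len(red_kings) / len(deck)

# In a two-way table, an intersection is a single cell divided by the grand total.
female_undergraduates = 3814
all_students = 12242
p_female_and_undergraduate = female_undergraduates / all_students

print(sorted(red_kings))
print(f"P(red and king) = {p_red_king:.4f}")
print(f"P(F and U) = {p_female_and_undergraduate:.3f}")
```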


2.1.3.2.3 - Unions

Union

A union contains the area in A or B and is symbolized by \(\cup\). Note that this also includes the overlap of A and B (i.e., the intersection).

\(P(A \cup B)\) is read as "the probability of A or B."

Union of A and B
Union
\(P(A\cup B) = P(A)+P(B)-P(A\cap B)\)

Example: Hearts or Spades

What is the probability of randomly selecting a card from a standard 52-card deck that is a heart or spade?

There are 13 cards that are hearts, 13 cards that are spades, and no cards that are both a heart and a spade.

\(P(heart \cup spade)=\dfrac{13}{52}+\dfrac{13}{52}-\dfrac{0}{52}= \dfrac {26}{52}=0.5\)

Example: Hearts or Aces

What is the probability of randomly selecting a card from a standard 52-card deck that is a heart or an ace?

There are 13 cards that are hearts and 4 cards that are aces. There is one ace of hearts, so one of those 4 aces has already been counted.

\(P(heart \cup ace)=\dfrac{13}{52}+\dfrac{4}{52}-\dfrac{1}{52}=\dfrac{16}{52}=0.308\)

Example: Female or Undergraduate

The two-way table below displays the World Campus enrollment from Fall 2015 in terms of level (undergraduate and graduate) and biological sex. What proportion of World Campus students were female or undergraduate students?

                Female   Male   Total
Undergraduate     3814   3428    7242
Graduate          2213   2787    5000
Total             6027   6215   12242

When we have a contingency table we can take the appropriate values from the table as opposed to using the formula given above. There are 3814 female undergraduate students, 3428 male undergraduate students, 2213 female graduate students, and a total of 12242 students.

\(P(F \cup U)=\dfrac{3814+3428+2213}{12242}=\dfrac{9455}{12242}=0.772\)

Note that the final answer would be the same if we had used the formula:

\(P(F \cup U) = \dfrac{6027}{12242}+\dfrac{7242}{12242}-\dfrac{3814}{12242}= \dfrac{9455}{12242}=0.772\)
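
To see why the intersection must be subtracted, it can help to compare the union formula with a direct count. The Python sketch below is one way to do this; the helper names `p` and `union_rule` and the (rank, suit) representation of the deck are our own assumptions, not part of the course materials.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
SUITS = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = set(product(RANKS, SUITS))

hearts = {card for card in deck if card[1] == "Hearts"}
spades = {card for card in deck if card[1] == "Spades"}
aces = {card for card in deck if card[0] == "Ace"}


def p(event):
    """Probability of an event (a set of cards) for one random draw."""
    return Fraction(len(event), len(deck))


def union_rule(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p(a) + p(b) - p(a & b)


p_heart_or_spade = union_rule(hearts, spades)  # disjoint, so the subtracted term is 0
p_heart_or_ace = union_rule(hearts, aces)      # the ace of hearts is subtracted once

# Counting the cards in the union directly should agree with the formula.
p_heart_or_ace_direct = p(hearts | aces)

# Two-way table version: P(F) + P(U) - P(F and U).
p_female_or_undergraduate = Fraction(6027 + 7242 - 3814, 12242)

print(p_heart_or_spade, p_heart_or_ace, p_heart_or_ace == p_heart_or_ace_direct)
print(f"{float(p_female_or_undergraduate):.3f}")
```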


2.1.3.2.4 - Complements

Complement

The event that a given event does not occur. The complement of event \(A\) is written \(A^C\), and its probability is \(P(A^C)\). This may also be written as \(P(A')\).

In the diagram below we can see that \(A^{C}\) is everything in the sample space that is not A.

Complement of A
Complement of A
\(P(A^{C})=1-P(A)\)

Example: Coin Flip

When flipping a coin, one can flip heads or tails. Thus, \(P(Tails^{C})=P(Heads)\) and \(P(Heads^{C})=P(Tails)\).

Example: Hearts

If you randomly select a card from a standard 52-card deck, you could pull a heart, diamond, spade, or club. The complement of pulling a heart is pulling a diamond, spade, or club. In other words: \(P(Heart^{C})=P(Diamond \cup Spade \cup Club)\)

The probability of the complement of any event is equal to one minus the probability of the event. In other words: \(P(A^{C})=1-P(A)\)

It is also true then that: \(P(A)=1-P(A^{C})\)

Example: Rain


According to the weather report, there is a 30% chance of rain today: \(P(Rain) = .30\) 

Raining and not raining are complements.

\(P(Not \:rain)=P(Rain^{C})=1-P(Rain)=1-.30=.70\)

There is a 70% chance that it will not rain today.

Example: Winning

The probability that your team will win their next game is calculated to be .45, in other words:

\(P(Winning)=.45\)

Winning and losing are complements of one another (assuming the game cannot end in a tie). Therefore the probability that they will lose is:

\(P(Losing)=P(Winning^{C})=1-.45=.55\)

When a set of events covers every possible outcome and no two of the events can occur at the same time, the sum of their probabilities is equal to 1.

Example: Cards

In a standard 52-card deck there are 26 black cards and 26 red cards. All cards are either black or red.

\(P(red)+P(black)=\frac{26}{52}+\frac{26}{52}=1\)

Example: Dominant Hand

Of individuals with two hands, it is possible to be right-handed, left-handed, or ambidextrous. Assuming that these are the only three possibilities and that there is no overlap between any of these possibilities:

\(P(right\;handed)+P(left\;handed)+P(ambidextrous) = 1\)
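
The complement rule is simple enough to check by hand, but a short sketch can confirm the examples above. The function name `complement` is our own, and exact fractions are used to avoid rounding. As in the example above, the losing calculation assumes that winning and losing are the only two possible results.

```python
from fractions import Fraction


def complement(p_event):
    """P(A^C) = 1 - P(A)."""
    if not 0 <= p_event <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - p_event


p_not_rain = complement(Fraction(30, 100))   # P(Rain) = .30
p_losing = complement(Fraction(45, 100))     # P(Winning) = .45
p_not_heart = complement(Fraction(13, 52))   # P(Heart) = 13/52

# The complement of a heart is a diamond, spade, or club: 13 + 13 + 13 cards.
p_diamond_spade_or_club = Fraction(13 + 13 + 13, 52)

print(float(p_not_rain), float(p_losing))
print(p_not_heart == p_diamond_spade_or_club)
```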


2.1.3.2.5 - Conditional Probability

Conditional Probability

The probability of one event occurring given that it is known that a second event has occurred. This is communicated using the symbol \(\mid\) which is read as "given."

For example, \(P(A\mid B)\) is read as "Probability of A given B."

Example: PA Resident given Undergraduate

The two-way table below displays the World Campus enrollment from Fall 2019 in terms of level (undergraduate and graduate) and residency (Pennsylvania and non-Pennsylvania). Given that an individual is an undergraduate student, what is the probability that the student is a Pennsylvania resident?

                Pennsylvania   Non-Pennsylvania   Total
Undergraduate           3757               4603    8360
Graduate                2253               4074    6327
Total                   6010               8677   14687

We know that the individual is an undergraduate student so we will only look at the 8360 undergraduate students. Of those 8360 undergraduate students, 3757 were Pennsylvania residents.

\(P(PA \mid Undergrad) = \dfrac{3757}{8360}=0.449\)
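
Conditioning on an event amounts to restricting attention to one row (or one column) of the two-way table. The Python sketch below illustrates this with the Fall 2019 counts. The function names are our own, and the second calculation, \(P(Undergrad \mid PA)\), is an extra comparison that we have added to show that switching the order of the events generally changes the answer.

```python
# Cell counts from the World Campus residency table (Fall 2019).
table = {
    "Undergraduate": {"Pennsylvania": 3757, "Non-Pennsylvania": 4603},
    "Graduate": {"Pennsylvania": 2253, "Non-Pennsylvania": 4074},
}


def conditional_on_row(table, column, given_row):
    """P(column | given_row): look only at the cases in one row."""
    row = table[given_row]
    return row[column] / sum(row.values())


def conditional_on_column(table, row, given_column):
    """P(row | given_column): look only at the cases in one column."""
    column_total = sum(cells[given_column] for cells in table.values())
    return table[row][given_column] / column_total


p_pa_given_undergrad = conditional_on_row(table, "Pennsylvania", "Undergraduate")
p_undergrad_given_pa = conditional_on_column(table, "Undergraduate", "Pennsylvania")

print(f"P(PA | Undergrad) = {p_pa_given_undergrad:.3f}")
print(f"P(Undergrad | PA) = {p_undergrad_given_pa:.3f}")
```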


2.1.3.2.5.1 - Advanced Conditional Probability Applications


Advanced Formulas

Conditional probabilities can also be computed using the following formulas. Note that these two formulas are identical, but A and B are switched. Again, if the contingency table is available it is usually most efficient to take the appropriate values from the table, as shown above, as opposed to using these formulas.

Conditional Probability of A Given B
\(P(A\mid B)=\dfrac{P(A \: \cap\: B)}{P(B)}\)
Conditional Probability of B Given A
\(P(B\mid A)=\dfrac{P(A \: \cap\: B)}{P(A)}\)

Example: Clubs

In a standard 52-card deck, there are 26 black cards including 13 clubs. All clubs are black, therefore there are 13 black clubs.

What is the probability that a randomly selected card is a club given that it is a black card?

We know that \(P(club)=\frac{13}{52}=0.25\), \(P(black)=\frac{26}{52}=0.50\), and \(P(club \: \cap\: black)=\frac{13}{52}=0.25\).

\(P(club\mid black)=\dfrac{P(club \: \cap\: black)}{P(black)}=\dfrac{0.25}{0.50}=0.50\)

Given that a randomly selected card is black, there is a 50% chance that it's a club.
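
The two conditional probability formulas can be applied to the clubs example with a few lines of code. This is a minimal sketch with a helper name (`conditional`) of our own. The reversed probability, \(P(black \mid club)\), is included only as an additional illustration of how switching A and B changes the denominator.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


p_club = Fraction(13, 52)
p_black = Fraction(26, 52)
p_club_and_black = Fraction(13, 52)   # every club is black

p_club_given_black = conditional(p_club_and_black, p_black)
p_black_given_club = conditional(p_club_and_black, p_club)

print(f"P(club | black) = {float(p_club_given_black):.2f}")
print(f"P(black | club) = {float(p_black_given_club):.2f}")
```

Because every club is black, knowing that a card is a club makes it certain that the card is black, while knowing that a card is black leaves only a 50% chance that it is a club.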

Independent Events Written as Conditional Probabilities

If events A and B are independent then \(P(A) = P(A \mid B)\). In other words, whether or not event B occurs does not change the probability of event A occurring.

Example: Checking for Independence, Aces and Hearts

A card is randomly drawn from a 52-card deck. Are the events of drawing an ace and drawing a heart independent?

In a standard 52-card deck, there are 4 aces and 13 hearts. Therefore \(P(ace)=\frac{4}{52}\) and \(P(heart)=\frac{13}{52}\). Out of 13 hearts, 1 is an ace, which translates to \(P(ace \mid heart) = \frac{1}{13}\).

To determine if these two events are independent we can compare \(P(A)\) to \(P(A\mid B)\). If we call being an ace event A and being a heart event B, then we're comparing \(P(ace)\) to \(P(ace \mid heart)\).

\(P(ace)=\frac{4}{52}=0.0769\)

\(P(ace \mid heart) = \frac{1}{13}=0.0769\)

These values are identical, therefore we can conclude that the events of drawing an ace and drawing a heart are independent. 
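
The comparison of \(P(A)\) to \(P(A \mid B)\) can also be carried out by counting cards. The Python sketch below does so under our own assumptions (cards as (rank, suit) pairs; helper names `p`, `p_given`, and `are_independent` are ours). The hearts and red cards comparison is an extra example that we have added to show a pair of events that is not independent. Exact fractions are used so that the equality check is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
SUITS = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = set(product(RANKS, SUITS))

aces = {card for card in deck if card[0] == "Ace"}
hearts = {card for card in deck if card[1] == "Hearts"}
red_cards = {card for card in deck if card[1] in ("Hearts", "Diamonds")}


def p(event):
    """P(A) for one random draw from the deck."""
    return Fraction(len(event), len(deck))


def p_given(a, b):
    """P(A | B): the share of the cards in B that are also in A."""
    return Fraction(len(a & b), len(b))


def are_independent(a, b):
    """Independent events satisfy P(A) = P(A | B)."""
    return p(a) == p_given(a, b)


print(p(aces), p_given(aces, hearts), are_independent(aces, hearts))
print(p(hearts), p_given(hearts, red_cards), are_independent(hearts, red_cards))
```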

 

