Lesson 4: Conditional Probability

Overview

In this lesson, we'll focus on finding a particular kind of probability called a conditional probability. In short, a conditional probability is a probability of an event given that another event has occurred. For example, rather than being interested in knowing the probability that a randomly selected male has prostate cancer, we might instead be interested in knowing the probability that a randomly selected male has prostate cancer given that the male has an elevated prostate-specific antigen. We'll explore several such conditional probabilities.

Objectives

Upon completion of this lesson, you should be able to:

• Understand the definition of conditional probability.
• Learn how to use the relative frequency approach to assigning probability to find the conditional probability of an event from a two-way table.
• Learn how to use the formula for conditional probability.
• Learn how to use the multiplication rule to find the probability of the intersection of two events.
• Learn how to use the multiplication rule to find the probability of the intersection of more than two events.
• Learn to apply the techniques learned in the lesson to new problems.

4.1 - The Motivation

Example 4-1

A researcher is interested in evaluating how well a diagnostic test works for detecting renal disease in patients with high blood pressure. She performs the diagnostic test on 137 patients: 67 with known renal disease and 70 who are known to be healthy. The diagnostic test comes back either positive (indicating that the patient has renal disease) or negative (indicating that the patient does not have renal disease). Here are the results of her experiment:

Test Results
Truth Positive Negative Total
Renal Disease 44 23 67
Healthy 10 60 70
Total 54 83 137

If we let $$T+$$ be the event that the person tests positive, we can use the relative frequency approach to assigning probability to determine that:

$$P(T+)=\dfrac{54}{137}$$

because, of the 137 patients, 54 tested positive. If we let $$D$$ be the event that the person is truly diseased, we determine that:

$$P(D)=\dfrac{67}{137}$$

because, of the 137 patients, 67 are truly diseased. That's all well and good, but the question that the researcher is really interested in is this:

If a person has renal disease, what is the probability that he/she tests positive for the disease?

The first part of the question ("If a person has renal disease") is a "conditional," while the second part ("what is the probability that he/she tests positive for the disease?") is a "probability." Aha... do you get it? These are the kinds of questions we will be answering in this lesson, hence its title, "Conditional Probability." Now, let's push this example a little further and, in so doing, introduce the notation we will use to denote a conditional probability.

We can again use the relative frequency approach and the data the researcher collected to determine:

$$P(T+|D)=\dfrac{44}{67}\approx 0.66$$

That is, the probability a person tests positive given he/she has renal disease is about 0.66. There are a couple of things to note here.

First, the notation $$P(T+|D)$$ is standard conditional probability notation. It is read as "the probability a person tests positive given he/she has renal disease." The bar ( | ) is always read as "given." The event whose probability we want precedes the bar, and the conditioning event (the one we know has occurred) follows it.

Second, note that determining the conditional probability involves a two-step process. In the first step, we restrict the sample space to only those (67) who are diseased. Then, in the second step, we count the outcomes of interest (44) within the new sample space and divide by its size, giving 44/67.
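The two-step process can be sketched in Python. This is a minimal illustration, assuming the two-way table is stored as a dictionary of cell counts keyed by (truth, test result); the names `counts` and `conditional_by_restriction` are hypothetical and not part of the lesson.

```python
from fractions import Fraction

# Two-way table from Example 4-1: (truth, test result) -> number of patients.
counts = {
    ("D", "T+"): 44, ("D", "T-"): 23,   # renal disease
    ("H", "T+"): 10, ("H", "T-"): 60,   # healthy
}

def conditional_by_restriction(counts, event_col, given_row):
    """Relative-frequency conditional probability in two steps:
    1) restrict the sample space to the row `given_row`,
    2) count the outcomes in column `event_col` within that row."""
    restricted = {k: v for k, v in counts.items() if k[0] == given_row}
    n_restricted = sum(restricted.values())               # step 1: 67 diseased patients
    n_event = restricted.get((given_row, event_col), 0)   # step 2: 44 positives among them
    return Fraction(n_event, n_restricted)

p = conditional_by_restriction(counts, "T+", "D")
print(p, round(float(p), 3))  # 44/67 0.657
```

Using exact fractions keeps the answer in the same 44/67 form the lesson uses; the decimal is only rounded for display.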

Hmmm.... rather than having to do all of this thinking (!), can't we just derive some sort of general formula for finding a conditional probability?

In the next section, we generalize this two-step process into a formula.

4.2 - What is Conditional Probability?

Conditional Probability

The conditional probability of an event $$A$$ given that an event $$B$$ has occurred is written:

$$P(A|B)$$

and is calculated using:

$$P(A|B)=\dfrac{P(A\cap B)}{P(B)}$$

as long as $$P(B)>0$$.
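As a quick check that the formula agrees with the two-step relative-frequency answer from Example 4-1, here is a small Python sketch. It assumes each probability is a relative frequency out of all 137 patients; the helper name `conditional` is hypothetical.

```python
from fractions import Fraction

N = 137                      # total patients in Example 4-1
p_T_and_D = Fraction(44, N)  # P(T+ ∩ D): diseased and tested positive
p_D = Fraction(67, N)        # P(D): truly diseased

def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

print(conditional(p_T_and_D, p_D))  # 44/67
```

The common denominator of 137 cancels, which is exactly why restricting the sample space and applying the formula give the same answer.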

Example 4-1 Continued

Let's return to our diagnostic test for renal disease. Recall that the researcher collected the following data:

Test Results
Truth Positive Negative Total
Renal Disease 44 23 67
Healthy 10 60 70
Total 54 83 137

Now, when a researcher is developing a diagnostic test, the question she cares about is the one we investigated previously, namely:

If a person has renal disease, what is the probability of testing positive?

This quantity is what we would call the "sensitivity" of a diagnostic test. As patients, we are interested in knowing what is called the "positive predictive value" of a diagnostic test. That is, we are interested in this question:

If I receive a positive test, what is the probability that I actually have the disease?

We would hope, of course, that the probability is 1. But a diagnostic test is rarely perfect, and the collected data suggest that the renal disease test is not perfect either. How good is it? That is, what is the positive predictive value of the test?
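One way to estimate the positive predictive value from these data is to apply the conditional probability formula with the roles of the events swapped, so that we condition on testing positive instead of on having the disease. The sketch below assumes relative frequencies out of the 137 patients. Keep in mind that this estimate reflects the study's own mix of 67 diseased and 70 healthy patients; positive predictive values depend on how common the disease is among the people being tested.

```python
from fractions import Fraction

N = 137
p_D_and_T = Fraction(44, N)     # diseased and tested positive
p_T = Fraction(54, N)           # tested positive (column total)

ppv = p_D_and_T / p_T           # P(D | T+), the positive predictive value
sensitivity = Fraction(44, 67)  # P(T+ | D), for comparison
print(ppv, round(float(ppv), 3))  # 22/27 0.815
```

Note that $$P(D|T+)$$ and $$P(T+|D)$$ share the same numerator but have different denominators, so in general they are not equal.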

Properties of Conditional Probability

Because conditional probability is just a probability, it satisfies the three axioms of probability. That is, as long as $$P(B)>0$$:

1. $$P(A|B)\ge0$$
2. $$P(B|B)=1$$
3. If $$A_1, A_2, \ldots, A_k$$ are mutually exclusive events, then $$P(A_1\cup A_2\cup \ldots \cup A_k|B)=P(A_1|B)+P(A_2|B)+\ldots+P(A_k|B)$$ and likewise for countably infinite unions.

The "proofs" of the first two axioms are straightforward:
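A minimal sketch of both arguments, using only the definition of $$P(A|B)$$ and the standing assumption $$P(B)>0$$:

$$P(A|B)=\dfrac{P(A\cap B)}{P(B)}\ge 0 \quad \text{because } P(A\cap B)\ge 0 \text{ and } P(B)>0$$

$$P(B|B)=\dfrac{P(B\cap B)}{P(B)}=\dfrac{P(B)}{P(B)}=1$$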

The "proof" of the third axiom is also straightforward. It just takes a little more work:
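One way to write out that extra work, using the definition of conditional probability together with the distributive law for sets:

$$P(A_1\cup \ldots \cup A_k|B)=\dfrac{P[(A_1\cup \ldots \cup A_k)\cap B]}{P(B)}=\dfrac{P[(A_1\cap B)\cup \ldots \cup (A_k\cap B)]}{P(B)}$$

Because the $$A_i$$ are mutually exclusive, so are the events $$A_i\cap B$$, and the third axiom of (unconditional) probability gives:

$$\dfrac{P[(A_1\cap B)\cup \ldots \cup (A_k\cap B)]}{P(B)}=\dfrac{P(A_1\cap B)+\ldots+P(A_k\cap B)}{P(B)}=P(A_1|B)+\ldots+P(A_k|B)$$

The same argument applies term by term to a countably infinite union of mutually exclusive events.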

Example 4-3

A box contains 6 white balls and 4 red balls. We randomly (and without replacement) draw two balls from the box. What is the probability that the second ball selected is red, given that the first ball selected is white?
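One way to check this answer is to enumerate every equally likely ordered draw and then apply the two-step "restrict, then count" approach. This is a sketch under the assumption that the 10 balls are labeled so each is distinguishable; if we write $$W_1$$ for "first ball white" and $$R_2$$ for "second ball red" (our labels, not the lesson's), the code computes $$P(R_2|W_1)$$.

```python
from fractions import Fraction
from itertools import permutations

# Label the 10 balls so each is distinct: 6 white (W) and 4 red (R).
balls = ["W"] * 6 + ["R"] * 4

# All equally likely ordered draws of two balls without replacement (10 * 9 = 90).
draws = list(permutations(range(len(balls)), 2))

first_white = [d for d in draws if balls[d[0]] == "W"]
first_white_second_red = [d for d in first_white if balls[d[1]] == "R"]

# P(R2 | W1): restrict to draws whose first ball is white, then count red seconds.
p = Fraction(len(first_white_second_red), len(first_white))
print(p)  # 4/9
```

The result matches the intuitive argument: once a white ball is removed, 4 of the remaining 9 balls are red.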

What is the probability that both balls selected are red?
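Two ways to get this answer can be compared in Python: one by counting combinations, and one by multiplying a probability by a conditional probability (the multiplication rule discussed next). This is a sketch; we are assuming the counting approach resembles the lesson's first method, and the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

white, red = 6, 4
total = white + red

# Method 1: counting. Ways to choose 2 of the 4 red balls over ways to choose 2 of 10.
p_counting = Fraction(comb(red, 2), comb(total, 2))

# Method 2: multiplication rule, P(R1 ∩ R2) = P(R1) * P(R2 | R1).
p_R1 = Fraction(red, total)                    # 4/10
p_R2_given_R1 = Fraction(red - 1, total - 1)   # 3/9: one red ball already removed
p_mult = p_R1 * p_R2_given_R1

print(p_counting, p_mult)  # 2/15 2/15
```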

The second method used in solving this problem relied on what is known as the multiplication rule. Now that we've actually used the rule, let's generalize it!

4.3 - Multiplication Rule

Multiplication Rule

The probability that two events $$A$$ and $$B$$ both occur is given by:

$$P(A\cap B)=P(A|B)P(B)$$

or by:

$$P(A\cap B)=P(B|A)P(A)$$
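Both forms of the rule can be checked against the renal disease data from Example 4-1. In this sketch, each conditional probability is computed within the relevant row or column of the table, and each unconditional probability is a relative frequency out of all 137 patients.

```python
from fractions import Fraction

N = 137
p_D = Fraction(67, N)             # P(D)
p_T = Fraction(54, N)             # P(T+)
p_T_given_D = Fraction(44, 67)    # P(T+ | D), computed within the diseased row
p_D_given_T = Fraction(44, 54)    # P(D | T+), computed within the positive column

# Both forms of the multiplication rule should recover P(D ∩ T+) = 44/137.
via_D = p_T_given_D * p_D
via_T = p_D_given_T * p_T
print(via_D, via_T)  # 44/137 44/137
```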

Example 4-4

A box contains 6 white balls and 4 red balls. We randomly (and without replacement) draw two balls from the box. What is the probability that the second ball selected is red?
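A sketch of one way to compute this: split the event "second ball red" according to the color of the first ball, then apply the multiplication rule to each piece. The event labels $$W_1$$, $$R_1$$, $$R_2$$ in the comments are our own notation.

```python
from fractions import Fraction

white, red = 6, 4
total = white + red

# P(R2) = P(R2 ∩ W1) + P(R2 ∩ R1) = P(R2|W1)P(W1) + P(R2|R1)P(R1)
p_W1 = Fraction(white, total)
p_R1 = Fraction(red, total)
p_R2_given_W1 = Fraction(red, total - 1)      # a white ball is gone: 4 red of 9
p_R2_given_R1 = Fraction(red - 1, total - 1)  # a red ball is gone: 3 red of 9

p_R2 = p_R2_given_W1 * p_W1 + p_R2_given_R1 * p_R1
print(p_R2)  # 2/5
```

The answer equals $$P(R_1)=4/10$$: without knowing the first draw, the second ball is just as likely to be red as the first.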

We'll see calculations like the one just made over and over again when we study Bayes' Rule.

The Multiplication Rule Extended

The multiplication rule can be extended to three or more events. In the case of three events, the rule looks like this:

$$P(A \cap B \cap C)=P[(A \cap B) \cap C]=\underbrace{P(C | A \cap B)}_{a} \times \underbrace{P(A \cap B)}_{b}$$

$$\text { But since } P(A \cap B)=\underbrace{P(B | A) \times P(A)}_{b}\colon$$

$$P(A \cap B \cap C)=\underbrace{P(C | A \cap B)}_{a} \times \underbrace{P(B | A) \times P(A)}_{b}$$
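To see the three-event rule in action, here is a brute-force check on a hypothetical scenario that is not one of the lesson's examples: draw three balls without replacement from the box with 6 white and 4 red balls, and let $$A$$, $$B$$, $$C$$ be the events that the first, second, and third balls are white. The helper `prob` is our own.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical illustration: 6 white and 4 red balls, three drawn without replacement.
balls = ["W"] * 6 + ["R"] * 4
outcomes = list(permutations(range(len(balls)), 3))  # 10*9*8 = 720 equally likely

def prob(event, given=None):
    """Relative frequency of `event` among outcomes satisfying `given`."""
    restricted = [o for o in outcomes if given is None or given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

def A(o):  # first ball white
    return balls[o[0]] == "W"

def B(o):  # second ball white
    return balls[o[1]] == "W"

def C(o):  # third ball white
    return balls[o[2]] == "W"

direct = prob(lambda o: A(o) and B(o) and C(o))
chain = prob(C, given=lambda o: A(o) and B(o)) * prob(B, given=A) * prob(A)
print(direct, chain)  # 1/6 1/6
```

The chain-rule factors here are $$4/8$$, $$5/9$$, and $$6/10$$, and their product agrees with the direct count of 120 all-white draws out of 720.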

Example 4-5

Three cards are dealt successively at random and without replacement from a standard deck of 52 playing cards. What is the probability of receiving, in order, a king, a queen, and a jack?
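A sketch of the calculation using the extended multiplication rule, followed by a brute-force check over every ordered deal of three cards. We write $$K_1$$, $$Q_2$$, $$J_3$$ for "king first," "queen second," and "jack third"; these labels are ours.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(K1 ∩ Q2 ∩ J3) = P(K1) * P(Q2 | K1) * P(J3 | K1 ∩ Q2)
p_K1 = Fraction(4, 52)              # 4 kings among 52 cards
p_Q2_given_K1 = Fraction(4, 51)     # all 4 queens remain among 51 cards
p_J3_given_K1_Q2 = Fraction(4, 50)  # all 4 jacks remain among 50 cards
p = p_K1 * p_Q2_given_K1 * p_J3_given_K1_Q2

# Brute-force check: enumerate every ordered deal of 3 distinct cards (52*51*50).
ranks = [r for r in "A23456789TJQK" for _ in range(4)]  # 4 cards of each rank
deals = hits = 0
for a, b, c in permutations(range(52), 3):
    deals += 1
    if ranks[a] == "K" and ranks[b] == "Q" and ranks[c] == "J":
        hits += 1
brute = Fraction(hits, deals)

print(p, brute)  # 8/16575 8/16575
```

The enumeration visits 132,600 ordered deals, 64 of which are king, queen, jack in that order.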

4.4 - More Examples

Example 4-6

A drawer contains:

• 4 red socks
• 6 brown socks
• 8 green socks

A man is getting dressed one morning and, barely awake, randomly selects 2 socks from the drawer (without replacement, of course). What is the probability that both of the socks he selects are green given that they are the same color? If we define four events as follows:

• Let $$R_i$$ = the event the man selects a red sock on selection $$i$$ for $$i = 1, 2$$
• Let $$B_i$$ = the event the man selects a brown sock on selection $$i$$ for $$i = 1, 2$$
• Let $$G_i$$ = the event the man selects a green sock on selection $$i$$ for $$i = 1, 2$$
• Let $$S$$ = the event that the 2 socks selected are the same color

then we are looking for the following conditional probability:

$$P(G_1\text{ and }G_2|S)$$

Let's give it a go.
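One way to work it, sketched in Python: $$S$$ is the union of three mutually exclusive events (both red, both brown, both green), each found with the multiplication rule. Since $$G_1\cap G_2$$ is contained in $$S$$, the numerator of the conditional probability is just $$P(G_1\cap G_2)$$. The helper `both` is a hypothetical name.

```python
from fractions import Fraction

socks = {"red": 4, "brown": 6, "green": 8}
n = sum(socks.values())  # 18 socks

def both(color):
    """P(first and second sock are both `color`) by the multiplication rule."""
    k = socks[color]
    return Fraction(k, n) * Fraction(k - 1, n - 1)

# S is the union of three mutually exclusive events: both red, both brown, both green.
p_S = sum(both(c) for c in socks)
# G1 ∩ G2 is a subset of S, so P((G1 ∩ G2) ∩ S) = P(G1 ∩ G2).
p_green_given_S = both("green") / p_S
print(p_S, p_green_given_S)  # 49/153 4/7
```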

Example 4-7

Medical records reveal that of the 937 men who died in a particular region in 1999:

• 212 of the men died of causes related to heart disease, and
• 312 of the men had at least one parent with heart disease.

Of the 312 men with at least one parent with heart disease, 102 died of causes related to heart disease. Using this information, if we randomly select a man from the region, what is the probability that he dies of causes related to heart disease given that neither of his parents died from heart disease? If we define two events as follows:

• Let $$H$$ = the event that at least one of the parents of a randomly selected man died of causes related to heart disease
• Let $$D$$= the event that a randomly selected man died of causes related to heart disease

then we are looking for the following conditional probability:

$$P(D|H^\prime)$$
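A minimal sketch of the count-based calculation, assuming the relative frequency approach applies to the 937 men: men in $$H^\prime$$ are those not counted among the 312, and men in $$D\cap H^\prime$$ are the heart-disease deaths not counted among the 102.

```python
from fractions import Fraction

total = 937    # men who died in the region in 1999
heart = 212    # died of causes related to heart disease (D)
parent = 312   # at least one parent with heart disease (H)
both = 102     # in both H and D

# Counts in H': men with no parent with heart disease.
not_parent = total - parent       # |H'| = 625
heart_not_parent = heart - both   # |D ∩ H'| = 110

p = Fraction(heart_not_parent, not_parent)  # P(D | H') = P(D ∩ H') / P(H')
print(p, float(p))  # 22/125 0.176
```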
