4.1 - The Motivation

Example 4-1 Section

test tubes

A researcher is interested in evaluating how well a diagnostic test works for detecting renal disease in patients with high blood pressure. She performs the diagnostic test on 137 patients — 67 with known renal disease and 70 who are known to be healthy. The diagnostic test comes back either positive (the patient has renal disease) or negative (the patient does not have renal disease). Here are the results of her experiment:

  Test Results  
Truth Positive Negative Total
Renal Disease 44 23 67
Healthy 10 60 70
Total 54 83 137

If we let \(T+\) be the event that the person tests positive, we can use the relative frequency approach to assigning probability to determine that:

\(P(T+)=\dfrac{54}{137}\)

because, of the 137 patients, 54 tested positive. If we let \(D\) be the event that the person is truly diseased, we determine that:

\(P(D)=\dfrac{67}{137}\)

because, of the 137 patients, 67 are truly diseased. That's all well and good, but the question that the researcher is really interested in is this:

If a person has renal disease, what is the probability that he/she tests positive for the disease?

The blue portion of the question is a "conditional", while the green portion is a "probability." Aha... do you get it? These are the kinds of questions that we are going to be interested in answering in this lecture, and hence its title "Conditional Probability." Now, let's just push this example a little bit further, and in so doing introduce the notation we are going to use to denote a conditional probability.

We can again use the relative frequency approach and the data the researcher collected to determine:

\(P(T+|D)=\dfrac{44}{67}=0.65\)

That is, the probability a person tests positive given he/she has renal disease is 0.65. There are a couple of things to note here.

First, the notation \(P(T+|D)\) is standard conditional probability notation. It is read as "the probability a person tests positive given he/she has renal disease." The bar ( | ) is always read as "given." The probability we are looking for precedes the bar, and the conditional follows the bar.

Second, note that determining the conditional probability involves a two-step process. In the first step, we restrict the sample space to only those (67) who are diseased. Then, in the second step, we determine the number of interest (44) based on the new sample space.

Hmmm.... rather than having to do all of this thinking (!), can't we just derive some sort of general formula for finding a conditional probability?

In the next section, we generalize our derived formula.