21.1 - Conditional Distribution of Y Given X

Let's start with the assumptions that we stated previously in the introduction to this lesson. That is, let's assume that:

The continuous random variable \(Y\) follows a normal distribution for each \(x\).
The conditional mean of \(Y\) given \(x\), that is, \(E(Y|x)\), is linear in \(x\). Recall from our work in the previous lesson that this means:

\(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\)
The conditional variance of \(Y\) given \(x\), that is, \(\text{Var}(Y|x)=\sigma^2_{Y|X}\), is constant, that is, the same for each \(x\).

There's a pretty good three-dimensional graph in our textbook depicting these assumptions. A two-dimensional graph with our height and weight example might look something like this:

The blue line represents the linear relationship between \(x\) and the conditional mean of \(Y\) given \(x\). For a given height \(x\), say \(x_1\), the red dots are meant to represent possible weights \(y\) for that \(x\) value. Note that the range of red dots is intentionally the same for each \(x\) value. That's because we are assuming that the conditional variance \(\sigma^2_{Y|X}\) is the same for each \(x\). If we were to turn this two-dimensional drawing into a three-dimensional drawing, we'd want to draw identical-looking normal curves over the top of each set of red dots.

So, in summary, our assumptions tell us so far that the conditional distribution of \(Y\) given \(X=x\) is:

\(Y|x \sim N \left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\qquad ??\right)\)

If we could just fill in those question marks, that is, find \(\sigma^2_{Y|X}\), the conditional variance of \(Y\) given \(x\), then we could use what we already know about the normal distribution to find conditional probabilities, such as \(P(140<Y<160|X=x)\). The following theorem does the trick for us.

Theorem

If the conditional distribution of \(Y\) given \(X=x\) follows a normal distribution with mean \(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\) and constant variance \(\sigma^2_{Y|X}\), then the conditional variance is:

\(\sigma^2_{Y|X}=\sigma^2_Y(1-\rho^2)\)

Proof

Because \(Y\) is a continuous random variable, we need to use the definition of the conditional variance of \(Y\) given \(X=x\) for continuous random variables. That is:

\(\sigma^2_{Y|X}=Var(Y|x)=\int_{-\infty}^\infty (y-\mu_{Y|X})^2 h(y|x) dy\)

Now, if we replace the \(\mu_{Y|X}\) in the integrand with what we know it to be, that is, \(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\), we get:

\(\sigma^2_{Y|X}=\int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) dy\)

Then, multiplying both sides of the equation by \(f_X(x)\) and integrating over the range of \(x\), we get:

\(\int_{-\infty}^\infty \sigma^2_{Y|X} f_X(x)dx=\int_{-\infty}^\infty \int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) f_X(x)dydx\)

Now, on the left side of the equation, since \(\sigma^2_{Y|X}\) is a constant that doesn't depend on \(x\), we can pull it through the integral. And, you might recognize that the right side of the equation is an (unconditional) expectation, because the product of the conditional p.d.f. and the marginal p.d.f. is the joint p.d.f. of \(X\) and \(Y\):

\(h(y|x)f_X(x)=f(x,y)\)

After pulling the conditional variance through the integral on the left side of the equation, and rewriting the right side of the equation as an expectation, we have:

\(\sigma^2_{Y|X}\int_{-\infty}^\infty f_X(x)dx=E\left\{\left[(Y-\mu_Y)-\left(\rho \dfrac{\sigma_Y}{\sigma_X}(X-\mu_X)\right)\right]^2\right\}\)

Now, by the definition of a valid p.d.f., the integral on the left side of the equation equals 1:

\(\int_{-\infty}^\infty f_X(x)dx=1\)

And, dealing with the expectation on the right-hand side, that is, squaring the term and distributing the expectation, we get:

\(\sigma^2_{Y|X}=E[(Y-\mu_Y)^2]-2\rho \dfrac{\sigma_Y}{\sigma_X}E[(X-\mu_X)(Y-\mu_Y)]+\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}E[(X-\mu_X)^2]\)
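
To spell out the squaring step, it may help to abbreviate the slope as \(c=\rho \dfrac{\sigma_Y}{\sigma_X}\) (a shorthand introduced here only for this aside). Expanding the square inside the expectation gives:

\(\left[(Y-\mu_Y)-c(X-\mu_X)\right]^2=(Y-\mu_Y)^2-2c(X-\mu_X)(Y-\mu_Y)+c^2(X-\mu_X)^2\)

Because expectation is linear and \(c\) is a constant, taking the expectation of each term separately yields the three-term expression above.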

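Now, it's just a matter of recognizing various terms on the right-hand side of the equation as the variance of \(Y\), the covariance of \(X\) and \(Y\), and the variance of \(X\):

\(E[(Y-\mu_Y)^2]=\sigma^2_Y\)
\(E[(X-\mu_X)(Y-\mu_Y)]=\text{Cov}(X,Y)=\rho \sigma_X \sigma_Y\)
\(E[(X-\mu_X)^2]=\sigma^2_X\)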

That is:

\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho \dfrac{\sigma_Y}{\sigma_X} \rho \sigma_X \sigma_Y +\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}\sigma^2_X\)

Simplifying yet more, we get:

\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho^2\sigma^2_Y+\rho^2\sigma^2_Y=\sigma^2_Y-\rho^2\sigma^2_Y\)

And, finally, we get:

\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)\)

as was to be proved!
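
The result can also be checked numerically. The following is a minimal sketch, not part of the proof: it assumes \((X, Y)\) is bivariate normal (one setting in which the lesson's assumptions hold), uses made-up illustrative parameter values, and estimates \(E\{[Y-E(Y|X)]^2\}\) by simulation using only the Python standard library. The function name and constants are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean

# Illustrative parameters (assumptions for this sketch, not from the lesson)
MU_X, MU_Y = 10.0, 20.0
SIGMA_X, SIGMA_Y = 2.0, 3.0
RHO = 0.6


def simulate_residual_variance(n=200_000, seed=1):
    """Estimate E{[Y - E(Y|X)]^2} from n simulated bivariate normal pairs."""
    rng = random.Random(seed)
    slope = RHO * SIGMA_Y / SIGMA_X
    squared_residuals = []
    for _ in range(n):
        # Two independent standard normals, combined to give correlation RHO
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = MU_X + SIGMA_X * z1
        y = MU_Y + SIGMA_Y * (RHO * z1 + sqrt(1 - RHO ** 2) * z2)
        # Conditional mean of Y given x, as in the linearity assumption
        conditional_mean = MU_Y + slope * (x - MU_X)
        squared_residuals.append((y - conditional_mean) ** 2)
    return fmean(squared_residuals)


estimate = simulate_residual_variance()
theory = SIGMA_Y ** 2 * (1 - RHO ** 2)  # sigma_Y^2 (1 - rho^2)
print(f"simulated: {estimate:.3f}, theory: {theory:.3f}")
```

With a large number of simulated pairs, the simulated value should land close to the theoretical value \(\sigma^2_Y(1-\rho^2)\), which is 5.76 for these illustrative parameters.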

So, in summary, our assumptions tell us that the conditional distribution of \(Y\) given \(X=x\) is:

\(Y|X=x\sim N\left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\quad \sigma^2_Y(1-\rho^2)\right)\)

Now that we have completely defined the conditional distribution of \(Y\) given \(X=x\), we can use what we already know about the normal distribution to find conditional probabilities, such as \(P(140<Y<160|X=x)\). Let's take a look at an example.

Example 21-1

Let \(X\) denote the math score on the ACT college entrance exam of a randomly selected student. Let \(Y\) denote the verbal score on the ACT college entrance exam of a randomly selected student. Previous history suggests that:

\(X\) is normally distributed with a mean of 22.7 and a variance of 17.64
\(Y\) is normally distributed with a mean of 22.7 and a variance of 12.25
The correlation between \(X\) and \(Y\) is 0.78.

What is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points?

Solution

Because \(Y\), the verbal ACT score, is assumed to be normally distributed with a mean of 22.7 and a variance of 12.25, finding the requested probability \(P(18.5<Y<25.5)\) is just a simple normal probability calculation.

Now converting the \(Y\) scores to standard normal \(Z\) scores, we get:

\(P(18.5<Y<25.5)=P\left(\dfrac{18.5-22.7}{\sqrt{12.25}} <Z<\dfrac{25.5-22.7}{\sqrt{12.25}}\right)\)

And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:

\begin{align} P(18.5<Y<25.5) &= P(-1.20<Z<0.80)\\ &= 0.7881-0.1151=0.6730 \end{align}

That is, the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points is 0.673.
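
As a cross-check on the table lookup, the same probability can be computed directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Unconditional distribution of the verbal score Y: mean 22.7, variance 12.25
verbal = NormalDist(mu=22.7, sigma=sqrt(12.25))

# P(18.5 < Y < 25.5) as a difference of cumulative probabilities
p_unconditional = verbal.cdf(25.5) - verbal.cdf(18.5)
print(f"P(18.5 < Y < 25.5) is approximately {p_unconditional:.3f}")
```

The computed value agrees with the table-based answer of about 0.673; any difference in the fourth decimal place comes from rounding in the table.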

Now, what happens to our probability calculation if we take into account the student's ACT math score? That is, what is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 given that his or her ACT math score was 23? That is, what is \(P(18.5<Y<25.5|X=23)\)?

Solution

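Before we can do the probability calculation, we first need to fully define the conditional distribution of \(Y\) given \(X=x\):

\(Y|x \sim N \left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\quad \sigma^2_Y(1-\rho^2)\right)\)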

Now, if we just plug in the values that we know, we can calculate the conditional mean of \(Y\) given \(X=23\):

\(\mu_{Y|23}=22.7+0.78\left(\dfrac{\sqrt{12.25}}{\sqrt{17.64}}\right)(23-22.7)=22.895\)

and the conditional variance of \(Y\) given \(X=x\):

\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)=12.25(1-0.78^2)=4.7971\)

It is worth noting that \(\sigma^2_{Y|X}\), the conditional variance of \(Y\) given \(X=x\), is much smaller than \(\sigma^2_Y\), the unconditional variance of \(Y\) (12.25). This should make sense, as we have more information about the student. That is, we should expect the verbal ACT scores of all students to span a greater range than the verbal ACT scores of just those students whose math ACT score was 23.

Given that a student's math ACT score is 23, we now know that the student's verbal ACT score, \(Y\), is normally distributed with a mean of 22.895 and a variance of 4.7971. Finding the requested probability is again just a simple normal probability calculation.

Converting the \(Y\) scores to standard normal \(Z\) scores, we get:

\(P(18.5<Y<25.5|X=23)=P\left(\dfrac{18.5-22.895}{\sqrt{4.7971}} <Z<\dfrac{25.5-22.895}{\sqrt{4.7971}}\right)\)

And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:

\(P(18.5<Y<25.5|X=23)=P(-2.01<Z<1.19)=0.8830-0.0222=0.8608\)

That is, given that a randomly selected student's math ACT score is 23, the probability that the student's verbal ACT score is between 18.5 and 25.5 points is 0.8608.
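
The conditional calculation can be reproduced in a few lines. The sketch below assumes the conditional mean and variance formulas from this lesson; the helper name `conditional_normal` and its parameter names are hypothetical, introduced only for this illustration, and it relies only on the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def conditional_normal(x, mu_x, mu_y, var_x, var_y, rho):
    """Return the distribution of Y given X = x under the lesson's assumptions."""
    # Conditional mean: mu_Y + rho * (sigma_Y / sigma_X) * (x - mu_X)
    mean = mu_y + rho * sqrt(var_y) / sqrt(var_x) * (x - mu_x)
    # Conditional variance: sigma_Y^2 * (1 - rho^2), the same for every x
    variance = var_y * (1 - rho ** 2)
    return NormalDist(mu=mean, sigma=sqrt(variance))


# Values from Example 21-1, conditioning on a math score of 23
cond = conditional_normal(x=23, mu_x=22.7, mu_y=22.7,
                          var_x=17.64, var_y=12.25, rho=0.78)
p_conditional = cond.cdf(25.5) - cond.cdf(18.5)

print(f"conditional mean:     {cond.mean:.3f}")
print(f"conditional variance: {cond.variance:.4f}")
print(f"P(18.5 < Y < 25.5 | X = 23) is approximately {p_conditional:.3f}")
```

The conditional mean and variance match the hand calculation (22.895 and 4.7971). The probability comes out within about 0.001 of the table-based 0.8608, because the table lookup rounds the \(Z\) scores to two decimal places.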