Lesson 21: Bivariate Normal Distributions
Overview
Let the random variable \(Y\) denote the weight of a randomly selected individual, in pounds. Then, suppose we are interested in determining the probability that a randomly selected individual weighs between 140 and 160 pounds. That is, what is \(P(140<Y<160)\)?
But, if we think about it, we could imagine that the weight of an individual increases (linearly?) as height increases. If that's the case, in calculating the probability that a randomly selected individual weighs between 140 and 160 pounds, we might find it more informative to first take into account a person's height, say \(X\). That is, we might want to find instead \(P(140<Y<160|X=x)\). To calculate such a conditional probability, we clearly first need to find the conditional distribution of \(Y\) given \(X=x\). That's what we'll do in this lesson, that is, after first making a few assumptions.
First, we'll assume that (1) \(Y\) follows a normal distribution, (2) \(E(Y|x)\), the conditional mean of \(Y\) given \(x\), is linear in \(x\), and (3) \(\text{Var}(Y|x)\), the conditional variance of \(Y\) given \(x\), is constant. Based on these three stated assumptions, we'll find the conditional distribution of \(Y\) given \(X=x\).
Then, to the three assumptions we've already made, we'll then add the assumption that the random variable \(X\) follows a normal distribution, too. Based on the now four stated assumptions, we'll find the joint probability density function of \(X\) and \(Y\).
Objectives
 To find the conditional distribution of \(Y\) given \(X=x\), assuming that (1) \(Y\) follows a normal distribution, (2) \(E(Y|x)\), the conditional mean of \(Y\) given \(x\), is linear in \(x\), and (3) \(\text{Var}(Y|x)\), the conditional variance of \(Y\) given \(x\), is constant.
 To learn how to calculate conditional probabilities using the resulting conditional distribution.
 To find the joint distribution of \(X\) and \(Y\), assuming that (1) \(X\) follows a normal distribution, (2) \(Y\) follows a normal distribution, (3) \(E(Y|x)\), the conditional mean of \(Y\) given \(x\), is linear in \(x\), and (4) \(\text{Var}(Y|x)\), the conditional variance of \(Y\) given \(x\), is constant.
 To learn the formal definition of the bivariate normal distribution.
 To understand that when \(X\) and \(Y\) have the bivariate normal distribution with zero correlation, then \(X\) and \(Y\) must be independent.
 To understand each of the proofs provided in the lesson.
 To be able to apply the methods learned in the lesson to new problems.
21.1  Conditional Distribution of Y Given X
Let's start with the assumptions that we stated previously in the introduction to this lesson. That is, let's assume that:
 The continuous random variable \(Y\) follows a normal distribution for each \(x\).

The conditional mean of \(Y\) given \(x\), that is, \(E(Y|x)\), is linear in \(x\). Recall from our work in the previous lesson that this means:
\(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\)
 The conditional variance of \(Y\) given \(x\), that is, \(\text{Var}(Y|x)=\sigma^2_{Y|X}\), is constant, that is, the same for each \(x\).
There's a pretty good three-dimensional graph in our textbook depicting these assumptions. A two-dimensional graph with our height and weight example might look something like this:
The blue line represents the linear relationship between \(x\) and the conditional mean of \(Y\) given \(x\). For a given height \(x\), say \(x_1\), the red dots are meant to represent possible weights \(y\) for that \(x\) value. Note that the range of red dots is intentionally the same for each \(x\) value. That's because we are assuming that the conditional variance \(\sigma^2_{Y|X}\) is the same for each \(x\). If we were to turn this two-dimensional drawing into a three-dimensional drawing, we'd want to draw identical-looking normal curves over the top of each set of red dots.
So, in summary, our assumptions tell us so far that the conditional distribution of \(Y\) given \(X=x\) is:
\(Y|x \sim N \left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\qquad ??\right)\)
If we could just fill in those question marks, that is, find \(\sigma^2_{Y|X}\), the conditional variance of \(Y\) given \(x\), then we could use what we already know about the normal distribution to find conditional probabilities, such as \(P(140<Y<160|X=x)\). The following theorem does the trick for us.
If the conditional distribution of \(Y\) given \(X=x\) follows a normal distribution with mean \(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\) and constant variance \(\sigma^2_{Y|X}\), then the conditional variance is:
\(\sigma^2_{Y|X}=\sigma^2_Y(1-\rho^2)\)
Proof
Because \(Y\) is a continuous random variable, we need to use the definition of the conditional variance of \(Y\) given \(X=x\) for continuous random variables. That is:
\(\sigma^2_{Y|X}=\text{Var}(Y|x)=\int_{-\infty}^\infty (y-\mu_{Y|X})^2 h(y|x) dy\)
Now, if we replace the \(\mu_{Y|X}\) in the integrand with what we know it to be, that is, \(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\), we get:
\(\sigma^2_{Y|X}=\int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) dy\)
Then, multiplying both sides of the equation by \(f_X(x)\) and integrating over the range of \(x\), we get:
\(\int_{-\infty}^\infty \sigma^2_{Y|X} f_X(x)dx=\int_{-\infty}^\infty \int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) f_X(x)dydx\)
Now, on the left side of the equation, since \(\sigma^2_{Y|X}\) is a constant that doesn't depend on \(x\), we can pull it through the integral. And, you might recognize that the right side of the equation is an (unconditional) expectation, because \(h(y|x) f_X(x)=f(x,y)\), the joint p.d.f. of \(X\) and \(Y\).
After pulling the conditional variance through the integral on the left side of the equation, and rewriting the right side of the equation as an expectation, we have:
\(\sigma^2_{Y|X}\int_{-\infty}^\infty f_X(x)dx=E\left\{\left[(Y-\mu_Y)-\rho \dfrac{\sigma_Y}{\sigma_X}(X-\mu_X)\right]^2\right\}\)
Now, by the definition of a valid p.d.f., the integral on the left side of the equation equals 1.
And, dealing with the expectation on the right hand side, that is, squaring the term and distributing the expectation, we get:
\(\sigma^2_{Y|X}=E[(Y-\mu_Y)^2]-2\rho \dfrac{\sigma_Y}{\sigma_X}E[(X-\mu_X)(Y-\mu_Y)]+\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}E[(X-\mu_X)^2]\)
Now, it's just a matter of recognizing the various terms on the right-hand side of the equation: \(E[(Y-\mu_Y)^2]=\sigma^2_Y\), \(E[(X-\mu_X)(Y-\mu_Y)]=\text{Cov}(X,Y)=\rho\sigma_X\sigma_Y\), and \(E[(X-\mu_X)^2]=\sigma^2_X\). That is:
\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho \dfrac{\sigma_Y}{\sigma_X} \rho \sigma_X \sigma_Y +\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}\sigma^2_X\)
Simplifying yet more, we get:
\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho^2\sigma^2_Y+\rho^2\sigma^2_Y=\sigma^2_Y-\rho^2\sigma^2_Y\)
And, finally, we get:
\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)\)
as was to be proved!
So, in summary, our assumptions tell us that the conditional distribution of \(Y\) given \(X=x\) is:
\(Y|X=x\sim N\left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\quad \sigma^2_Y(1-\rho^2)\right)\)
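These two formulas translate directly into code. As an illustration (not part of the original lesson; the function name is our own), here is a minimal Python sketch that returns the conditional mean and variance of \(Y\) given \(X=x\):

```python
def conditional_normal(mu_x, mu_y, sigma_x, sigma_y, rho, x):
    """Conditional mean and variance of Y given X = x, under the lesson's
    assumptions: normality, a linear conditional mean, and a constant
    conditional variance."""
    mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    var = sigma_y ** 2 * (1 - rho ** 2)
    return mean, var
```

For instance, with the ACT numbers used in the example below (means 22.7, variances 17.64 and 12.25, correlation 0.78, and \(x = 23\)), this returns a conditional mean of 22.895 and a conditional variance of about 4.797.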
Now that we have completely defined the conditional distribution of \(Y\) given \(X=x\), we can now use what we already know about the normal distribution to find conditional probabilities, such as \(P(140<Y<160|X=x)\). Let's take a look at an example.
Example 21-1
Let \(X\) denote the math score on the ACT college entrance exam of a randomly selected student. Let \(Y\) denote the verbal score on the ACT college entrance exam of a randomly selected student. Previous history suggests that:
 \(X\) is normally distributed with a mean of 22.7 and a variance of 17.64
 \(Y\) is normally distributed with a mean of 22.7 and variance of 12.25
 The correlation between \(X\) and \(Y\) is 0.78.
What is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points?
Solution
Because \(Y\), the verbal ACT score, is assumed to be normally distributed with a mean of 22.7 and a variance of 12.25, calculating the requested probability involves just making a simple normal probability calculation:
Now converting the \(Y\) scores to standard normal \(Z\) scores, we get:
\(P(18.5<Y<25.5)=P\left(\dfrac{18.5-22.7}{\sqrt{12.25}} <Z<\dfrac{25.5-22.7}{\sqrt{12.25}}\right)\)
And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:
\begin{align} P(18.5<Y<25.5) &= P(-1.20<Z<0.80)\\ &= 0.7881-0.1151=0.6730 \end{align}
That is, the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points is 0.673.
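The table lookup above can be reproduced numerically with the standard normal c.d.f., which Python's standard library exposes via `math.erf`. A quick check (our own sketch, not from the text):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu_y, var_y = 22.7, 12.25
p = phi((25.5 - mu_y) / sqrt(var_y)) - phi((18.5 - mu_y) / sqrt(var_y))
# p is approximately 0.6731, matching the table value 0.6730
```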
Now, what happens to our probability calculation if we take into account the student's ACT math score? That is, what is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5, given that his or her ACT math score was 23? That is, what is \(P(18.5<Y<25.5|X=23)\)?
Solution
Before we can do the probability calculation, we first need to fully define the conditional distribution of \(Y\) given \(X=x\):
Now, if we just plug in the values that we know, we can calculate the conditional mean of \(Y\) given \(X=23\):
\(\mu_{Y|23}=22.7+0.78\left(\dfrac{\sqrt{12.25}}{\sqrt{17.64}}\right)(23-22.7)=22.895\)
and the conditional variance of \(Y\) given \(X=x\):
\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)=12.25(1-0.78^2)=4.7971\)
It is worth noting that \(\sigma^2_{Y|X}\), the conditional variance of \(Y\) given \(X=x\), is much smaller than \(\sigma^2_Y\), the unconditional variance of \(Y\) (12.25). This should make sense, as we have more information about the student. That is, we should expect the verbal ACT scores of all students to span a greater range than the verbal ACT scores of just those students whose math ACT score was 23.
Now, given that a student's math ACT score is 23, we now know that the student's verbal ACT score, \(Y\), is normally distributed with a mean of 22.895 and a variance of 4.7971. Now, calculating the requested probability again involves just making a simple normal probability calculation:
Converting the \(Y\) scores to standard normal \(Z\) scores, we get:
\(P(18.5<Y<25.5|X=23)=P\left(\dfrac{18.5-22.895}{\sqrt{4.7971}} <Z<\dfrac{25.5-22.895}{\sqrt{4.7971}}\right)\)
And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:
\(P(18.5<Y<25.5|X=23)=P(-2.01<Z<1.19)=0.8830-0.0222=0.8608\)
That is, given that a randomly selected student's math ACT score is 23, the probability that the student's verbal ACT score is between 18.5 and 25.5 points is 0.8608.
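The conditional calculation can be checked the same way, using the conditional mean 22.895 and conditional variance 4.7971 found above (a sketch of ours, not part of the lesson):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, var = 22.895, 4.7971   # conditional mean and variance given X = 23
p_cond = phi((25.5 - mu) / sqrt(var)) - phi((18.5 - mu) / sqrt(var))
# p_cond is approximately 0.860; the tiny difference from 0.8608 comes
# from the rounding of the z-values used with the table
```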
21.2  Joint P.D.F. of X and Y
We previously assumed that:
 \(Y\) follows a normal distribution,
 \(E(Y|x)\), the conditional mean of \(Y\) given \(x\), is linear in \(x\), and
 \(\text{Var}(Y|x)\), the conditional variance of \(Y\) given \(x\), is constant.
Based on these three stated assumptions, we found the conditional distribution of \(Y\) given \(X=x\). Now, we'll add a fourth assumption, namely that:
 \(X\) follows a normal distribution.
Based on the four stated assumptions, we will now define the joint probability density function of \(X\) and \(Y\).
Definition. Assume \(X\) is normal, so that the p.d.f. of \(X\) is:
\(f_X(x)=\dfrac{1}{\sigma_X \sqrt{2\pi}} \text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma^2_X}\right]\)
for \(-\infty<x<\infty\). And, assume that the conditional distribution of \(Y\) given \(X=x\) is normal with conditional mean:
\(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\)
and conditional variance:
\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)\)
That is, the conditional distribution of \(Y\) given \(X=x\) is:
\begin{align} h(y|x) &= \dfrac{1}{\sigma_{Y|X} \sqrt{2\pi}} \text{exp}\left[-\dfrac{(y-\mu_{Y|X})^2}{2\sigma^2_{Y|X}}\right]\\ &= \dfrac{1}{\sigma_Y \sqrt{1-\rho^2} \sqrt{2\pi}}\text{exp}\left[-\dfrac{\left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2}{2\sigma^2_Y(1-\rho^2)}\right],\quad -\infty<y<\infty\\ \end{align}
Therefore, the joint probability density function of \(X\) and \(Y\) is:
\(f(x,y)=f_X(x) \cdot h(y|x)=\dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \text{exp}\left[-\dfrac{q(x,y)}{2}\right]\)
where:
\(q(x,y)=\left(\dfrac{1}{1-\rho^2}\right) \left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2-2\rho \left(\dfrac{x-\mu_X}{\sigma_X}\right) \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\)
This joint p.d.f. is called the bivariate normal distribution.
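The definition translates directly into code. Here is a minimal Python sketch of the bivariate normal density (the function name is our own, not from the text):

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint p.d.f. f(x, y) = f_X(x) * h(y|x) of the bivariate
    normal distribution, as defined above."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    return exp(-q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ** 2))
```

At \(\rho = 0\) with two standard normals, the density at the origin is \(1/(2\pi) \approx 0.159\), the product of the two marginal densities' peaks.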
Our textbook has a nice three-dimensional graph of a bivariate normal distribution. You might want to take a look at it to get a feel for the shape of the distribution. Now, let's turn our attention to an important property of the correlation coefficient if \(X\) and \(Y\) have a bivariate normal distribution.
If \(X\) and \(Y\) have a bivariate normal distribution with correlation coefficient \(\rho_{XY}\), then \(X\) and \(Y\) are independent if and only if \(\rho_{XY}=0\). That "if and only if" means:
 If \(X\) and \(Y\) are independent, then \(\rho_{XY}=0\)
 If \(\rho_{XY}=0\), then \(X\) and \(Y\) are independent
Recall that the first item is always true; we proved it back in the lesson that addresses the correlation coefficient. We also looked at a counterexample in that lesson illustrating that the second item is not necessarily true! Well, now we've just learned a situation in which it is true, that is, when \(X\) and \(Y\) have a bivariate normal distribution. Let's see why the second item must be true in that case.
Proof
Since we previously proved the first item, our focus here will be on proving the second. In order to prove that \(X\) and \(Y\) are independent when \(X\) and \(Y\) have the bivariate normal distribution with zero correlation, we need to show that the bivariate normal density function:
\(f(x,y)=f_X(x)\cdot h(y|x)=\dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \text{exp}\left[-\dfrac{q(x,y)}{2}\right]\)
factors into the product of the normal p.d.f. of \(X\) and the normal p.d.f. of \(Y\). Well, when \(\rho_{XY}=0\):
\(q(x,y)=\left(\dfrac{1}{1-0^2}\right) \left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2-0+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right]\)
which simplifies to:
\(q(x,y)=\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \)
Substituting this simplified \(q(x,y)\) into the joint p.d.f. of \(X\) and \(Y\), and simplifying, we see that \(f(x,y)\) does indeed factor into the product of \(f_X(x)\) and \(f_Y(y)\):
\begin{align} f(x,y) &= \dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-0^2}} \text{exp}\left[-\dfrac{1}{2}\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2-\dfrac{1}{2}\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\\ &= \dfrac{1}{\sigma_X \sqrt{2\pi} \sigma_Y \sqrt{2\pi}}\text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2}\right] \text{exp}\left[-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]\\ &= \dfrac{1}{\sigma_X \sqrt{2\pi}}\text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2}\right]\cdot \dfrac{1}{\sigma_Y \sqrt{2\pi}}\text{exp}\left[-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]\\ &=f_X(x)\cdot f_Y(y)\\ \end{align}
Because we have shown that:
\(f(x,y)=f_X(x)\cdot f_Y(y)\)
we can conclude, by the definition of independence, that \(X\) and \(Y\) are independent. Our proof is complete.
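The factorization can also be confirmed numerically: with \(\rho = 0\), the joint density evaluates to the product of the two marginal normal densities at every point. A small spot-check (our own sketch, with arbitrary parameter values):

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint p.d.f. of the bivariate normal distribution."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    return exp(-q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ** 2))

def normal_pdf(t, mu, sigma):
    """Marginal (univariate) normal p.d.f."""
    return exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * sqrt(2 * pi))

# With rho = 0 the joint density should equal f_X(x) * f_Y(y) everywhere;
# spot-check on a small grid of points.
factors = all(
    abs(bivariate_normal_pdf(x, y, 1.0, -2.0, 2.0, 0.5, 0.0)
        - normal_pdf(x, 1.0, 2.0) * normal_pdf(y, -2.0, 0.5)) < 1e-12
    for x in (-1.0, 0.0, 2.5)
    for y in (-3.0, -2.0, 0.0)
)
# factors is True
```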