This is something that some students have a lot of trouble with the first time they learn it. Here is the example that we were working with previously.
Example
X and Y are continuous random variables, and we know that the joint pdf for X and Y is:
fX,Y(x, y) = 24xy
where x ∈ (0, 1) and y ∈ (0, 1-x)
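Before going further, it is worth confirming that this really is a legitimate joint pdf, i.e., that it integrates to 1 over its support. Here is a minimal symbolic check, sketched with Python's sympy library:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    joint = 24*x*y  # the joint pdf from the example

    # Integrate over the triangular support: y from 0 to 1-x, then x from 0 to 1.
    total = sp.integrate(joint, (y, 0, 1 - x), (x, 0, 1))
    print(total)  # 1, so 24xy is a valid joint pdf on this region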
What we wanted to find was the conditional density of Y given X = 1/3, that is, fY|X(y | x = 1/3).
We have to be careful about how we do this. The first thing that we want to do is find the conditional density for Y given X by taking the joint and dividing it by the marginal for X:
fY|X(y | x) = fX,Y(x, y) / fX(x) , where y ∈ (0, 1-x)
We already know what the numerator is so we will work to find the denominator. Here is what we had found earlier.
fX(x) = ∫₀^(1-x) 24xy dy = 12x(1-x)² , x ∈ (0, 1)
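To double-check this marginal, here is the same kind of sympy sketch:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    joint = 24*x*y

    # Marginal of X: integrate the joint over y's support (0, 1-x).
    fX = sp.integrate(joint, (y, 0, 1 - x))
    print(sp.factor(fX))  # 12*x*(x - 1)**2, i.e. 12x(1-x)²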
Now all we need to do is write both of these in terms of the conditional density:
fY|X(y | x) = 24xy / (12x(1-x)²) = 2y/(1-x)² , where y ∈ (0, 1-x)
Here we have written this out in a very general fashion. Next, we can specialize to x = 1/3 by plugging it in:
fY|X(y | 1/3) = 2y/(1 - 1/3)² = 2y/(4/9) = (9/2)y , where y ∈ (0, 2/3)
Notice how the support carried over: it changes according to x.
You can check to see if this is a pdf. Can you do this in your head? Integrating, ∫₀^(2/3) (9/2)y dy = (9/2) × (2/3)²/2 = 1.
This is a legitimate pdf, and it is in fact the conditional density of Y given X = 1/3.
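If you would rather not do that integral in your head, here is the check in sympy (a sketch, not required for the derivation):

    import sympy as sp

    y = sp.symbols('y', positive=True)

    # Conditional pdf of Y given X = 1/3: (9/2)y on (0, 2/3).
    cond = sp.Rational(9, 2) * y
    print(sp.integrate(cond, (y, 0, sp.Rational(2, 3))))  # 1, so it is a pdf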
Now that we have solved this problem, the rest is pretty easy. Any conditional probability of the form P(Y ∈ A | X = 1/3) can now be found by integrating the conditional pdf fY|X(y | 1/3) = (9/2)y over the set A.
We have solved this problem by computing the conditional density.
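As an illustration, here is how such a probability could be computed; the particular event P(Y ≤ 1/2 | X = 1/3) is a hypothetical choice, not one from the original problem:

    import sympy as sp

    y = sp.symbols('y', positive=True)
    cond = sp.Rational(9, 2) * y  # f(y | x = 1/3) on (0, 2/3)

    # Hypothetical event: P(Y <= 1/2 | X = 1/3) is the integral of the
    # conditional pdf over (0, 1/2).
    p = sp.integrate(cond, (y, 0, sp.Rational(1, 2)))
    print(p)  # 9/16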
Now we will work with an alternative, short-cut approach that is slightly confusing at first but should make sense. The idea is that we will look at fX,Y(x, y) and decompose it into two pieces such that:
fX,Y(x, y) = g(y) × h(x)
where g(y) collects all the terms involving y and h(x) collects all the terms not involving y. We purposely say 'not involving y' instead of just saying x because h(x) will also absorb constants and anything else that does not involve y.
What we are interested in doing is finding the conditional using this short cut. We can begin by working with the definition of the conditional as the joint divided by the marginal of X:
fY|X(y | x) = fX,Y(x, y) / fX(x) = g(y)h(x) / ∫g(y)h(x)dy = g(y)h(x) / (h(x)∫g(y)dy) = g(y) / ∫g(y)dy
We get the marginal of X from the joint by integrating over y; since h(x) does not involve y, it factors out of the integral, and then the h(x) terms cancel each other out.
The denominator, ∫g(y)dy, is also called the 'normalizing constant'. Informally, the normalizing constant is 'the thing that makes the density integrate, or normalize, to 1'.
Since it is just a constant that makes this integrate to 1, for convenience we can define c = 1/∫g(y)dy, so that the conditional now equals c g(y). It turns out that this is our short cut.
For instance, in our example:
fX,Y(x, y) = 24xy , where x ∈ (0, 1) and y ∈ (0, 1-x)
If we want to find fY|X(y | x) using the short cut, we start by thinking about what g(y) and h(x) would be, as follows:
fX,Y(x, y) = 24xy = (24x) × (y), so h(x) = 24x and g(y) = y, where x ∈ (0, 1) and y ∈ (0, 1-x).
We can now write the conditional as c × y, where c is the normalizing constant.
So what is my new problem now? It is finding the normalizing constant c. I think that you will find that finding c is a relatively easy problem.
Find c: the conditional c·y must integrate to 1 over y ∈ (0, 1-x), so ∫₀^(1-x) c·y dy = c(1-x)²/2 = 1, which gives c = 2/(1-x)².
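The same normalization condition can be solved symbolically; a minimal sympy sketch:

    import sympy as sp

    x, y, c = sp.symbols('x y c', positive=True)

    # c must make c*y integrate to 1 over y in (0, 1-x).
    sol = sp.solve(sp.Eq(sp.integrate(c * y, (y, 0, 1 - x)), 1), c)
    print(sol)  # c = 2/(1-x)², though sympy may print an expanded equivalent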
We are trying to find a conditional distribution, and we found our normalizing constant so let's plug this in.
fY|X(y | x) = c·y = 2y/(1-x)² , where y ∈ (0, 1-x).
What we can do now, for f(y | x = 1/3), is write it down as follows:
fY|X(y | 1/3) = 2y/(1 - 1/3)² = (9/2)y , where y ∈ (0, 2/3)
We can go back and see that we got the same thing using the more elaborate method.
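You can also have sympy confirm that the short cut and the more elaborate method agree, as a quick sketch:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    joint = 24*x*y

    # Elaborate method: joint divided by the marginal of X.
    elaborate = joint / sp.integrate(joint, (y, 0, 1 - x))

    # Short cut: c * g(y) with g(y) = y and c = 2/(1-x)².
    shortcut = 2*y / (1 - x)**2

    print(sp.simplify(elaborate - shortcut))  # 0, so the two densities are identical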
It is important to go through this process very carefully the first couple of times that you use the normalizing constant to make sure that you utilize it correctly.
Normalizing Constants - Review
Let's review just a bit here. What is the normalizing constant again? You know that f(x) is a pdf. You also know that ∫f(x)dx = 1.
Suppose you are given f(x) = c h(x), and you know h(x) but you don't know c. It is easy to find c because ∫f(x)dx = c∫h(x)dx = 1, so c = 1/∫h(x)dx.
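As a toy illustration (the h here is a made-up example, not one from the notes): if h(x) = x² on (0, 1), then ∫h(x)dx = 1/3, so c = 3. In sympy:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    h = x**2  # hypothetical un-normalized density on (0, 1)

    c = 1 / sp.integrate(h, (x, 0, 1))
    print(c)                               # 3
    print(sp.integrate(c * h, (x, 0, 1)))  # 1, so f(x) = 3x² is a pdf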
We can also use this now for the other conditional density, f(x | y).
In this case we want to home in on the x's because y is fixed. We can do what we did before and solve for c.
Important Note:
You should be able to see that the joint distribution specifies all marginal distributions, and now we know that it also specifies all conditional distributions, i.e.,
fX,Y(x, y) gives us fX(x) and fY(y),
and it also gives us fX|Y(x | y) and fY|X(y | x).
The joint distribution can therefore be used to find marginal means, marginal variances, conditional means and conditional variances.
Here is an illustration, continuing with the previous example.
If you had to find the expectation of Y given X = x, you could do this. In other words, if you tell me what X is, I can tell you what the expected value of Y is, E(Y | X = x). We do this with the following:
E(Y | X = x) = ∫ y fY|X(y | x) dy
Then we can plug in and solve:
E(Y | X = x) = ∫₀^(1-x) y × 2y/(1-x)² dy = (2/(1-x)²) × (1-x)³/3 = 2(1-x)/3
Now, if we wanted to answer the more specific question: E(Y | X = 1/3) = 2(1 - 1/3)/3 = 4/9.
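Both conditional-mean computations can be verified with sympy; a quick sketch:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    cond = 2*y / (1 - x)**2  # f(y | x) from the example

    # E(Y | X = x): integrate y times the conditional pdf over (0, 1-x).
    EY = sp.simplify(sp.integrate(y * cond, (y, 0, 1 - x)))
    print(EY)                             # 2(1-x)/3, possibly printed in an equivalent form
    print(EY.subs(x, sp.Rational(1, 3)))  # 4/9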
The point here is that once you have the joint distribution you can find all kinds of complicated probabilities.