# Definitions

Now that we've digested the concept of a conditional probability distribution informally, let's now define it formally for discrete random variables *X* and *Y*. Later, we'll extend the definition for continuous random variables *X* and *Y*.

The conditional probability mass function of *X*, given that *Y* = *y*, is defined by:

\(g(x|y)=\dfrac{f(x,y)}{f_Y(y)}\qquad \text{provided } f_Y(y)>0\)

Similarly, the conditional probability mass function of *Y*, given that *X* = *x*, is defined by:

\(h(y|x)=\dfrac{f(x,y)}{f_X(x)}\qquad \text{provided } f_X(x)>0\)
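The two definitions translate directly into code. Here is a minimal Python sketch (the function names and the sparse-dictionary representation are my own choices, not part of the text); the joint pmf entries included are exactly the ones used in the worked examples that follow, with exact fractions to avoid floating-point noise.

```python
from fractions import Fraction as F

# Joint pmf f(x, y), stored sparsely as {(x, y): probability}.
# Only the entries that appear in the worked examples below are included.
f = {(0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8),
     (1, 0): F(2, 8)}

def marginal_X(f, x):
    """f_X(x) = sum over y of f(x, y)."""
    return sum(p for (xx, _), p in f.items() if xx == x)

def marginal_Y(f, y):
    """f_Y(y) = sum over x of f(x, y)."""
    return sum(p for (_, yy), p in f.items() if yy == y)

def g(f, x, y):
    """g(x|y) = f(x, y) / f_Y(y), defined only when f_Y(y) > 0."""
    fy = marginal_Y(f, y)
    if fy == 0:
        raise ValueError("g(x|y) is undefined when f_Y(y) = 0")
    return f.get((x, y), F(0)) / fy

def h(f, y, x):
    """h(y|x) = f(x, y) / f_X(x), defined only when f_X(x) > 0."""
    fx = marginal_X(f, x)
    if fx == 0:
        raise ValueError("h(y|x) is undefined when f_X(x) = 0")
    return f.get((x, y), F(0)) / fx
```

For instance, `g(f, 0, 0)` returns `Fraction(1, 3)`, matching the first calculation in the example below.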

Let's get some practice using the definition to find the conditional probability distribution first of *X* given *Y*, and then of *Y* given *X*.

### Example

Let *X* be a discrete random variable with support *S*_{1} = {0, 1}, and let *Y* be a discrete random variable with support *S*_{2} = {0, 1, 2}. Suppose, in tabular form, that *X* and *Y* have the following joint probability distribution *f*(*x*,*y*):

What is the conditional distribution of *X* given *Y*? That is, what is *g*(*x*|*y*)?

**Solution.** Using the formula \(g(x|y)=\dfrac{f(x,y)}{f_Y(y)}\), with *x* = 0 and 1, and *y* = 0, 1, and 2, the conditional distribution of *X* given *Y* is, in tabular form:

For example, the 1/3 in the *x* = 0 and *y* = 0 cell comes from:

\(g(0|0)=\dfrac{f(0,0)}{f_Y(0)}=\dfrac{1/8}{3/8}=\dfrac{1}{3}\)

And, the 2/3 in the *x* = 1 and *y* = 0 cell comes from:

\(g(1|0)=\dfrac{f(1,0)}{f_Y(0)}=\dfrac{2/8}{3/8}=\dfrac{2}{3}\)

The remaining conditional probabilities are calculated in a similar way. Note that the conditional probabilities in the *g*(*x*|*y*) table are color-coded as blue when *y* = 0, red when *y* = 1, and green when *y* = 2. That isn't necessary, of course, but rather just a device used to emphasize the concept that the probabilities that *X* takes on a particular value are given for the three different sub-populations defined by the value of *Y*.

Note also that it shouldn't be surprising that for each of the three sub-populations defined by *Y*, if you add up the probabilities that *X* = 0 and *X* = 1, you always get 1. This is just as we would expect if we were adding up the (marginal) probabilities over the support of *X*. It's just that here we have to do it for each sub-population rather than the entire population!
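As a quick check of that claim for the *y* = 0 sub-population (the only column whose joint entries appear explicitly in the calculations above), the conditional probabilities can be computed and summed with exact fractions:

```python
from fractions import Fraction as F

# The y = 0 column of the joint pmf, taken from the example above:
# f(0, 0) = 1/8 and f(1, 0) = 2/8, so f_Y(0) = 3/8.
f_col = {0: F(1, 8), 1: F(2, 8)}   # keys are values of x
f_Y0 = sum(f_col.values())

# g(x|0) = f(x, 0) / f_Y(0) for each x in the support of X
g_given_0 = {x: p / f_Y0 for x, p in f_col.items()}

assert g_given_0 == {0: F(1, 3), 1: F(2, 3)}
assert sum(g_given_0.values()) == 1   # the sub-population sums to 1
```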

### Example (continued)

Let *X* be a discrete random variable with support *S*_{1} = {0, 1}, and let *Y* be a discrete random variable with support *S*_{2} = {0, 1, 2}. Suppose, in tabular form, that *X* and *Y* have the following joint probability distribution *f*(*x*,*y*):

What is the conditional distribution of *Y* given *X*? That is, what is *h*(*y*|*x*)?


**Solution.** Using the formula \(h(y|x)=\dfrac{f(x,y)}{f_X(x)}\), with *x* = 0 and 1, and *y* = 0, 1, and 2, the conditional distribution of *Y* given *X* is, in tabular form:

For example, the 1/4 in the *x* = 0 and *y* = 0 cell comes from:

\(h(0|0)=\dfrac{f(0,0)}{f_X(0)}=\dfrac{1/8}{4/8}=\dfrac{1}{4}\)

And, the 2/4 in the *x* = 0 and *y* = 1 cell comes from:

\(h(1|0)=\dfrac{f(0,1)}{f_X(0)}=\dfrac{2/8}{4/8}=\dfrac{2}{4}\)

And, the 1/4 in the *x* = 0 and *y* = 2 cell comes from:

\(h(2|0)=\dfrac{f(0,2)}{f_X(0)}=\dfrac{1/8}{4/8}=\dfrac{1}{4}\)

Again, the remaining conditional probabilities are calculated in a similar way. Note that the conditional probabilities in the *h*(*y*|*x*) table are color-coded as blue when *x* = 0 and red when *x* = 1. Again, that isn't necessary, but rather just a device used to emphasize the concept that the probabilities that *Y* takes on a particular value are given for the two different sub-populations defined by the value of *X*.

Note also that it shouldn't be surprising that for each of the two sub-populations defined by *X*, if you add up the probabilities that *Y* = 0, *Y* = 1, and *Y* = 2, you get a total of 1. This is just as we would expect if we were adding up the (marginal) probabilities over the support of *Y*. It's just that here, again, we have to do it for each sub-population rather than the entire population!
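The same check works for the *x* = 0 sub-population, whose row of the joint pmf appears explicitly in the three calculations above:

```python
from fractions import Fraction as F

# The x = 0 row of the joint pmf, read off from the example above:
# f(0, 0) = 1/8, f(0, 1) = 2/8, f(0, 2) = 1/8, so f_X(0) = 4/8.
f_row = {0: F(1, 8), 1: F(2, 8), 2: F(1, 8)}   # keys are values of y
f_X0 = sum(f_row.values())

# h(y|0) = f(0, y) / f_X(0) for each y in the support of Y
h_given_0 = {y: p / f_X0 for y, p in f_row.items()}

# 1/4, 2/4, and 1/4 (Fraction reduces 2/4 to 1/2), summing to 1
assert h_given_0 == {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
assert sum(h_given_0.values()) == 1
```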

Okay, now that we've determined *h*(*y*|*x*), the conditional distribution of *Y* given *X*, and *g*(*x*|*y*), the conditional distribution of *X* given *Y*, you might also want to note that *g*(*x*|*y*) does not equal *h*(*y*|*x*). That is almost always the case.

So, we've used the definition to find the conditional distribution of *X* given *Y*, as well as the conditional distribution of *Y* given *X*. We should now have enough experience with conditional distributions to believe that the following two statements are true:

(1) Conditional distributions are valid probability mass functions in their own right. That is, the conditional probabilities are between 0 and 1, inclusive:

\(0 \leq g(x|y) \leq 1 \qquad \text{and}\qquad 0 \leq h(y|x) \leq 1 \)

and, for each subpopulation, the conditional probabilities sum to 1:

\(\sum\limits_x g(x|y)=1 \qquad \text{and}\qquad \sum\limits_y h(y|x)=1 \)

(2) In general, the conditional distribution of *X* given *Y* does not equal the conditional distribution of *Y* given *X*. That is:

*g*(*x*|*y*) ≠ *h*(*y*|*x*)
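Statement (2) is easy to see numerically using the one cell, *x* = 0 and *y* = 0, that was computed in both examples above: the numerator f(0,0) is the same, but the two conditionals divide it by different marginals.

```python
from fractions import Fraction as F

# Both computed for the same cell (x = 0, y = 0) in the examples above:
g00 = F(1, 8) / F(3, 8)   # g(0|0) = f(0,0) / f_Y(0) = 1/3
h00 = F(1, 8) / F(4, 8)   # h(0|0) = f(0,0) / f_X(0) = 1/4

# Same joint probability, different denominators, different answers
assert g00 != h00
```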