Now that we've digested the concept of a conditional probability distribution informally, let's define it formally for discrete random variables \(X\) and \(Y\). Later, we'll extend the definition to continuous random variables \(X\) and \(Y\).
 Conditional probability mass function of \(X\)

The conditional probability mass function of \(X\), given that \(Y=y\), is defined by:
\(g(x|y)=\dfrac{f(x,y)}{f_Y(y)}\qquad \text{provided } f_Y(y)>0\)
Similarly,
 Conditional probability mass function of \(Y\)

The conditional probability mass function of \(Y\), given that \(X=x\), is defined by:
\(h(y|x)=\dfrac{f(x,y)}{f_X(x)}\qquad \text{provided } f_X(x)>0\)
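The two definitions translate almost directly into code. The following is only a minimal sketch: the function names and the small `toy_joint` table are hypothetical and chosen for illustration, not taken from the lesson. A joint p.m.f. is stored as a dictionary keyed by `(x, y)`, and exact fractions are used so that no rounding gets in the way.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_x_given_y(joint):
    """Return a dict with entry (x, y) equal to g(x|y) = f(x, y) / f_Y(y)."""
    f_y = defaultdict(Fraction)
    for (x, y), p in joint.items():
        f_y[y] += p  # marginal of Y: sum f(x, y) over x
    # The definition only applies where f_Y(y) > 0.
    return {(x, y): p / f_y[y] for (x, y), p in joint.items() if f_y[y] > 0}


def conditional_y_given_x(joint):
    """Return a dict with entry (x, y) equal to h(y|x) = f(x, y) / f_X(x)."""
    f_x = defaultdict(Fraction)
    for (x, y), p in joint.items():
        f_x[x] += p  # marginal of X: sum f(x, y) over y
    # The definition only applies where f_X(x) > 0.
    return {(x, y): p / f_x[x] for (x, y), p in joint.items() if f_x[x] > 0}


# Hypothetical joint p.m.f. on {0, 1} x {0, 1}, for illustration only.
toy_joint = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(4, 10),
}

g = conditional_x_given_y(toy_joint)
h = conditional_y_given_x(toy_joint)
print(g[(0, 0)])  # f(0,0) / f_Y(0) = (1/10) / (4/10) = 1/4
print(h[(0, 0)])  # f(0,0) / f_X(0) = (1/10) / (3/10) = 1/3
```

Both dictionaries are keyed by `(x, y)`, so the same key picks out \(g(x|y)\) in one and \(h(y|x)\) in the other.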
Let's get some practice using the definition to find the conditional probability distribution first of \(X\) given \(Y\), and then of \(Y\) given \(X\).
Example 192
Let \(X\) be a discrete random variable with support \(S_1=\{0,1\}\), and let \(Y\) be a discrete random variable with support \(S_2=\{0, 1, 2\}\). Suppose, in tabular form, that \(X\) and \(Y\) have the following joint probability distribution \(f(x,y)\):
What is the conditional distribution of \(X\) given \(Y\)? That is, what is \(g(x|y)\)?
Solution
Using the formula \(g(x|y)=\dfrac{f(x,y)}{f_Y(y)}\), with \(x=0, 1\) and \(y=0, 1, 2\), the conditional distribution of \(X\) given \(Y\) is, in tabular form:
For example, the 1/3 in the \(x=0\) and \(y=0\) cell comes from dividing the joint probability \(f(0,0)=1/8\) by the marginal probability \(f_Y(0)=3/8\). That is:
\(g(0|0)=\dfrac{f(0,0)}{f_Y(0)}=\dfrac{1/8}{3/8}=\dfrac{1}{3}\)
And the 2/3 in the \(x=1\) and \(y=0\) cell comes from dividing the joint probability \(f(1,0)=2/8\) by the same marginal probability \(f_Y(0)=3/8\). That is:
\(g(1|0)=\dfrac{f(1,0)}{f_Y(0)}=\dfrac{2/8}{3/8}=\dfrac{2}{3}\)
The remaining conditional probabilities are calculated in a similar way. Note that the conditional probabilities in the \(g(x|y)\) table are color-coded as blue when \(y=0\), red when \(y=1\), and green when \(y=2\). That isn't necessary, of course, but rather just a device used to emphasize the concept that the probabilities that \(X\) takes on a particular value are given for the three different subpopulations defined by the value of \(Y\).
Note also that it shouldn't be surprising that for each of the three subpopulations defined by \(Y\), if you add up the probabilities that \(X=0\) and \(X=1\), you always get 1. This is just as we would expect if we were adding up the (marginal) probabilities over the support of \(X\). It's just that here we have to do it for each subpopulation rather than the entire population!
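As a quick check of the arithmetic for the \(y=0\) subpopulation, here is a small sketch that uses only the two joint probabilities quoted in the calculations above, \(f(0,0)=1/8\) and \(f(1,0)=2/8\). The variable names are illustrative assumptions.

```python
from fractions import Fraction

# Column y = 0 of the joint table, as quoted in the worked calculations:
# maps x -> f(x, 0).
column_y0 = {0: Fraction(1, 8), 1: Fraction(2, 8)}

# Marginal probability f_Y(0): sum the column over the support of X.
f_Y0 = sum(column_y0.values())  # 3/8

# Conditional p.m.f. of X given Y = 0: divide each entry by f_Y(0).
g_given_y0 = {x: p / f_Y0 for x, p in column_y0.items()}

print(g_given_y0)                # {0: Fraction(1, 3), 1: Fraction(2, 3)}
print(sum(g_given_y0.values()))  # 1
```

Dividing every entry of the column by the column total is exactly what forces the conditional probabilities within a subpopulation to add up to 1.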
Let \(X\) be a discrete random variable with support \(S_1=\{0,1\}\), and let \(Y\) be a discrete random variable with support \(S_2=\{0, 1, 2\}\). Suppose, in tabular form, that \(X\) and \(Y\) have the following joint probability distribution \(f(x,y)\):
What is the conditional distribution of \(Y\) given \(X\)? That is, what is \(h(y|x)\)?
Solution
Using the formula \(h(y|x)=\dfrac{f(x,y)}{f_X(x)}\), with \(x=0, 1\) and \(y=0, 1, 2\), the conditional distribution of \(Y\) given \(X\) is, in tabular form:
For example, the 1/4 in the \(x=0\) and \(y=0\) cell comes from dividing the joint probability \(f(0,0)=1/8\) by the marginal probability \(f_X(0)=4/8\). That is:
\(h(0|0)=\dfrac{f(0,0)}{f_X(0)}=\dfrac{1/8}{4/8}=\dfrac{1}{4}\)
And the 2/4 in the \(x=0\) and \(y=1\) cell comes from dividing the joint probability \(f(0,1)=2/8\) by the same marginal probability \(f_X(0)=4/8\). That is:
\(h(1|0)=\dfrac{f(0,1)}{f_X(0)}=\dfrac{2/8}{4/8}=\dfrac{2}{4}\)
And the 1/4 in the \(x=0\) and \(y=2\) cell comes from dividing the joint probability \(f(0,2)=1/8\) by \(f_X(0)=4/8\). That is:
\(h(2|0)=\dfrac{f(0,2)}{f_X(0)}=\dfrac{1/8}{4/8}=\dfrac{1}{4}\)
Again, the remaining conditional probabilities are calculated in a similar way. Note that the conditional probabilities in the \(h(y|x)\) table are color-coded as blue when \(x=0\) and red when \(x=1\). Again, that isn't necessary, but rather just a device used to emphasize the concept that the probabilities that \(Y\) takes on a particular value are given for the two different subpopulations defined by the value of \(X\).
Note also that it shouldn't be surprising that for each of the two subpopulations defined by \(X\), if you add up the probabilities that \(Y=0\), \(Y=1\), and \(Y=2\), you get a total of 1. This is just as we would expect if we were adding up the (marginal) probabilities over the support of \(Y\). It's just that here, again, we have to do it for each subpopulation rather than the entire population!
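The same kind of check works for the \(x=0\) subpopulation. This sketch uses only the three joint probabilities quoted in the calculations above, \(f(0,0)=1/8\), \(f(0,1)=2/8\), and \(f(0,2)=1/8\). Again, the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Row x = 0 of the joint table, as quoted in the worked calculations:
# maps y -> f(0, y).
row_x0 = {0: Fraction(1, 8), 1: Fraction(2, 8), 2: Fraction(1, 8)}

# Marginal probability f_X(0): sum the row over the support of Y.
f_X0 = sum(row_x0.values())  # 4/8, stored in lowest terms as 1/2

# Conditional p.m.f. of Y given X = 0: divide each entry by f_X(0).
h_given_x0 = {y: p / f_X0 for y, p in row_x0.items()}

for y, p in h_given_x0.items():
    print(f"h({y}|0) = {p}")

print(sum(h_given_x0.values()))  # 1
```

Note that `Fraction` reduces to lowest terms, so the 2/4 in the \(y=1\) cell is displayed as 1/2.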
Okay, now that we've determined \(h(y|x)\), the conditional distribution of \(Y\) given \(X\), and \(g(x|y)\), the conditional distribution of \(X\) given \(Y\), you might also want to note that \(g(x|y)\) does not equal \(h(y|x)\). For example, \(g(0|0)=1/3\) while \(h(0|0)=1/4\). The two conditional distributions almost always differ.
So, we've used the definition to find the conditional distribution of \(X\) given \(Y\), as well as the conditional distribution of \(Y\) given \(X\). We should now have enough experience with conditional distributions to believe that the following two statements are true:

Conditional distributions are valid probability mass functions in their own right. That is, the conditional probabilities are between 0 and 1, inclusive:
\(0 \leq g(x|y) \leq 1 \qquad \text{and}\qquad 0 \leq h(y|x) \leq 1 \)
and, for each subpopulation, the conditional probabilities sum to 1:
\(\sum\limits_x g(x|y)=1 \qquad \text{and}\qquad \sum\limits_y h(y|x)=1 \)

In general, the conditional distribution of \(X\) given \(Y\) does not equal the conditional distribution of \(Y\) given \(X\). That is:
\(g(x|y)\ne h(y|x)\)
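Both statements can be checked numerically on a full joint table. The sketch below rests on an assumption. The entries \(f(0,0)\), \(f(0,1)\), \(f(0,2)\), and \(f(1,0)\) are the ones quoted in the worked example. The values \(f(1,1)=f(1,2)=1/8\) are assumed for illustration, since the calculations above only imply that those two entries sum to 2/8.

```python
from fractions import Fraction as F

# Joint p.m.f. f(x, y). The x = 0 row and f(1, 0) are quoted in the worked
# example; f(1, 1) and f(1, 2) are ASSUMED to be 1/8 each (the example only
# implies that they sum to 2/8).
joint = {
    (0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8),
    (1, 0): F(2, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal p.m.f.s of X and Y.
f_X = {x: sum(joint[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional p.m.f.s, both keyed by (x, y).
g = {(x, y): joint[(x, y)] / f_Y[y] for x in xs for y in ys}  # g(x|y)
h = {(x, y): joint[(x, y)] / f_X[x] for x in xs for y in ys}  # h(y|x)

# Statement 1: every conditional probability lies in [0, 1], and the
# probabilities sum to 1 within each subpopulation.
in_unit_interval = all(0 <= p <= 1 for p in list(g.values()) + list(h.values()))
g_sums = {y: sum(g[(x, y)] for x in xs) for y in ys}
h_sums = {x: sum(h[(x, y)] for y in ys) for x in xs}

# Statement 2: g(x|y) and h(y|x) need not agree.
differing = [(x, y) for x in xs for y in ys if g[(x, y)] != h[(x, y)]]

print(in_unit_interval)
print(g_sums, h_sums)
print(g[(0, 0)], h[(0, 0)])  # 1/3 1/4
```

The last line compares two values that depend only on entries quoted in the example, so \(g(0|0)=1/3\) and \(h(0|0)=1/4\) hold regardless of how the assumed entries are split.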