Lesson 22: Functions of One Random Variable
Overview
We'll begin our exploration of the distributions of functions of random variables, by focusing on simple functions of one random variable. For example, if \(X\) is a continuous random variable, and we take a function of \(X\), say:
\(Y=u(X)\)
then \(Y\) is also a continuous random variable that has its own probability distribution. We'll learn how to find the probability density function of \(Y\), using two different techniques, namely the distribution function technique and the change-of-variable technique. At first, we'll focus only on one-to-one functions. Then, once we have that mastered, we'll learn how to modify the change-of-variable technique to find the probability density function of a random variable that is derived from a two-to-one function. Finally, we'll learn how the inverse of a cumulative distribution function can help us simulate random numbers that follow a particular probability distribution.
Objectives
 To learn how to use the distribution function technique to find the probability distribution of \(Y=u(X)\), a one-to-one transformation of a random variable \(X\).
 To learn how to use the change-of-variable technique to find the probability distribution of \(Y=u(X)\), a one-to-one transformation of a random variable \(X\).
 To learn how to use the change-of-variable technique to find the probability distribution of \(Y=u(X)\), a two-to-one transformation of a random variable \(X\).
 To learn how to use a cumulative distribution function to simulate random numbers that follow a particular probability distribution.
 To understand all of the proofs in the lesson.
 To be able to apply the methods learned in the lesson to new problems.
22.1 - Distribution Function Technique
You might not have been aware of it at the time, but we have already used the distribution function technique at least twice in this course to find the probability density function of a function of a random variable. For example, we used the distribution function technique to show that:
\(Z=\dfrac{X-\mu}{\sigma}\)
follows a standard normal distribution when \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\). And, we used the distribution function technique to show that, when \(Z\) follows the standard normal distribution:
\(Z^2\)
follows the chi-square distribution with 1 degree of freedom. In summary, we used the distribution function technique to find the p.d.f. of the random function \(Y=u(X)\) by:

First, finding the cumulative distribution function:
\(F_Y(y)=P(Y\leq y)\)

Then, differentiating the cumulative distribution function \(F(y)\) to get the probability density function \(f(y)\). That is:
\(f_Y(y)=F'_Y(y)\)
Now that we've officially stated the distribution function technique, let's take a look at a few more examples.
Example 22-1
Let \(X\) be a continuous random variable with the following probability density function:
\(f(x)=3x^2\)
for \(0<x<1\). What is the probability density function of \(Y=X^2\)?
Solution
If you graph the function \(Y=X^2\) over \(0<x<1\), you might note that (1) the function is an increasing function of \(X\), and (2) \(0<y<1\). That noted, let's now use the distribution function technique to find the p.d.f. of \(Y\). First, we find the cumulative distribution function of \(Y\):
\(F_Y(y)=P(Y\leq y)=P(X^2\leq y)=P(X\leq \sqrt{y})=F_X(\sqrt{y})=\int_0^{\sqrt{y}} 3t^2dt=y^{3/2}\)
Having shown that the cumulative distribution function of \(Y\) is:
\(F_Y(y)=y^{3/2}\)
for \(0<y<1\), we now just need to differentiate \(F(y)\) to get the probability density function \(f(y)\). Doing so, we get:
\(f_Y(y)=F'_Y(y)=\dfrac{3}{2} y^{1/2}\)
for \(0<y<1\). Our calculation is complete! We have successfully used the distribution function technique to find the p.d.f. of \(Y\), when \(Y\) was an increasing function of \(X\). (By the way, you might find it reassuring to verify that \(f(y)\) does indeed integrate to 1 over the support of \(y\). In general, that's not a bad thing to check.)
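If you'd like to see that check carried out, here is a minimal Python sketch. It integrates the derived density with a midpoint rule and compares the derived cumulative distribution function with a simulation. The helper names (`f_Y`, `F_Y`, `midpoint_integral`), the seed, and the sample size are illustrative assumptions, not part of the lesson. The draw \(X=U^{1/3}\) assumes that inverting \(F_X(x)=x^3\) at a uniform(0, 1) number produces an \(X\) with density \(3x^2\).

```python
import random


def f_Y(y):
    """Derived density of Y = X^2 for 0 < y < 1."""
    return 1.5 * y ** 0.5


def F_Y(y):
    """Derived cumulative distribution function of Y for 0 < y < 1."""
    return y ** 1.5


def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over (a, b) with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# The density should integrate to (approximately) 1 over its support.
total = midpoint_integral(f_Y, 0.0, 1.0)

# Simulate X with density 3x^2 (assumed draw: X = U^(1/3)), then square it.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
ys = [(rng.random() ** (1 / 3)) ** 2 for _ in range(100_000)]
empirical = sum(y <= 0.5 for y in ys) / len(ys)

print(f"integral of f_Y over (0, 1): {total:.4f}")
print(f"empirical P(Y <= 0.5): {empirical:.3f}  derived F_Y(0.5): {F_Y(0.5):.3f}")
```

The two printed probabilities should agree to about two decimal places; the exact simulated value depends on the seed.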
One thing you might note in the last example is that great care was used to subscript the cumulative distribution functions and probability density functions with either an \(X\) or a \(Y\) to indicate to which random variable the functions belonged. For example, in finding the cumulative distribution function of \(Y\), we started with the cumulative distribution function of \(Y\), and ended up with a cumulative distribution function of \(X\)! If we didn't use the subscripts, we would have had a good chance of throwing up our hands and botching the calculation. In short, using subscripts is a good habit to follow!
Example 22-2
Let \(X\) be a continuous random variable with the following probability density function:
\(f(x)=3(1-x)^2\)
for \(0<x<1\). What is the probability density function of \(Y=(1-X)^3\)?
Solution
If you graph the function:
\(Y=(1-X)^3\)
over \(0<x<1\), you might note that the function is a decreasing function of \(X\), and \(0<y<1\). That noted, let's now use the distribution function technique to find the p.d.f. of \(Y\). First, we find the cumulative distribution function of \(Y\):
\(F_Y(y)=P(Y\leq y)=P((1-X)^3\leq y)=P(1-X\leq y^{1/3})=P(X\geq 1-y^{1/3})=\int_{1-y^{1/3}}^{1} 3(1-t)^2dt=\left[-(1-t)^3\right]_{1-y^{1/3}}^{1}=y\)
Having shown that the cumulative distribution function of \(Y\) is:
\(F_Y(y)=y\)
for \(0<y<1\), we now just need to differentiate \(F(y)\) to get the probability density function \(f(y)\). Doing so, we get:
\(f_Y(y)=F'_Y(y)=1\)
for \(0<y<1\). That is, \(Y\) is a \(U(0,1)\) random variable. (Again, you might find it reassuring to verify that \(f(y)\) does indeed integrate to 1 over the support of \(y\).)
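Here is a small Python sketch that checks the result numerically rather than by calculus. It computes \(P(Y\le y)=P(X\ge 1-y^{1/3})\) by integrating the density of \(X\) with a midpoint rule, and compares the answer with \(y\). The function names and the grid size are illustrative assumptions.

```python
def f_X(x):
    """Density of X: 3(1 - x)^2 for 0 < x < 1."""
    return 3.0 * (1.0 - x) ** 2


def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over (a, b) with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def F_Y_numeric(y):
    """P(Y <= y) = P(X >= 1 - y^(1/3)), found by integrating f_X numerically."""
    return midpoint_integral(f_X, 1.0 - y ** (1 / 3), 1.0)


# If Y is uniform(0, 1), each numeric value should be very close to y itself.
for y in (0.1, 0.25, 0.5, 0.9):
    print(f"y = {y}: numeric F_Y(y) = {F_Y_numeric(y):.6f}")
```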
22.2 - Change-of-Variable Technique
On the last page, we used the distribution function technique in two different examples. In the first example, the transformation of \(X\) involved an increasing function, while in the second example, the transformation of \(X\) involved a decreasing function. On this page, we'll generalize what we did there first for an increasing function and then for a decreasing function. The generalizations lead to what is called the change-of-variable technique.
Generalization for an Increasing Function
Let \(X\) be a continuous random variable with a generic p.d.f. \(f(x)\) defined over the support \(c_1<x<c_2\). And, let \(Y=u(X)\) be a continuous, increasing function of \(X\) with inverse function \(X=v(Y)\). Here's a picture of what the continuous, increasing function might look like:
The blue curve, of course, represents the continuous and increasing function \(Y=u(X)\). If you put an \(x\)-value, such as \(c_1\) and \(c_2\), into the function \(Y=u(X)\), you get a \(y\)-value, such as \(u(c_1)\) and \(u(c_2)\). But, because the function is continuous and increasing, an inverse function \(X=v(Y)\) exists. In that case, if you put a \(y\)-value into the function \(X=v(Y)\), you get an \(x\)-value, such as \(v(y)\).
Okay, now that we have described the scenario, let's derive the distribution function of \(Y\). It is:
\(F_Y(y)=P(Y\leq y)=P(u(X)\leq y)=P(X\leq v(y))=\int_{c_1}^{v(y)} f(x)dx\)
for \(d_1=u(c_1)<y<u(c_2)=d_2\). The first equality holds from the definition of the cumulative distribution function of \(Y\). The second equality holds because \(Y=u(X)\). The third equality holds because, as shown in red on the following graph, for the portion of the function for which \(u(X)\le y\), it is also true that \(X\le v(Y)\):
And, the last equality holds from the definition of probability for a continuous random variable \(X\). Now, we just have to take the derivative of \(F_Y(y)\), the cumulative distribution function of \(Y\), to get \(f_Y(y)\), the probability density function of \(Y\). The Fundamental Theorem of Calculus, in conjunction with the Chain Rule, tells us that the derivative is:
\(f_Y(y)=F'_Y(y)=f_X(v(y))\cdot v'(y)\)
for \(d_1=u(c_1)<y<u(c_2)=d_2\).
Generalization for a Decreasing Function
Let \(X\) be a continuous random variable with a generic p.d.f. \(f(x)\) defined over the support \(c_1<x<c_2\). And, let \(Y=u(X)\) be a continuous, decreasing function of \(X\) with inverse function \(X=v(Y)\). Here's a picture of what the continuous, decreasing function might look like:
The blue curve, of course, represents the continuous and decreasing function \(Y=u(X)\). Again, if you put an \(x\)-value, such as \(c_1\) and \(c_2\), into the function \(Y=u(X)\), you get a \(y\)-value, such as \(u(c_1)\) and \(u(c_2)\). But, because the function is continuous and decreasing, an inverse function \(X=v(Y)\) exists. In that case, if you put a \(y\)-value into the function \(X=v(Y)\), you get an \(x\)-value, such as \(v(y)\).
That said, the distribution function of \(Y\) is then:
\(F_Y(y)=P(Y\leq y)=P(u(X)\leq y)=P(X\geq v(y))=1-P(X\leq v(y))=1-\int_{c_1}^{v(y)} f(x)dx\)
for \(d_2=u(c_2)<y<u(c_1)=d_1\). The first equality holds from the definition of the cumulative distribution function of \(Y\). The second equality holds because \(Y=u(X)\). The third equality holds because, as shown in red on the following graph, for the portion of the function for which \(u(X)\le y\), it is also true that \(X\ge v(Y)\):
The fourth equality holds from the rule of complementary events. And, the last equality holds from the definition of probability for a continuous random variable \(X\). Now, we just have to take the derivative of \(F_Y(y)\), the cumulative distribution function of \(Y\), to get \(f_Y(y)\), the probability density function of \(Y\). Again, the Fundamental Theorem of Calculus, in conjunction with the Chain Rule, tells us that the derivative is:
\(f_Y(y)=F'_Y(y)=-f_X(v(y))\cdot v'(y)\)
for \(d_2=u(c_2)<y<u(c_1)=d_1\). You might be alarmed in that it seems that the p.d.f. \(f(y)\) is negative, but note that the derivative of \(v(y)\) is negative, because \(X=v(Y)\) is a decreasing function in \(Y\). Therefore, the two negatives cancel each other out, and therefore make \(f(y)\) positive.
Phew! We have now derived what is called the change-of-variable technique first for an increasing function and then for a decreasing function. But, continuous, increasing functions and continuous, decreasing functions, by their one-to-one nature, are both invertible functions. Let's, once and for all, then write the change-of-variable technique for any generic invertible function.
Definition. Let \(X\) be a continuous random variable with generic probability density function \(f(x)\) defined over the support \(c_1<x<c_2\). And, let \(Y=u(X)\) be an invertible function of \(X\) with inverse function \(X=v(Y)\). Then, using the change-of-variable technique, the probability density function of \(Y\) is:
\(f_Y(y)=f_X(v(y))\times |v'(y)|\)
defined over the support \(u(c_1)<y<u(c_2)\) when \(u\) is increasing, or \(u(c_2)<y<u(c_1)\) when \(u\) is decreasing.
Having summarized the change-of-variable technique, once and for all, let's revisit an example.
Example 22-1 Continued
Let's return to our example in which \(X\) is a continuous random variable with the following probability density function:
\(f(x)=3x^2\)
for \(0<x<1\). Use the change-of-variable technique to find the probability density function of \(Y=X^2\).
Solution
Note that the function:
\(Y=X^2\)
defined over the interval \(0<x<1\) is an invertible function. The inverse function is:
\(x=v(y)=\sqrt{y}=y^{1/2}\)
for \(0<y<1\). (That range is because, when \(x=0, y=0\); and when \(x=1, y=1\)). Now, taking the derivative of \(v(y)\), we get:
\(v'(y)=\dfrac{1}{2} y^{-1/2}\)
Therefore, the change-of-variable technique:
\(f_Y(y)=f_X(v(y))\times |v'(y)|\)
tells us that the probability density function of \(Y\) is:
\(f_Y(y)=3[y^{1/2}]^2\cdot \dfrac{1}{2} y^{-1/2}\)
And, simplifying we get that the probability density function of \(Y\) is:
\(f_Y(y)=\dfrac{3}{2} y^{1/2}\)
for \(0<y<1\). We shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique.
Example 22-2 Continued
Let's return to our example in which \(X\) is a continuous random variable with the following probability density function:
\(f(x)=3(1-x)^2\)
for \(0<x<1\). Use the change-of-variable technique to find the probability density function of \(Y=(1-X)^3\).
Solution
Note that the function:
\(Y=(1-X)^3\)
defined over the interval \(0<x<1\) is an invertible function. The inverse function is:
\(x=v(y)=1-y^{1/3}\)
for \(0<y<1\). (That range is because, when \(x=0, y=1\); and when \(x=1, y=0\)). Now, taking the derivative of \(v(y)\), we get:
\(v'(y)=-\dfrac{1}{3} y^{-2/3}\)
Therefore, the change-of-variable technique:
\(f_Y(y)=f_X(v(y))\times |v'(y)|\)
tells us that the probability density function of \(Y\) is:
\(f_Y(y)=3[1-(1-y^{1/3})]^2\cdot \left|-\dfrac{1}{3} y^{-2/3}\right|=3y^{2/3}\cdot \dfrac{1}{3} y^{-2/3}\)
And, simplifying we get that the probability density function of \(Y\) is:
\(f_Y(y)=1\)
for \(0<y<1\). Again, we shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique.
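The change-of-variable formula is mechanical enough that we can sketch it in a few lines of Python. The function below is a minimal illustration, not a library routine: the name `change_of_variable_pdf` and the step size `h` are assumptions, and \(v'(y)\) is approximated with a central difference instead of being derived by hand. It should only be evaluated at points strictly inside the support of \(y\).

```python
def change_of_variable_pdf(f_X, v, h=1e-6):
    """Return f_Y(y) = f_X(v(y)) * |v'(y)| for an invertible transformation.

    f_X is the density of X, and v is the inverse function x = v(y).
    The derivative v'(y) is approximated by a central difference, so y
    must sit at least h inside the support.
    """
    def f_Y(y):
        dv = (v(y + h) - v(y - h)) / (2 * h)
        return f_X(v(y)) * abs(dv)
    return f_Y


# First transformation revisited above: f(x) = 3x^2 and v(y) = sqrt(y).
f_Y_increasing = change_of_variable_pdf(lambda x: 3 * x ** 2,
                                        lambda y: y ** 0.5)

# Second transformation revisited above: f(x) = 3(1 - x)^2 and v(y) = 1 - y^(1/3).
f_Y_decreasing = change_of_variable_pdf(lambda x: 3 * (1 - x) ** 2,
                                        lambda y: 1 - y ** (1 / 3))

y = 0.25
print(f_Y_increasing(y), 1.5 * y ** 0.5)  # both should be close to 0.75
print(f_Y_decreasing(y))                  # should be close to 1
```

Taking the absolute value of the approximated derivative is what lets one function handle both the increasing and the decreasing case.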
22.3 - Two-to-One Functions
You might have noticed that all of the examples we have looked at so far involved monotonic functions that, because of their one-to-one nature, could therefore be inverted. The question naturally arises then as to how we modify the change-of-variable technique in the situation in which the transformation is not monotonic, and therefore not one-to-one. That's what we'll explore on this page! We'll start with an example in which the transformation is two-to-one. We'll use the distribution function technique to find the p.d.f. of the transformed random variable. In so doing, we'll take note of how the change-of-variable technique must be modified to handle the two-to-one portion of the transformation. After summarizing the necessary modification to the change-of-variable technique, we'll take a look at another example using the change-of-variable technique.
Example 22-3
Suppose \(X\) is a continuous random variable with probability density function:
\(f(x)=\dfrac{x^2}{3}\)
for \(-1<x<2\). What is the p.d.f. of \(Y=X^2\)?
Solution
First, note that the transformation:
\(Y=X^2\)
is not one-to-one over the interval \(-1<x<2\):
For example, in the interval \(-1<x<1\), if we take the inverse of \(Y=X^2\), we get:
\(X_1=-\sqrt{Y}=v_1(Y)\)
for \(-1<x<0\), and:
\(X_2=+\sqrt{Y}=v_2(Y)\)
for \(0<x<1\).
As the graph suggests, the transformation is two-to-one when \(0<y<1\), and one-to-one when \(1<y<4\). So, let's use the distribution function technique, separately, over each of these ranges. First, consider when \(0<y<1\). In that case:
\(F_Y(y)=P(Y\leq y)=P(X^2 \leq y)=P(-\sqrt{y}\leq X \leq \sqrt{y})=F_X(\sqrt{y})-F_X(-\sqrt{y})\)
The first equality holds by the definition of the cumulative distribution function. The second equality holds because the transformation of interest is \(Y=X^2\). The third equality holds, because when \(X^2\le y\), the random variable \(X\) is between the positive and negative square roots of \(y\). And, the last equality holds again by the definition of the cumulative distribution function. Now, taking the derivative of the cumulative distribution function \(F(y)\), we get (from the Fundamental Theorem of Calculus and the Chain Rule) the probability density function \(f(y)\):
\(f_Y(y)=F'_Y(y)=f_X(\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2} + f_X(-\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2}\)
Using what we know about the probability density function of \(X\):
\(f(x)=\dfrac{x^2}{3}\)
we get:
\(f_Y(y)=\dfrac{(\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}+\dfrac{(-\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}\)
And, simplifying, we get:
\(f_Y(y)=\dfrac{1}{6}y^{1/2}+\dfrac{1}{6}y^{1/2}=\dfrac{\sqrt{y}}{3}\)
for \(0<y<1\). Note that it readily becomes apparent that in the case of a two-to-one transformation, we need to sum two terms, each of which arises from a one-to-one transformation.
So, we've found the p.d.f. of \(Y\) when \(0<y<1\). Now, we have to find the p.d.f. of \(Y\) when \(1<y<4\). In that case:
\(F_Y(y)=P(Y\leq y)=P(X^2 \leq y)=P(X\leq \sqrt{y})=F_X(\sqrt{y})\)
The first equality holds by the definition of the cumulative distribution function. The second equality holds because \(Y=X^2\). The third equality holds, because when \(X^2\le y\), the random variable \(X \le \sqrt{y}\). And, the last equality holds again by the definition of the cumulative distribution function. Now, taking the derivative of the cumulative distribution function \(F(y)\), we get (from the Fundamental Theorem of Calculus and the Chain Rule) the probability density function \(f(y)\):
\(f_Y(y)=F'_Y(y)=f_X(\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2}\)
Again, using what we know about the probability density function of \(X\), and simplifying, we get:
\(f_Y(y)=\dfrac{(\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}=\dfrac{\sqrt{y}}{6}\)
for \(1<y<4\).
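As a sanity check on the two pieces, here is a short Python sketch. It integrates each piece of the derived density with a midpoint rule and compares the results with the matching probabilities computed directly from \(f(x)=x^2/3\). The helper name `midpoint_integral` and the grid size are illustrative assumptions. The two pieces should integrate to \(2/9\) and \(7/9\), which sum to 1.

```python
def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over (a, b) with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Derived density of Y = X^2, integrated piece by piece.
two_to_one_mass = midpoint_integral(lambda y: y ** 0.5 / 3, 0.0, 1.0)  # 0 < y < 1
one_to_one_mass = midpoint_integral(lambda y: y ** 0.5 / 6, 1.0, 4.0)  # 1 < y < 4

# Matching probabilities computed directly from f(x) = x^2 / 3.
p_x_inner = midpoint_integral(lambda x: x ** 2 / 3, -1.0, 1.0)  # P(-1 < X < 1)
p_x_outer = midpoint_integral(lambda x: x ** 2 / 3, 1.0, 2.0)   # P(1 < X < 2)

print(f"P(0 < Y < 1) = {two_to_one_mass:.4f}, P(-1 < X < 1) = {p_x_inner:.4f}")
print(f"P(1 < Y < 4) = {one_to_one_mass:.4f}, P(1 < X < 2) = {p_x_outer:.4f}")
print(f"total mass of f_Y = {two_to_one_mass + one_to_one_mass:.4f}")
```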
Now that we've seen how the distribution function technique works when we have a two-to-one function, we should now be able to summarize the necessary modifications to the change-of-variable technique.
Generalization
Let \(X\) be a continuous random variable with probability density function \(f(x)\) for \(c_1<x<c_2\).
Let \(Y=u(X)\) be a continuous two-to-one function of \(X\), which can be “broken up” into two one-to-one invertible functions with:
\(X_1=v_1(Y)\) and \(X_2=v_2(Y)\)

Then, the probability density function for the two-to-one portion of \(Y\) is:
\(f_Y(y)=f_X(v_1(y))\cdot |v'_1(y)|+f_X(v_2(y))\cdot |v'_2(y)|\)
for the “appropriate support” for \(y\). That is, you have to add the one-to-one portions together.

And, the probability density function for the one-to-one portion of \(Y\) is, as always:
\(f_Y(y)=f_X(v_2(y))\cdot |v'_2(y)|\)
for the “appropriate support” for \(y\).
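One way to keep track of the “appropriate support” is to attach a \(y\)-range to each inverse branch and add up only the branches that apply at a given \(y\). The Python sketch below does exactly that. The name `many_to_one_pdf`, the `(v, lo, hi)` branch format, and the central-difference step `h` are all illustrative assumptions. The demonstration uses the earlier transformation \(Y=X^2\) with \(f(x)=x^2/3\) for \(-1<x<2\).

```python
def many_to_one_pdf(f_X, branches, h=1e-6):
    """Sum f_X(v(y)) * |v'(y)| over the inverse branches that apply at y.

    branches is a list of (v, lo, hi) tuples, where v is an inverse
    branch x = v(y) that is valid for lo < y < hi. Each v'(y) is
    approximated by a central difference.
    """
    def f_Y(y):
        total = 0.0
        for v, lo, hi in branches:
            if lo < y < hi:
                dv = (v(y + h) - v(y - h)) / (2 * h)
                total += f_X(v(y)) * abs(dv)
        return total
    return f_Y


# Y = X^2 with f(x) = x^2 / 3 on -1 < x < 2.
f_Y = many_to_one_pdf(
    lambda x: x ** 2 / 3,
    [
        (lambda y: -(y ** 0.5), 0.0, 1.0),  # v_1: negative root, from -1 < x < 0
        (lambda y: y ** 0.5, 0.0, 4.0),     # v_2: positive root, from 0 < x < 2
    ],
)

print(f_Y(0.25), 0.25 ** 0.5 / 3)  # two-to-one portion: both terms contribute
print(f_Y(2.25), 2.25 ** 0.5 / 6)  # one-to-one portion: only v_2 contributes
```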
Example 22-4
Suppose \(X\) is a continuous random variable that follows the standard normal distribution with, of course, \(-\infty<x<\infty\). Use the change-of-variable technique to show that the p.d.f. of \(Y=X^2\) is the chi-square distribution with 1 degree of freedom.
Solution
The transformation \(Y=X^2\) is two-to-one over the entire support \(-\infty<x<\infty\):
That is, when \(-\infty<x<0\), we have:
\(X_1=-\sqrt{Y}=v_1(Y)\)
and when \(0<x<\infty\), we have:
\(X_2=+\sqrt{Y}=v_2(Y)\)
Then, the change-of-variable technique tells us that, over the two-to-one portion of the transformation, that is, when \(0<y<\infty\):
\(f_Y(y)=f_X(-\sqrt{y})\cdot \left|-\dfrac{1}{2} y^{-1/2}\right|+f_X(\sqrt{y})\cdot \left|\dfrac{1}{2} y^{-1/2}\right|\)
Recalling the p.d.f. of the standard normal distribution:
\(f_X(x)=\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{x^2}{2}\right]\)
the p.d.f. of \(Y\) is then:
\(f_Y(y)=\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{(-\sqrt{y})^2}{2}\right]\cdot \left|-\dfrac{1}{2} y^{-1/2}\right|+\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{(\sqrt{y})^2}{2}\right]\cdot \left|\dfrac{1}{2} y^{-1/2}\right|\)
Adding the terms together, and simplifying a bit, we get:
\(f_Y(y)=2 \dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{y}{2}\right]\cdot \dfrac{1}{2} y^{-1/2}\)
Crossing out the 2s, recalling that \(\Gamma(1/2)=\sqrt{\pi}\), and rewriting things just a bit, we should be able to recognize that, with \(0<y<\infty\), the probability density function of \(Y\):
\(f_Y(y)=\dfrac{1}{\Gamma(1/2) 2^{1/2}} e^{-y/2} y^{-1/2}\)
is indeed the p.d.f. of a chi-square random variable with 1 degree of freedom!
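If you'd like to confirm the algebra numerically, the following Python sketch evaluates the two-branch sum and the chi-square formula side by side at a few points. The function names are illustrative assumptions. The chi-square density is written in its general form for \(r\) degrees of freedom, \(\dfrac{1}{\Gamma(r/2)2^{r/2}}y^{r/2-1}e^{-y/2}\), with \(r=1\).

```python
import math


def standard_normal_pdf(x):
    """p.d.f. of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def f_Y_two_branches(y):
    """Sum of the two one-to-one pieces of Y = X^2, for y > 0."""
    dv = 0.5 * y ** -0.5  # derivative of v_2(y) = +sqrt(y); v_1'(y) = -dv
    return (standard_normal_pdf(-math.sqrt(y)) * abs(-dv)
            + standard_normal_pdf(math.sqrt(y)) * abs(dv))


def chi_square_pdf(y, r=1):
    """Chi-square p.d.f. with r degrees of freedom, for y > 0."""
    return y ** (r / 2 - 1) * math.exp(-y / 2) / (math.gamma(r / 2) * 2 ** (r / 2))


for y in (0.1, 1.0, 2.5, 7.0):
    print(f"y = {y}: two-branch sum = {f_Y_two_branches(y):.6f}, "
          f"chi-square(1) = {chi_square_pdf(y):.6f}")
```

The two columns should match up to floating-point rounding at every \(y>0\).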
22.4 - Simulating Observations
Now that we've learned the mechanics of the distribution function and change-of-variable techniques to find the p.d.f. of a transformation of a random variable, we'll now turn our attention for a few minutes to an application of the distribution function technique. In doing so, we'll learn how statistical software, such as Minitab or SAS, generates (or "simulates") 1000 random numbers that follow a particular probability distribution. More specifically, we'll explore how statistical software simulates, say, 1000 random numbers from an exponential distribution with mean \(\theta=5\).
The Idea
If we take a look at the cumulative distribution function of an exponential random variable with a mean of \(\theta=5\):
the idea might just jump out at us. You might notice that the cumulative distribution function \(F(x)\) is a number (a cumulative probability, in fact!) between 0 and 1. So, one strategy we might use to generate 1000 numbers following an exponential distribution with a mean of 5 is:
 Generate a \(Y\sim U(0,1)\) random number. That is, generate a number between 0 and 1 such that each number between 0 and 1 is equally likely.
 Then, use the inverse of \(Y=F(x)\) to get a random number \(X=F^{-1}(y)\) whose distribution function is \(F(x)\). This is, in fact, illustrated on the graph. If \(F(x)=0.8\), for example, then the inverse \(X\) is about 8.
 Repeat steps 1 and 2 one thousand times.
By looking at the graph, you should get the idea, by using this strategy, that the shape of the distribution function dictates the probability distribution of the resulting \(X\) values. In this case, the steepness of the curve up to about \(F(x)=0.8\) suggests that most of the \(X\) values will be less than 8. That's what the probability density function of an exponential random variable with a mean of 5 suggests should happen:
We can even do the calculation, of course, to illustrate this point. If \(X\) is an exponential random variable with a mean of 5, then:
\(P(X<8)=1-P(X>8)=1-e^{-8/5}=0.80\)
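The three-step strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Minitab or SAS is implemented: the function name `simulate_exponential` and the seed are made up for the example, and the inverse \(F^{-1}(y)=-5\,\text{log}(1-y)\) comes from solving \(y=1-e^{-x/5}\) for \(x\).

```python
import math
import random


def simulate_exponential(n, theta=5.0, seed=22):
    """Simulate n exponential(theta) numbers by inverting the c.d.f."""
    rng = random.Random(seed)  # fixed seed, chosen arbitrarily
    draws = []
    for _ in range(n):                             # step 3: repeat n times
        y = rng.random()                           # step 1: Y ~ U(0, 1)
        draws.append(-theta * math.log(1.0 - y))   # step 2: X = F^{-1}(Y)
    return draws


xs = simulate_exponential(1000)
sample_mean = sum(xs) / len(xs)
share_below_8 = sum(x < 8 for x in xs) / len(xs)

print(f"sample mean: {sample_mean:.2f} (theoretical mean is 5)")
print(f"share of values below 8: {share_below_8:.3f} "
      f"(theoretical value is {1 - math.exp(-8 / 5):.3f})")
```

With 1000 draws, the sample mean and the share below 8 should land near 5 and 0.80, though not exactly on them; the exact values depend on the seed.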
A theorem (naturally!) formalizes our idea of how to simulate random numbers following a particular probability distribution.
Let \(Y\sim U(0,1)\). Let \(F(x)\) have the properties of a distribution function of the continuous type with \(F(a)=0\) and \(F(b)=1\). Suppose that \(F(x)\) is strictly increasing on the support \(a<x<b\), where \(a\) and \(b\) could be \(-\infty\) and \(\infty\), respectively. Then, the random variable \(X\) defined by:
\(X=F^{-1}(Y)\)
is a continuous random variable with cumulative distribution function \(F(x)\).
Proof.
In order to prove the theorem, we need to show that the cumulative distribution function of \(X\) is \(F(x)\). That is, we need to show:
\(P(X\leq x)=F(x)\)
It turns out that the proof is a one-liner! Here it is:
\(P(X\leq x)=P(F^{-1}(Y)\leq x)=P(Y \leq F(x))=F(x)\)
We've proved what we set out to prove, namely that:
\(P(X\leq x)=F(x)\)
Well, okay, maybe some explanation is needed! The first equality in the one-line proof holds, because:
\(X=F^{-1}(Y)\)
Then, the second equality holds because of the red portion of this graph:
That is, when:
\(F^{-1}(Y)\leq x\)
is true, so is
\(Y \leq F(x)\)
Finally, the last equality holds because it is assumed that \(Y\) is a uniform(0, 1) random variable, and therefore the probability that \(Y\) is less than or equal to some \(y\) is, in fact, \(y\) itself:
\(P(Y\leq y)=F_Y(y)=\int_0^y dt=y\)
That means that the probability that \(Y\) is less than or equal to some \(F(x)\) is, in fact, \(F(x)\) itself:
\(P(Y \leq F(x))=F(x)\)
Our one-line proof is complete!
Example 22-5
A student randomly draws the following three uniform(0, 1) numbers:
0.2  0.5  0.9 
Use the three uniform(0,1) numbers to generate three random numbers that follow an exponential distribution with mean \(\theta=5\).
Solution
The cumulative distribution function of an exponential random variable with a mean of 5 is:
\(y=F(x)=1-e^{-x/5}\)
for \(0\le x<\infty\). We need to invert the cumulative distribution function, that is, solve for \(x\), in order to be able to determine the exponential(5) random numbers. Manipulating the above equation a bit, we get:
\(1-y=e^{-x/5}\)
Then, taking the natural log of both sides, we get:
\(\text{log}(1-y)=-\dfrac{x}{5}\)
And, multiplying both sides by −5, we get:
\(x=-5\text{log}(1-y)\)
for \(0<y<1\). Now, it's just a matter of inserting the student's three random U(0,1) numbers into the above equation to get our three exponential(5) random numbers:
 If \(y=0.2\), we get \(x=1.1\)
 If \(y=0.5\), we get \(x=3.5\)
 If \(y=0.9\), we get \(x=11.5\)
We would simply continue the same process — that is, generating \(y\), a random U(0,1) number, inserting y into the above equation, and solving for \(x\) — 997 more times if we wanted to generate 1000 exponential(5) random numbers. Of course, we wouldn't really do it by hand, but rather let statistical software do it for us. At least we now understand how random number generation works!
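To reproduce the three hand calculations above, here is a minimal Python sketch of the inverted cumulative distribution function. The function name `exponential_inverse_cdf` is an illustrative assumption, and `math.log` is the natural log.

```python
import math


def exponential_inverse_cdf(y, theta=5.0):
    """Solve y = 1 - exp(-x / theta) for x, for 0 < y < 1."""
    return -theta * math.log(1.0 - y)


# The student's three uniform(0, 1) numbers.
for y in (0.2, 0.5, 0.9):
    x = exponential_inverse_cdf(y)
    print(f"y = {y}: x = {x:.1f}")
# prints x = 1.1, x = 3.5, and x = 11.5
```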