Published on *STAT 414 / 415* (https://onlinecourses.science.psu.edu/stat414)

We'll begin our exploration of the distributions of functions of random variables by focusing on simple functions of *one* random variable. For example, if *X* is a continuous random variable, and we take a function of *X*, say:

*Y* = *u*(*X*)

then *Y* is also a continuous random variable that has its own probability distribution. We'll learn how to find the probability density function of *Y*, using two different techniques, namely the **distribution function technique** and the **change-of-variable technique**. At first, we'll focus only on one-to-one functions. Then, once we have that mastered, we'll learn how to modify the change-of-variable technique to find the probability distribution of a random variable that is derived from a two-to-one function. Finally, we'll learn how the inverse of a cumulative distribution function can help us simulate random numbers that follow a particular probability distribution.

- To learn how to use the distribution function technique to find the probability distribution of *Y* = *u*(*X*), a one-to-one transformation of a random variable *X*.
- To learn how to use the change-of-variable technique to find the probability distribution of *Y* = *u*(*X*), a one-to-one transformation of a random variable *X*.
- To learn how to use the change-of-variable technique to find the probability distribution of *Y* = *u*(*X*), a two-to-one transformation of a random variable *X*.
- To learn how to use a cumulative distribution function to simulate random numbers that follow a particular probability distribution.
- To understand all of the proofs in the lesson.
- To be able to apply the methods learned in the lesson to new problems.

You might not have been aware of it at the time, but we have already used the distribution function technique at least twice in this course to find the probability density function of a function of a random variable. For example, we used the distribution function technique to show that:

\(Z=\dfrac{X-\mu}{\sigma}\)

follows a standard normal distribution when *X* is normally distributed with mean *μ* and standard deviation *σ*. And, we used the distribution function technique to show that, when *Z* follows the standard normal distribution:

\(Z^2\)

follows the chi-square distribution with 1 degree of freedom. In summary, we used the **distribution function technique** to find the p.d.f. of *Y* = *u*(*X*), a function of the random variable *X*, by:

(1) First, finding the cumulative distribution function:

\(F_Y(y)=P(Y\leq y)\)

(2) Then, differentiating the cumulative distribution function *F*(*y*) to get the probability density function *f*(*y*). That is:

\(f_Y(y)=F'_Y(y)\)

Now that we've officially stated the distribution function technique, let's take a look at a few more examples.

Let *X* be a continuous random variable with the following probability density function:

\(f(x)=3x^2\)

for 0 < *x* < 1. What is the probability density function of \(Y=X^2\)?

**Solution.** If you look at the graph of the function (above and to the right) of \(Y=X^2\), you might note that (1) the function is an increasing function of *X*, and (2) 0 < *y* < 1. Those two facts lead us to the cumulative distribution function of *Y*:

\(F_Y(y)=P(Y\leq y)=P(X^2\leq y)=P(X\leq \sqrt{y})=\int_0^{\sqrt{y}} 3x^2 dx=\left[x^3\right]_0^{\sqrt{y}}=y^{3/2}\)

for 0 < *y* < 1.

Having shown that the cumulative distribution function of *Y* is:

\(F_Y(y)=y^{3/2}\)

for 0 < *y* < 1, we now just need to differentiate *F*(*y*) to get the probability density function *f*(*y*). Doing so, we get:

\(f_Y(y)=F'_Y(y)=\dfrac{3}{2} y^{1/2}\)

for 0 < *y* < 1. Our calculation is complete! We have successfully used the distribution function technique to find the p.d.f. of *Y*, when *Y* was an increasing function of *X*. (By the way, you might find it reassuring to verify that *f*(*y*) does indeed integrate to 1 over the support of *y*. In general, that's not a bad thing to check.)
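As the parenthetical suggests, the integrate-to-1 check, and indeed the whole two-step technique, can be carried out symbolically. Here is a minimal sketch in Python using sympy; the variable names are ours, not part of the lesson:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# p.d.f. of X on 0 < x < 1
f_X = 3 * x**2

# Step (1): F_Y(y) = P(Y <= y) = P(X^2 <= y) = P(X <= sqrt(y))
F_Y = sp.integrate(f_X, (x, 0, sp.sqrt(y)))

# Step (2): differentiate the c.d.f. to get the p.d.f.
f_Y = sp.diff(F_Y, y)

print(F_Y)                           # y**(3/2)
print(f_Y)                           # 3*sqrt(y)/2
print(sp.integrate(f_Y, (y, 0, 1)))  # 1, so f_Y is a valid p.d.f. on (0, 1)
```

The `positive=True` assumption lets sympy simplify the square roots cleanly; it matches the support 0 < *x* < 1 and 0 < *y* < 1.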

One thing you might note in the last example is that great care was used to subscript the cumulative distribution functions and probability density functions with either an *X* or a *Y* to indicate to which random variable the functions belonged. For example, in finding the cumulative distribution function of *Y*, we started with the cumulative distribution function of *Y*, and ended up with a cumulative distribution function of *X*! If we didn't use the subscripts, we would have had a good chance of throwing up our hands and botching the calculation. In short, using subscripts is a good habit to follow!

Let *X* be a continuous random variable with the following probability density function:

\(f(x)=3(1-x)^2\)

for 0 < *x* < 1. What is the probability density function of \(Y=(1-X)^3\) ?

**Solution.** If you look at the graph of the function (above and to the right) of:

\(Y=(1-X)^3\)

you might note that (1) the function is a decreasing function of *X*, and (2) 0 < *y* < 1. Those two facts lead us to the cumulative distribution function of *Y*:

\(F_Y(y)=P(Y\leq y)=P((1-X)^3\leq y)=P(X\geq 1-y^{1/3})=\int_{1-y^{1/3}}^{1} 3(1-x)^2 dx=\left[-(1-x)^3\right]_{1-y^{1/3}}^{1}=y\)

for 0 < *y* < 1.

Having shown that the cumulative distribution function of *Y* is:

\(F_Y(y)=y\)

for 0 < *y* < 1, we now just need to differentiate *F*(*y*) to get the probability density function *f*(*y*). Doing so, we get:

\(f_Y(y)=F'_Y(y)=1\)

for 0 < *y* < 1. That is, *Y* is a *U*(0,1) random variable. (Again, you might find it reassuring to verify that *f*(*y*) does indeed integrate to 1 over the support of *y*.)
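This example, too, can be checked symbolically. A quick sympy sketch, with our own variable names and the same setup as the example:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# p.d.f. of X on 0 < x < 1
f_X = 3 * (1 - x)**2

# F_Y(y) = P((1-X)^3 <= y) = P(X >= 1 - y^(1/3))
F_Y = sp.simplify(sp.integrate(f_X, (x, 1 - y**sp.Rational(1, 3), 1)))
f_Y = sp.simplify(sp.diff(F_Y, y))

print(F_Y)  # y
print(f_Y)  # 1, the uniform(0, 1) density
```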

On the last page, we used the distribution function technique in two different examples. In the first example, the transformation of *X* involved an increasing function, while in the second example, the transformation of *X* involved a decreasing function. On this page, we'll generalize what we did there first for an increasing function and then for a decreasing function. The generalizations lead to what is called the **change-of-variable technique**.

Let *X* be a continuous random variable with a generic p.d.f. *f*(*x*) defined over the support *c*_{1} < *x* < *c*_{2}. And, let *Y* = *u*(*X*) be a *continuous*, *increasing* function of *X* with inverse function *X* = *v*(*Y*). Here's a picture of what the continuous, increasing function might look like:

The blue curve, of course, represents the continuous and increasing function *Y* = *u*(*X*). If you put an *x*-value, such as *c*_{1} and *c*_{2}, into the function *Y* = *u*(*X*), you get a *y*-value, such as *u*(*c*_{1}) and *u*(*c*_{2}). But, because the function is continuous and increasing, an inverse function *X *= *v*(*Y*) exists. In that case, if you put a *y*-value into the function *X* = *v*(*Y*), you get an *x*-value, such as *v*(*y*).

Okay, now that we have described the scenario, let's derive the distribution function of *Y*. It is:

\(F_Y(y)=P(Y\leq y)=P(u(X)\leq y)=P(X\leq v(y))=\int_{c_1}^{v(y)} f(x)dx\)

for *d*_{1} = *u*(*c*_{1}) < *y* < *u*(*c*_{2}) = *d*_{2}. The first equality holds from the definition of the cumulative distribution function of *Y*. The second equality holds because *Y* = *u*(*X*). The third equality holds because, as shown in red on the following graph, for the portion of the function for which *u*(*X*) ≤ *y*, it is also true that *X* ≤ *v*(*Y*):

And, the last equality holds from the definition of probability for a continuous random variable *X*. Now, we just have to take the derivative of *F*_{Y}(*y*), the cumulative distribution function of *Y*, to get the probability density function *f*_{Y}(*y*). Doing so, we get:

\(f_Y(y)=F'_Y(y)=f_X(v(y))\cdot v'(y)\)

for *d*_{1} = *u*(*c*_{1}) < *y* < *u*(*c*_{2}) = *d*_{2}.

Let *X* be a continuous random variable with a generic p.d.f. *f*(*x*) defined over the support *c*_{1} < *x* < *c*_{2}. And, let *Y* = *u*(*X*) be a *continuous*, *decreasing* function of *X* with inverse function *X* = *v*(*Y*). Here's a picture of what the continuous, decreasing function might look like:

The blue curve, of course, represents the continuous and decreasing function *Y* = *u*(*X*). Again, if you put an *x*-value, such as *c*_{1} and *c*_{2}, into the function *Y* = *u*(*X*), you get a *y*-value, such as *u*(*c*_{1}) and *u*(*c*_{2}). But, because the function is continuous and decreasing, an inverse function *X *= *v*(*Y*) exists. In that case, if you put a *y*-value into the function *X* = *v*(*Y*), you get an *x*-value, such as *v*(*y*).

That said, the distribution function of *Y* is then:

\(F_Y(y)=P(Y\leq y)=P(u(X)\leq y)=P(X\geq v(y))=1-P(X\leq v(y))=1-\int_{c_1}^{v(y)} f(x)dx\)

for *d*_{2} = *u*(*c*_{2}) < *y* < *u*(*c*_{1}) = *d*_{1}. The first equality holds from the definition of the cumulative distribution function of *Y*. The second equality holds because *Y* = *u*(*X*). The third equality holds because, as shown in red on the following graph, for the portion of the function for which *u*(*X*) ≤ *y*, it is also true that *X* ≥ *v*(*Y*):

The fourth equality holds from the rule of complementary events. And, the last equality holds from the definition of probability for a continuous random variable *X*. Now, we just have to take the derivative of *F*_{Y}(*y*), the cumulative distribution function of *Y*, to get the probability density function *f*_{Y}(*y*). Doing so, we get:

\(f_Y(y)=F'_Y(y)=-f_X(v(y))\cdot v'(y)\)

for *d*_{2} = *u*(*c*_{2}) < *y* < *u*(*c*_{1}) = *d*_{1}. You might be alarmed that the p.d.f. *f*(*y*) appears to be negative, but note that the derivative of *v*(*y*) is negative, because *X* = *v*(*Y*) is a decreasing function of *Y*. Therefore, the two negatives cancel each other out, making *f*(*y*) positive.

Phew! We have now derived what is called the change-of-variable technique first for an increasing function and then for a decreasing function. But, continuous, increasing functions and continuous, decreasing functions, by their one-to-one nature, are both invertible functions. Let's, once and for all, then write the change-of-variable technique for any generic invertible function.

Let *X* be a continuous random variable with a generic p.d.f. *f*(*x*) defined over the support *c*_{1} < *x* < *c*_{2}, and let *Y* = *u*(*X*) be an invertible function of *X* with inverse function *X* = *v*(*Y*). Then, the **change-of-variable technique** tells us that the probability density function of *Y* is:

\(f_Y(y)=f_X(v(y))\times |v'(y)|\)

defined over the support *u*(*c*_{1}) < *y* < *u*(*c*_{2}) when *u* is increasing, and *u*(*c*_{2}) < *y* < *u*(*c*_{1}) when *u* is decreasing.

Having summarized the change-of-variable technique, once and for all, let's revisit an example.

Let's return to our example in which *X* is a continuous random variable with the following probability density function:

\(f(x)=3x^2\)

for 0 < *x* < 1. Use the change-of-variable technique to find the probability density function of \(Y=X^2\).

**Solution.** Note that the function:

\(Y=X^2\)

defined over the interval 0 < *x* < 1 is an invertible function. The inverse function is:

\(x=v(y)=\sqrt{y}=y^{1/2}\)

for 0 < *y* < 1. (That range is because, when *x* = 0, *y* = 0; and when *x* = 1, *y* = 1). Now, taking the derivative of *v*(*y*), we get:

\(v'(y)=\dfrac{1}{2} y^{-1/2}\)

Therefore, the change-of-variable technique:

\(f_Y(y)=f_X(v(y))\times |v'(y)|\)

tells us that the probability density function of *Y* is:

\(f_Y(y)=3[y^{1/2}]^2\cdot \dfrac{1}{2} y^{-1/2}\)

And, simplifying we get that the probability density function of *Y* is:

\(f_Y(y)=\dfrac{3}{2} y^{1/2}\)

for 0 < *y* < 1. We shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique.

Let's return to our example in which *X* is a continuous random variable with the following probability density function:

\(f(x)=3(1-x)^2\)

for 0 < *x* < 1. Use the change-of-variable technique to find the probability density function of \(Y=(1-X)^3\).

**Solution.** Note that the function:

\(Y=(1-X)^3\)

defined over the interval 0 < *x* < 1 is an invertible function. The inverse function is:

\(x=v(y)=1-y^{1/3}\)

for 0 < *y* < 1. (That range is because, when *x* = 0, *y* = 1; and when *x* = 1, *y* = 0). Now, taking the derivative of *v*(*y*), we get:

\(v'(y)=-\dfrac{1}{3} y^{-2/3}\)

Therefore, the change-of-variable technique:

\(f_Y(y)=f_X(v(y))\times |v'(y)|\)

tells us that the probability density function of *Y* is:

\(f_Y(y)=3[1-(1-y^{1/3})]^2\cdot |-\dfrac{1}{3} y^{-2/3}|=3y^{2/3}\cdot \dfrac{1}{3} y^{-2/3} \)

And, simplifying we get that the probability density function of *Y* is:

\(f_Y(y)=1\)

for 0 < *y* < 1. Again, we shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique.
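Both change-of-variable calculations follow the same mechanical recipe, which makes it natural to wrap as a small helper. Here's a sketch in Python with sympy; `change_of_variable` is our own name, and we assume you supply the inverse *x* = *v*(*y*) yourself:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

def change_of_variable(f_X, v):
    """Return f_Y(y) = f_X(v(y)) * |v'(y)|, given the p.d.f. of X
    (an expression in x) and the inverse x = v(y) (an expression in y)."""
    return sp.simplify(f_X.subs(x, v) * sp.Abs(sp.diff(v, y)))

# Example 1: f(x) = 3x^2 and Y = X^2, so v(y) = sqrt(y)
print(change_of_variable(3 * x**2, sp.sqrt(y)))                      # 3*sqrt(y)/2

# Example 2: f(x) = 3(1-x)^2 and Y = (1-X)^3, so v(y) = 1 - y^(1/3)
print(change_of_variable(3 * (1 - x)**2, 1 - y**sp.Rational(1, 3)))  # 1
```

The absolute value handles increasing and decreasing transformations alike, just as the boxed formula says.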

You might have noticed that all of the examples we have looked at so far involved monotonic functions that, because of their one-to-one nature, could therefore be inverted. The question naturally arises then as to how we modify the change-of-variable technique in the situation in which the transformation is not monotonic, and therefore not one-to-one. That's what we'll explore on this page! We'll start with an example in which the transformation is two-to-one. We'll use the distribution function technique to find the p.d.f. of the transformed random variable. In so doing, we'll take note of how the change-of-variable technique must be modified to handle the two-to-one portion of the transformation. After summarizing the necessary modification to the change-of-variable technique, we'll take a look at another example using the change-of-variable technique.

Suppose *X* is a continuous random variable with probability density function:

\(f(x)=\dfrac{x^2}{3}\)

for −1 < *x* < 2. What is the p.d.f. of \(Y=X^2\)?

**Solution.** First, note that the transformation:

\(Y=X^2\)

is not one-to-one over the interval −1 < *x* < 2:

For example, in the interval −1 < *x* < 1, if we take the inverse of \(Y=X^2\), we get:

\(X_1=-\sqrt{Y}=v_1(Y)\)

for −1 < *x* < 0, and:

\(X_2=+\sqrt{Y}=v_2(Y)\)

for 0 < *x* < 1.

As the graph suggests, the transformation is two-to-one when 0 < *y* < 1, and one-to-one when 1 < *y* < 4. So, let's use the distribution function technique, separately, over each of these ranges. First, consider when **0 < *y* < 1**. In that case:

\(F_Y(y)=P(Y\leq y)=P(X^2 \leq y)=P(-\sqrt{y}\leq X \leq \sqrt{y})=F_X(\sqrt{y})-F_X(-\sqrt{y})\)

The first equality holds by the definition of the cumulative distribution function. The second equality holds because the transformation of interest is *Y* = *X*^{2}. The third equality holds, because when *X*^{2} ≤ *y*, the random variable *X* is between the positive and negative square roots of *y*. And, the last equality holds again by the definition of the cumulative distribution function. Now, taking the derivative of the cumulative distribution function *F*(*y*), we get (from the Fundamental Theorem of Calculus and the Chain Rule) the probability density function *f*(*y*):

\(f_Y(y)=F'_Y(y)=f_X(\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2} + f_X(-\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2}\)

Using what we know about the probability density function of *X*:

\(f(x)=\dfrac{x^2}{3}\)

we get:

\(f_Y(y)=\dfrac{(\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}+\dfrac{(-\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}\)

And, simplifying, we get:

\(f_Y(y)=\dfrac{1}{6}y^{1/2}+\dfrac{1}{6}y^{1/2}=\dfrac{\sqrt{y}}{3}\)

for 0 < *y* < 1. Note that it readily becomes apparent that in the case of a two-to-one transformation, we need to sum two terms, each of which arises from a one-to-one transformation.

So, we've found the p.d.f. of *Y* when 0 < *y* < 1. Now, we have to find the p.d.f. of *Y* when **1 < y < 4**. In that case:

\(F_Y(y)=P(Y\leq y)=P(X^2 \leq y)=P(X\leq \sqrt{y})=F_X(\sqrt{y})\)

The first equality holds by the definition of the cumulative distribution function. The second equality holds because *Y* = *X*^{2}. The third equality holds, because when *X*^{2} ≤ *y*, the random variable \(X \le \sqrt{y}\). And, the last equality holds again by the definition of the cumulative distribution function. Now, taking the derivative of the cumulative distribution function *F*(*y*), we get (from the Fundamental Theorem of Calculus and the Chain Rule) the probability density function *f*(*y*):

\(f_Y(y)=F'_Y(y)=f_X(\sqrt{y})\cdot \dfrac{1}{2} y^{-1/2}\)

Again, using what we know about the probability density function of *X*, and simplifying, we get:

\(f_Y(y)=\dfrac{(\sqrt{y})^2}{3} \cdot \dfrac{1}{2} y^{-1/2}=\dfrac{\sqrt{y}}{6}\)

for 1 < *y* < 4.
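We can sanity-check the piecewise p.d.f. numerically. Here is a hedged numpy sketch, with our own names; the inverse-c.d.f. sampler uses *F*(*x*) = (*x*³ + 1)/9, which you can verify by integrating *f*(*x*) = *x*²/3 from −1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample X with p.d.f. f(x) = x^2/3 on (-1, 2) by inverting its c.d.f.:
# F(x) = (x^3 + 1)/9, so x = cbrt(9u - 1) for u ~ U(0, 1).
u = rng.uniform(size=200_000)
x_samples = np.cbrt(9 * u - 1)
y_samples = x_samples**2

# The piecewise p.d.f. of Y = X^2 derived above
def f_Y(t):
    return np.where(t < 1, np.sqrt(t) / 3, np.sqrt(t) / 6)

# f_Y should integrate to 1 over 0 < y < 4 (midpoint Riemann sum) ...
n, dt = 100_000, 4 / 100_000
mid = (np.arange(n) + 0.5) * dt
print(np.sum(f_Y(mid)) * dt)    # ≈ 1

# ... and the simulated mass below y = 1 should match the integral of
# sqrt(y)/3 over (0, 1), which is 2/9.
print(np.mean(y_samples <= 1))  # ≈ 0.222
```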

Now that we've seen how the distribution function technique works when we have a two-to-one function, we should now be able to summarize the necessary modifications to the change-of-variable technique.

## Generalization.
Suppose the transformation *Y* = *u*(*X*) is two-to-one over part of the support of *X*. Let \(X_1=v_1(Y)\) and \(X_2=v_2(Y)\) denote the two inverse functions over that part.

(1) Then, the probability density function for the two-to-one portion of the transformation is:

\(f_Y(y)=f_X(v_1(y))\cdot |v'_1(y)|+f_X(v_2(y))\cdot |v'_2(y)|\)

for the "appropriate support" for *y*.

(2) And, the probability density function for the one-to-one portion of the transformation, with inverse \(X=v(Y)\), is:

\(f_Y(y)=f_X(v(y))\cdot |v'(y)|\)

for the "appropriate support" for *y*.

Suppose *X* is a continuous random variable that follows the standard normal distribution with, of course, −∞ < *x* < ∞. Use the change-of-variable technique to show that the p.d.f. of \(Y=X^2\) is the chi-square distribution with 1 degree of freedom.

**Solution.** The transformation \(Y=X^2\) is two-to-one over the entire support −∞ < *x* < ∞:

That is, when −∞ < *x* < 0, we have:

\(X_1=-\sqrt{Y}=v_1(Y)\)

and when 0 < *x* < ∞, we have:

\(X_2=+\sqrt{Y}=v_2(Y)\)

Then, the change of variable technique tells us that, over the two-to-one portion of the transformation, that is, when 0 < *y* < ∞:

\(f_Y(y)=f_X(\sqrt{y})\cdot \left |\dfrac{1}{2} y^{-1/2}\right|+f_X(-\sqrt{y})\cdot \left|-\dfrac{1}{2} y^{-1/2}\right|\)

Recalling the p.d.f. of the standard normal distribution:

\(f_X(x)=\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{x^2}{2}\right]\)

the p.d.f. of *Y* is then:

\(f_Y(y)=\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{(\sqrt{y})^2}{2}\right]\cdot \left|\dfrac{1}{2} y^{-1/2}\right|+\dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{(\sqrt{y})^2}{2}\right]\cdot \left|-\dfrac{1}{2} y^{-1/2}\right|\)

Adding the terms together, and simplifying a bit, we get:

\(f_Y(y)=2 \dfrac{1}{\sqrt{2\pi}} \text{exp}\left[-\dfrac{y}{2}\right]\cdot \dfrac{1}{2} y^{-1/2}\)

Crossing out the 2s, recalling that \(\Gamma(1/2)=\sqrt{\pi}\), and rewriting things just a bit, we should be able to recognize that, with 0 < *y* < ∞, the probability density function of *Y*:

\(f_Y(y)=\dfrac{1}{\Gamma(1/2) 2^{1/2}} e^{-y/2} y^{-1/2}\)

is indeed the p.d.f. of a chi-square random variable with 1 degree of freedom!
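If you'd like empirical reassurance, squaring simulated standard normal values and comparing against the derived density does the trick. A numpy-only sketch, with our own names:

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(1)
z = rng.standard_normal(500_000)
y = z**2   # should follow a chi-square(1) distribution

# The derived p.d.f.: f_Y(y) = e^{-y/2} y^{-1/2} / (Gamma(1/2) * 2^{1/2})
def f_Y(t):
    return np.exp(-t / 2) * t**-0.5 / (gamma(0.5) * sqrt(2))

# P(Y <= 1) two ways: from the simulation, and by integrating f_Y
# with a midpoint rule (the y^{-1/2} singularity at 0 is integrable)
n, dt = 20_000, 1 / 20_000
mid = (np.arange(n) + 0.5) * dt
print(np.mean(y <= 1))        # ≈ 0.6827, i.e. P(-1 < Z < 1)
print(np.sum(f_Y(mid)) * dt)  # ≈ 0.6827 as well
```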

Now that we've learned the mechanics of the distribution function and change-of-variable techniques to find the p.d.f. of a transformation of a random variable, we'll now turn our attention for a few minutes to an application of the distribution function technique. In doing so, we'll learn how statistical software, such as Minitab or SAS, generates (or "**simulates**") 1000 random numbers that follow a particular probability distribution. More specifically, we'll explore how statistical software simulates, say, 1000 random numbers from an exponential distribution with mean *θ* = 5.

If we take a look at the cumulative distribution function of an exponential random variable with a mean of *θ* = 5:

the idea might just jump out at us. You might notice that the cumulative distribution function *F*(*x*) is a number (a cumulative probability, in fact!) between 0 and 1. So, one strategy we might use to generate 1000 numbers following an exponential distribution with a mean of 5 is:

(1) Generate a *Y* ~ *U*(0, 1) random number. That is, generate a number between 0 and 1 such that each number between 0 and 1 is equally likely.

(2) Then, use the inverse of *Y* = *F*(*x*) to get a random number *X* = *F*^{-1}(*y*) whose distribution function is *F*(*x*). This is, in fact, illustrated on the graph. If *F*(*x*) = 0.8, for example, then the inverse *X* is about 8.

(3) Repeat steps (1) and (2) one thousand times.

By looking at the graph, you should get the idea, by using this strategy, that the shape of the distribution function dictates the probability distribution of the resulting *X* values. In this case, the steepness of the curve up to about *F*(*x*) = 0.8 suggests that most of the *X* values will be less than 8. That's what the probability density function of an exponential random variable with a mean of 5 suggests should happen:

We can even do the calculation, of course, to illustrate this point. If *X* is an exponential random variable with a mean of 5, then:

\(P(X<8)=1-P(X>8)=1-e^{-8/5}=0.80\)

A theorem (naturally!) formalizes our idea of how to simulate random numbers following a particular probability distribution.

**Theorem.** Let *Y* follow a uniform(0, 1) distribution, and let *F*(*x*) be the cumulative distribution function of a continuous random variable. Then, the random variable \(X=F^{-1}(Y)\) is a continuous random variable with cumulative distribution function *F*(*x*).

**Proof.** In order to prove the theorem, we need to show that the cumulative distribution function of *X* is *F*(*x*). That is, we need to show:

\(P(X\leq x)=F(x)\)

It turns out that the proof is a one-liner! Here it is:

\(P(X\leq x)=P(F^{-1}(Y)\leq x)=P(Y \leq F(x))=F(x)\)

We've set out to prove what we intended, namely that:

\(P(X\leq x)=F(x)\)

Well, okay, maybe some explanation is needed! The first equality in the one-line proof holds, because:

\(X=F^{-1}(Y)\)

Then, the second equality holds because of the red portion of this graph:

That is, when:

\(F^{-1}(Y)\leq x\)

is true, so is

\(Y \leq F(x)\)

Finally, the last equality holds because it is assumed that *Y* is a uniform(0, 1) random variable, and therefore the probability that *Y* is less than or equal to some *y* is, in fact, *y* itself:

\(P(Y\leq y)=F_Y(y)=\int_0^y dt=y\)

That means that the probability that *Y* is less than or equal to some *F*(*x*) is, in fact, *F*(*x*) itself:

\(P(Y \leq F(x))=F(x)\)

Our one-line proof is complete!

A student randomly draws the following three uniform(0, 1) numbers:

0.2 0.5 0.9

Use the three uniform(0,1) numbers to generate three random numbers that follow an exponential distribution with mean *θ* = 5.

**Solution.** The cumulative distribution function of an exponential random variable with a mean of 5 is:

\(y=F(x)=1-e^{-x/5}\)

for 0 ≤ *x* < ∞. We need to invert the cumulative distribution function, that is, solve for *x*, in order to be able to determine the exponential(5) random numbers. Manipulating the above equation a bit, we get:

\(1-y=e^{-x/5}\)

Then, taking the natural log of both sides, we get:

\(\text{log}(1-y)=-\dfrac{x}{5}\)

And, multiplying both sides by −5, we get:

\(x=-5\text{log}(1-y)\)

for 0 < *y* < 1. Now, it's just a matter of inserting the student's three random *U*(0,1) numbers into the above equation to get our three exponential(5) random numbers:

- If *y* = 0.2, we get *x* = 1.1.
- If *y* = 0.5, we get *x* = 3.5.
- If *y* = 0.9, we get *x* = 11.5.
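A short numpy sketch makes the whole recipe concrete; `exp5_from_uniform` is our own name for the inverted c.d.f.:

```python
import numpy as np

rng = np.random.default_rng(414)

def exp5_from_uniform(y):
    """Invert y = F(x) = 1 - e^{-x/5}: x = -5 log(1 - y)."""
    return -5 * np.log(1 - np.asarray(y))

# The student's three uniform(0, 1) draws
print(np.round(exp5_from_uniform([0.2, 0.5, 0.9]), 1))  # -> 1.1, 3.5, 11.5

# Simulating 1000 exponential(5) random numbers the same way
x = exp5_from_uniform(rng.uniform(size=1000))
print(x.mean())        # ≈ 5, the exponential mean θ
print(np.mean(x < 8))  # ≈ 0.80, matching P(X < 8) = 1 - e^{-8/5}
```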

We would simply continue the same process — that is, generating *y*, a random *U*(0,1) number, inserting *y* into the above equation, and solving for *x* — 997 more times if we wanted to generate 1000 exponential(5) random numbers. Of course, we wouldn't really do it by hand, but rather let statistical software do it for us. At least we now understand how random number generation works!