22.4 - Simulating Observations

Now that we've learned the mechanics of the distribution function and change-of-variable techniques to find the p.d.f. of a transformation of a random variable, we'll now turn our attention for a few minutes to an application of the distribution function technique. In doing so, we'll learn how statistical software, such as Minitab or SAS, generates (or "simulates") 1000 random numbers that follow a particular probability distribution. More specifically, we'll explore how statistical software simulates, say, 1000 random numbers from an exponential distribution with mean \(\theta=5\).

The Idea Section

If we take a look at the cumulative distribution function of an exponential random variable with a mean of \(\theta=5\):

the idea might just jump out at us. You might notice that the cumulative distribution function \(F(x)\) is a number (a cumulative probability, in fact!) between 0 and 1. So, one strategy we might use to generate a 1000 numbers following an exponential distribution with a mean of 5 is:

Generate a \(Y\sim U(0,1)\) random number. That is, generate a number between 0 and 1 such that each number between 0 and 1 is equally likely.
Then, use the inverse of \(Y=F(x)\) to get a random number \(X=F^{-1}(y)\) whose distribution function is \(F(x)\). This is, in fact, illustrated on the graph. If \(F(x)=0.8\), for example, then the inverse \(X\) is about 8.
Repeat steps 1 and 2 one thousand times.

By looking at the graph, you should get the idea, by using this strategy, that the shape of the distribution function dictates the probability distribution of the resulting \(X\) values. In this case, the steepness of the curve up to about \(F(x)=0.8\) suggests that most of the \(X\) values will be less than 8. That's what the probability density function of an exponential random variable with a mean of 5 suggests should happen:

We can even do the calculation, of course, to illustrate this point. If \(X\) is an exponential random variable with a mean of 5, then:

\(P(X<8)=1-P(X>8)=1-e^{-8/5}=0.80\)

A theorem (naturally!) formalizes our idea of how to simulate random numbers following a particular probability distribution.

Theorem

Let \(Y\sim U(0,1)\). Let \(F(x)\) have the properties of a distribution function of the continuous type with \(F(a)=0\) and \(F(b)=1\). Suppose that \(F(x)\) is strictly increasing on the support \(a<x<b\), where \(a\) and \(b\) could be \(-\infty\) and \(\infty\), respectively. Then, the random variable \(X\) defined by:

\(X=F^{-1}(Y)\)

is a continuous random variable with cumulative distribution function \(F(x)\).

Proof.

In order to prove the theorem, we need to show that the cumulative distribution function of \(X\) is \(F(x)\). That is, we need to show:

\(P(X\leq x)=F(x)\)

It turns out that the proof is a one-liner! Here it is:

\(P(X\leq x)=P(F^{-1}(Y)\leq x)=P(Y \leq F(x))=F(x)\)

We've set out to prove what we intended, namely that:

\(P(X\leq x)=F(x)\)

Well, okay, maybe some explanation is needed! The first equality in the one-line proof holds, because:

\(X=F^{-1}(Y)\)

Then, the second equality holds because of the red portion of this graph:

That is, when:

\(F^{-1}(Y)\leq x\)

is true, so is

\(Y \leq F(x)\)

Finally, the last equality holds because it is assumed that \(Y\) is a uniform(0, 1) random variable, and therefore the probability that \(Y\) is less than or equal to some \(y\) is, in fact, \(y\) itself:

\(P(Y\leq y)=F(y)=\int_0^y dt=y\)

That means that the probability that \(Y\) is less than or equal to some \(F(x)\) is, in fact, \(F(x)\) itself:

\(P(Y \leq F(x))=F(x)\)

Our one-line proof is complete!

Example 22-5 Section

A student randomly draws the following three uniform(0, 1) numbers:

0.2

0.5

0.9

Use the three uniform(0,1) numbers to generate three random numbers that follow an exponential distribution with mean \(\theta=5\).

Solution

The cumulative distribution function of an exponential random variable with a mean of 5 is:

\(y=F(x)=1-e^{-x/5}\)

for \(0\le x<\infty\). We need to invert the cumulative distribution function, that is, solve for \(x\), in order to be able to determine the exponential(5) random numbers. Manipulating the above equation a bit, we get:

\(1-y=e^{-x/5}\)

Then, taking the natural log of both sides, we get:

\(\text{log}(1-y)=-\dfrac{x}{5}\)

And, multiplying both sides by −5, we get:

\(x=-5\text{log}(1-y)\)

for \(0<y<1\). Now, it's just a matter of inserting the student's three random U(0,1) numbers into the above equation to get our three exponential(5) random numbers:

If \(y=0.2\), we get \(x=1.1\)
If \(y=0.5\), we get \(x=3.5\)
If \(y=0.9\), we get \(x=11.5\)

We would simply continue the same process — that is, generating \(y\), a random U(0,1) number, inserting y into the above equation, and solving for \(x\) — 997 more times if we wanted to generate 1000 exponential(5) random numbers. Of course, we wouldn't really do it by hand, but rather let statistical software do it for us. At least we now understand how random number generation works!