Section 5: Distributions of Functions of Random Variables
As the name of this section suggests, we will now spend some time learning how to find the probability distribution of functions of random variables. For example, we might know the probability density function of a random variable \(X\), but want to know instead the probability density function of some function of \(X\), such as \(u(X) = X^2\).
The more important functions of random variables that we'll explore will be those involving random variables that are independent and identically distributed. For example, if \(X_1, X_2, \ldots, X_n\) is a random sample, we might want to know how the sample mean \(\bar{X}\) is distributed. We'll first learn how \(\bar{X}\) is distributed exactly when the sample comes from a normal population. Then, we'll learn the Central Limit Theorem, which tells us that, for a sufficiently large sample size, the standardized sample mean approximately follows the standard normal distribution. Finally, we'll use the Central Limit Theorem to use the normal distribution to approximate discrete distributions, such as the binomial distribution and the Poisson distribution.
Lesson 22: Functions of One Random Variable
Overview
We'll begin our exploration of the distributions of functions of random variables, by focusing on simple functions of one random variable. For example, if
then
Objectives
- To learn how to use the distribution function technique to find the probability distribution of \(Y = u(X)\), a one-to-one transformation of a random variable \(X\).
- To learn how to use the change-of-variable technique to find the probability distribution of \(Y = u(X)\), a one-to-one transformation of a random variable \(X\).
- To learn how to use the change-of-variable technique to find the probability distribution of \(Y = u(X)\), a two-to-one transformation of a random variable \(X\).
- To learn how to use a cumulative distribution function to simulate random numbers that follow a particular probability distribution.
- To understand all of the proofs in the lesson.
- To be able to apply the methods learned in the lesson to new problems.
22.1 - Distribution Function Technique
You might not have been aware of it at the time, but we have already used the distribution function technique at least twice in this course to find the probability density function of a function of a random variable. For example, we used the distribution function technique to show that:

\(Z = \dfrac{X - \mu}{\sigma}\)

follows a standard normal distribution when \(X\) is normally distributed with mean \(\mu\) and variance \(\sigma^2\), and that \(Z^2\) follows the chi-square distribution with 1 degree of freedom. In summary, we used the distribution function technique to find the p.d.f. of the random function \(Y = u(X)\) by:

- First, finding the cumulative distribution function: \(F_Y(y) = P(Y \le y)\)
- Then, differentiating the cumulative distribution function \(F_Y(y)\) to get the probability density function \(f_Y(y)\). That is: \(f_Y(y) = F_Y'(y)\)
Now that we've officially stated the distribution function technique, let's take a look at a few more examples.
Example 22-1
Let
for
Solution
If you look at the graph of the function (above and to the right) of
Having shown that the cumulative distribution function of
for
for
One thing you might note in the last example is that great care was used to subscript the cumulative distribution functions and probability density functions with either an \(X\) or a \(Y\) to indicate to which random variable each function belongs.
Example 22-2
Let
for
Solution
If you look at the graph of the function (above and to the right) of:
you might note that the function is a decreasing function of
Having shown that the cumulative distribution function of
for
for
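Because the worked algebra in Examples 22-1 and 22-2 was lost above, here is a minimal, self-contained sketch of the technique on an assumed example (taking \(X \sim\) Uniform(0,1) and the increasing transformation \(Y = X^2\), my choices rather than the course's): for \(0 < y < 1\), \(F_Y(y) = P(X^2 \le y) = P(X \le \sqrt{y}) = \sqrt{y}\), and differentiating gives \(f_Y(y) = \frac{1}{2\sqrt{y}}\). The code checks the derived CDF by simulation.

```python
import numpy as np

# Distribution function technique check, assuming X ~ Uniform(0,1) and Y = X^2
# (an illustrative choice, not the course's example): F_Y(y) = sqrt(y) on (0,1).
rng = np.random.default_rng(414)
y = rng.uniform(0, 1, 100_000) ** 2

for q in (0.04, 0.25, 0.64):
    print(q, (y <= q).mean(), np.sqrt(q))  # empirical vs. derived CDF
```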
22.2 - Change-of-Variable Technique
On the last page, we used the distribution function technique in two different examples. In the first example, the transformation of
Generalization for an Increasing Function
Let
The blue curve, of course, represents the continuous and increasing function
Okay, now that we have described the scenario, let's derive the distribution function of
for
And, the last equality holds from the definition of probability for a continuous random variable
for
Generalization for a Decreasing Function
Let
The blue curve, of course, represents the continuous and decreasing function
That said, the distribution function of
for
The fourth equality holds from the rule of complementary events. And, the last equality holds from the definition of probability for a continuous random variable
for
Phew! We have now derived what is called the change-of-variable technique first for an increasing function and then for a decreasing function. But, continuous, increasing functions and continuous, decreasing functions, by their one-to-one nature, are both invertible functions. Let's, once and for all, then write the change-of-variable technique for any generic invertible function.
Definition. Let \(X\) be a continuous random variable with probability density function \(f_X(x)\) defined over the support \(c_1 < x < c_2\). And, let \(Y = u(X)\) be an invertible function of \(X\) with inverse function \(X = v(Y)\). Then, the probability density function of \(Y\) is:

\(f_Y(y) = f_X(v(y)) \cdot |v'(y)|\)

defined over the corresponding support of \(y\).
Having summarized the change-of-variable technique, once and for all, let's revisit an example.
Example 22-1 Continued
Let's return to our example in which
for
Solution
Note that the function:
defined over the interval
for
Therefore, the change-of-variable technique:
tells us that the probability density function of
And, simplifying we get that the probability density function of
for
Example 22-2 continued
Let's return to our example in which
for
Solution
Note that the function:
defined over the interval
for
Therefore, the change-of-variable technique:
tells us that the probability density function of
And, simplifying we get that the probability density function of Y is:
for
22.3 - Two-to-One Functions
You might have noticed that all of the examples we have looked at so far involved monotonic functions that, because of their one-to-one nature, could therefore be inverted. The question naturally arises then as to how we modify the change-of-variable technique in the situation in which the transformation is not monotonic, and therefore not one-to-one. That's what we'll explore on this page! We'll start with an example in which the transformation is two-to-one. We'll use the distribution function technique to find the p.d.f. of the transformed random variable. In so doing, we'll take note of how the change-of-variable technique must be modified to handle the two-to-one portion of the transformation. After summarizing the necessary modification to the change-of-variable technique, we'll take a look at another example using the change-of-variable technique.
Example 22-3
Suppose
for
Solution
First, note that the transformation:
is not one-to-one over the interval
For example, in the interval
for
for
As the graph suggests, the transformation is two-to-one between when
The first equality holds by the definition of the cumulative distribution function. The second equality holds because the transformation of interest is
Using what we know about the probability density function of
we get:
And, simplifying, we get:
for
So, we've found the p.d.f. of
The first equality holds by the definition of the cumulative distribution function. The second equality holds because
Again, using what we know about the probability density function of
for
Now that we've seen how the distribution function technique works when we have a two-to-one function, we should now be able to summarize the necessary modifications to the change-of-variable technique.
Generalization
Let
Let
- Then, the probability density function for the two-to-one portion of \(Y\) is:

\(f_Y(y) = f_X(v_1(y)) \cdot |v_1'(y)| + f_X(v_2(y)) \cdot |v_2'(y)|\)

for the “appropriate support” for \(y\). That is, you have to add the one-to-one portions together.
- And, the probability density function for the one-to-one portion of \(Y\) is, as always:

\(f_Y(y) = f_X(v(y)) \cdot |v'(y)|\)

for the “appropriate support” for \(y\).
Example 22-4
Suppose
Solution
The transformation
That is, when
and when
Then, the change of variable technique tells us that, over the two-to-one portion of the transformation, that is, when
Recalling the p.d.f. of the standard normal distribution:
the p.d.f. of
Adding the terms together, and simplifying a bit, we get:
Crossing out the 2s, recalling that
is indeed the p.d.f. of a chi-square random variable with 1 degree of freedom!
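Example 22-4's conclusion is easy to corroborate numerically: squaring standard normal draws should produce values whose distribution matches the chi-square(1) distribution. A quick sketch, assuming only NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

# Square 100,000 standard normal draws and compare the empirical CDF of Z^2
# with the chi-square(1) CDF.
rng = np.random.default_rng(0)
y = rng.normal(size=100_000) ** 2

for q in (0.1, 0.5, 1.0, 2.0):
    print(q, (y <= q).mean(), stats.chi2.cdf(q, df=1))
```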
22.4 - Simulating Observations
Now that we've learned the mechanics of the distribution function and change-of-variable techniques to find the p.d.f. of a transformation of a random variable, we'll now turn our attention for a few minutes to an application of the distribution function technique. In doing so, we'll learn how statistical software, such as Minitab or SAS, generates (or "simulates") 1000 random numbers that follow a particular probability distribution. More specifically, we'll explore how statistical software simulates, say, 1000 random numbers from an exponential distribution with mean 5.
The Idea
If we take a look at the cumulative distribution function of an exponential random variable with a mean of 5, the idea might just jump out at us. You might notice that the cumulative distribution function \(F(x)\) takes on values only between 0 and 1, just as a Uniform(0,1) random number does. This suggests the following strategy:
- Generate a Uniform(0,1) random number \(u\). That is, generate a number between 0 and 1 such that each number between 0 and 1 is equally likely.
- Then, use the inverse of \(F\) to get a random number \(x = F^{-1}(u)\) whose distribution function is \(F\). This is, in fact, illustrated on the graph. If \(u = 0.8\), for example, then the inverse is about 8.
- Repeat steps 1 and 2 one thousand times.
By looking at the graph, you should get the idea, by using this strategy, that the shape of the distribution function dictates the probability distribution of the resulting
We can even do the calculation, of course, to illustrate this point. If
A theorem (naturally!) formalizes our idea of how to simulate random numbers following a particular probability distribution.
Let \(U\) be a Uniform(0,1) random variable, and let \(F(x)\) be the cumulative distribution function of a continuous random variable with inverse \(F^{-1}\). Then, the random variable \(X = F^{-1}(U)\)
is a continuous random variable with cumulative distribution function \(F(x)\).
Proof.
In order to prove the theorem, we need to show that the cumulative distribution function of
It turns out that the proof is a one-liner! Here it is:
We've set out to prove what we intended, namely that:
Well, okay, maybe some explanation is needed! The first equality in the one-line proof holds, because:
Then, the second equality holds because of the red portion of this graph:
That is, when:
is true, so is
Finally, the last equality holds because it is assumed that
That means that the probability that
Our one-line proof is complete!
Example 22-5
A student randomly draws the following three uniform(0, 1) numbers:
0.2 | 0.5 | 0.9 |
Use the three uniform(0,1) numbers to generate three random numbers that follow an exponential distribution with mean
Solution
The cumulative distribution function of an exponential random variable with a mean of 5 is:

\(F(x) = 1 - e^{-x/5}\)

for \(x \ge 0\). Setting \(F(x)\) equal to a Uniform(0,1) number \(u\) and rearranging, we get \(e^{-x/5} = 1 - u\). Then, taking the natural log of both sides, we get:

\(-\dfrac{x}{5} = \ln(1-u)\)

And, multiplying both sides by −5, we get:

\(x = -5\ln(1-u)\)

for \(0 < u < 1\). Now, it's just a matter of plugging in the three Uniform(0,1) numbers:
- If \(u = 0.2\), we get \(x = -5\ln(1-0.2) \approx 1.12\)
- If \(u = 0.5\), we get \(x = -5\ln(1-0.5) \approx 3.47\)
- If \(u = 0.9\), we get \(x = -5\ln(1-0.9) \approx 11.51\)
We would simply continue the same process — that is, generating Uniform(0,1) numbers and transforming each one into an exponential random number — until we obtained as many random numbers as desired. A minimal sketch of the process in code follows.
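This sketch uses the inverse \(x = -5\ln(1-u)\) derived above; the sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(22)
u = rng.uniform(0, 1, 1000)   # step 1: 1000 Uniform(0,1) random numbers
x = -5 * np.log(1 - u)        # step 2: invert F(x) = 1 - exp(-x/5)

print(x.mean())               # should be close to the exponential mean, 5
```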
Lesson 23: Transformations of Two Random Variables
Introduction
In this lesson, we consider the situation where we have two random variables and we are interested in the joint distribution of two new random variables which are a transformation of the original one. Such a transformation is called a bivariate transformation. We use a generalization of the change of variables technique which we learned in Lesson 22. We provide examples of random variables whose density functions can be derived through a bivariate transformation.
Objectives
- To learn how to use the change-of-variable technique to find the probability distribution of
, a one-to-one transformation of the two random variables and .
23.1 - Change-of-Variables Technique
Recall that, for the univariate (one random variable) situation: Given
Now, suppose \(X_1\) and \(X_2\) are continuous random variables with joint pdf \(f_{X_1, X_2}(x_1, x_2)\), and we are interested in the joint distribution of two new random variables \(Y_1 = u_1(X_1, X_2)\) and \(Y_2 = u_2(X_1, X_2)\), where the transformation is one-to-one.
Let \(x_1 = v_1(y_1, y_2)\) and \(x_2 = v_2(y_1, y_2)\) denote the inverse transformation.
Then, we usually find the joint pdf of \(Y_1\) and \(Y_2\) first, and from it the marginal pdf of whichever variable is of interest.
The joint pdf is:

\(f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}\big(v_1(y_1, y_2),\; v_2(y_1, y_2)\big)\,|J|\)

In the above expression, \(J\) is the Jacobian of the inverse transformation,
i.e. it is the determinant of the matrix

\(J = \det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{pmatrix}\)
Example 23-1
Suppose
The joint pdf is given by
Consider the transformation:
We have
OR
The Jacobian,
So,
Now, we determine the support of
Using the joint pdf, we may find the marginal pdf of
Similarly, we may find the marginal pdf of
Equivalently,
This pdf is known as the double exponential or Laplace pdf.
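The intermediate algebra above was lost in extraction, but one standard construction consistent with the stated result, and my assumption here, is \(Y_1 = X_1 - X_2\) for independent standard exponential \(X_1\) and \(X_2\), which has the Laplace density \(f(y) = \frac{1}{2}e^{-|y|}\). A simulation check:

```python
import numpy as np

# Difference of two independent standard exponentials (my assumption for the
# stripped example); its empirical CDF should match the Laplace CDF.
rng = np.random.default_rng(23)
y = rng.exponential(1, 200_000) - rng.exponential(1, 200_000)

for q in (-2.0, -0.5, 0.0, 0.5, 2.0):
    laplace_cdf = 0.5 * np.exp(q) if q < 0 else 1 - 0.5 * np.exp(-q)
    print(q, (y <= q).mean(), laplace_cdf)
```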
23.2 - Beta Distribution
Let
We make the following transformation:
The inverse transformation is given by
The Jacobian is
The joint pdf
with support is
It may be shown that the marginal pdf of
23.3 - F Distribution
We describe a very useful distribution in Statistics known as the F distribution.
Let
Define the random variable
This time we use the distribution function technique described in Lesson 22.
By differentiating the cdf, it can be shown that
A random variable with the pdf
It contains the F-values for various cumulative probabilities
When using this table, it is helpful to note that if a random variable (say,
Illustration
The shape of the F distribution is determined by the degrees of freedom
The lower plot (below histogram) illustrates how the shape of an F distribution changes with the degrees of freedom
Lesson 24: Several Independent Random Variables
Introduction

In the previous lessons, we explored functions of random variables. We'll do the same in this lesson, too, except here we'll add the requirement that the random variables be independent, and in some cases, identically distributed. Suppose, for example, that we were interested in determining the average weight of the thousands of pumpkins grown on a pumpkin farm. Since we couldn't possibly weigh all of the pumpkins on the farm, we'd want to weigh just a small random sample of pumpkins. If we let:
denote the weight of the first pumpkin sampled denote the weight of the second pumpkin sampled- ...
denote the weight of the pumpkin sampled
then we could imagine calculating the average weight of the sampled pumpkins as:
Now, because the pumpkins were randomly sampled, we wouldn't expect the weight of one pumpkin, say
Objectives
- To get the big picture for the remainder of the course.
- To learn a formal definition of a random sample.
- To learn what i.i.d. means.
- To learn how to find the expectation of a function of
independent random variables. - To learn how to find the expectation of a product of functions of
independent random variables. - To learn how to find the mean and variance of a linear combination of random variables.
- To learn that the expected value of the sample mean is
. - To learn that the variance of the sample mean is
. - To understand all of the proofs presented in the lesson.
- To be able to apply the methods learned in this lesson to new problems.
24.1 - Some Motivation
Consider the population of 8 million college students. Suppose we are interested in determining
and use the resulting data to learn about the population of college students. How could we obtain that random sample though? Would it be okay to stand outside a major classroom building on the Penn State campus, such as the Willard Building, and ask random students how far they are from their hometown? Probably not! The average distance for Penn State students probably differs greatly from that of college students attending a school in a major city, such as, say The University of California in Los Angeles (UCLA). We need to use a method that ensures that the sample is representative of all college students in the population, not just a subset of the students. Any method that ensures that our sample is truly random will suffice. The following definition formalizes what makes a sample truly random.
Definition. The random variables \(X_1, X_2, \ldots, X_n\) are a random sample of size \(n\) if:

- the \(X_i\) are independent, and
- the \(X_i\) are identically distributed, that is, each comes from the same distribution with mean \(\mu\) and variance \(\sigma^2\).

We say that the \(X_i\) are independent and identically distributed ("i.i.d." for short).
Now, once we've obtained our (truly) random sample, we'll probably want to use the resulting data to calculate the sample mean:
and sample variance:
In Stat 415, we'll learn that the sample mean
- the probability distribution of \(\bar{X}\),
- the theoretical mean of \(\bar{X}\), and
- the theoretical variance of \(\bar{X}\).
Now, note that
24.2 - Expectations of Functions of Independent Random Variables
One of our primary goals of this lesson is to determine the theoretical mean and variance of the sample mean:
Now, assume the
That's why we'll spend some time on this page learning how to take expectations of functions of independent random variables! A simple example illustrates that we already have a number of techniques sitting in our toolbox ready to help us find the expectation of a sum of independent random variables.
Example 24-1

Suppose we toss a penny three times, and let \(X\) denote the number of heads we get. Suppose we then toss the penny two more times, and let \(Y\) denote the number of heads we get in those two tosses. If we let \(W = X + Y\),
then:

- \(X\) is a binomial random variable with \(n = 3\) and \(p = \frac{1}{2}\)
- \(Y\) is a binomial random variable with \(n = 2\) and \(p = \frac{1}{2}\)
- \(W\) is a binomial random variable with \(n = 5\) and \(p = \frac{1}{2}\)

What is the mean and variance of \(W\)?
Solution
We can calculate the mean and variance of
-
By recognizing that
is a binomial random variable with and , we can use what know about the mean and variance of a binomial random variable, namely that the mean of is:and the variance of
is:Since sums of independent random variables are not always going to be binomial, this approach won't always work, of course. It would be good to have alternative methods in hand!
-
We could use the linear operator property of expectation. Before doing so, it would be helpful to note that the mean of
is:and the mean of
is:Now, using the property, we get that the mean of
is (thankfully) again :Recall that the second equality comes from the linear operator property of expectation. Now, using the linear operator property of expectation to find the variance of
takes a bit more work. First, we should note that the variance of is:and the variance of
is:Now, we can (thankfully) show again that the variance of
is :Okay, as if two methods aren't enough, we still have one more method we could use.
-
We could use the independence of the two random variables
and , in conjunction with the definition of expected value of as we know it. First, using the binomial formula, note that we can present the probability mass function of in tabular form as:0 1 2 3 And, we can present the probability mass function of
in tabular form as well:0 1 2 Now, recall that if
and are independent random variables, then:We can use this result to help determine
, the probability mass function of . First note that, since is the sum of and , the support of is {0, 1, 2, 3, 4 and 5}. Now, by brute force, we get:The second equality comes from the fact that the only way that
can equal 0 is if and , and the fourth equality comes from the independence of and . We can make a similar calculation to find the probability that :The first equality comes from the fact that there are two (mutually exclusive) ways that
can equal 1, namely if and or if and . The second equality comes from the independence of and . We can make similar calculations to find , and . Once we've done that, we can present the p.m.f. of in tabular form as:0 1 2 3 4 5 Then, it is a straightforward calculation to use the definition of the expected value of a discrete random variable to determine that (again!) the expected value of
is :The variance of
can be calculated similarly. (Do you want to calculate it one more time?!)The following summarizes the method we've used here in calculating the expected value of
:The first equality comes, of course, from the definition of
. The second equality comes from the definition of the expectation of a function of discrete random variables. The third equality comes from the independence of the random variables and . And, the fourth equality comes from the definition of the expected value of , as well as the fact that can be determined by summing the appropriate joint probabilities of and .
The following theorem formally states the third method we used in determining the expected value of
Let
Let the random variable
provided that these summations exist. For continuous random variables, integrals replace the summations.
In the special case that we are looking for the expectation of the product of functions of
That is, the expectation of the product is the product of the expectations.
Proof
For the sake of concreteness, let's assume that the random variables are discrete. Then, the definition of expectation gives us:
Then, since functions that don't depend on the index of the summation signs can get pulled through the summation signs, we have:
Then, by the definition, in the discrete case, of the expected value of
Our proof is complete. If our random variables are instead continuous, the proof would be similar. We would just need to make the obvious change of replacing the summation signs with integrals.
Let's return to our example in which we toss a penny three times, and let
and and
What is the expected value of
Solution
We'll use the fact that the expectation of the product is the product of the expectations:
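The same computation takes a few lines of code, enumerating the joint support and using independence (with \(X \sim\) binomial(3, \(\frac{1}{2}\)) and \(Y \sim\) binomial(2, \(\frac{1}{2}\)) as in Example 24-1):

```python
from itertools import product
from scipy.stats import binom

# E(XY) by summing x*y*P(X=x)*P(Y=y) over the joint support, which uses the
# independence of X ~ binomial(3, 1/2) and Y ~ binomial(2, 1/2).
e_xy = sum(x * y * binom.pmf(x, 3, 0.5) * binom.pmf(y, 2, 0.5)
           for x, y in product(range(4), range(3)))
print(e_xy)  # 1.5 = E(X) * E(Y) = (1.5)(1.0)
```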
24.3 - Mean and Variance of Linear Combinations
We are still working towards finding the theoretical mean and variance of the sample mean:
If we re-write the formula for the sample mean just a bit:
we can see more clearly that the sample mean is a linear combination of the random variables
Example 24-2
A statistics instructor conducted a survey in her class. The instructor was interested in learning how many siblings, on average, the students at Penn State University have? She took a random sample of
The instructor realized though, that if she had asked a different sample of
Hmmm, the instructor thought that was quite a different result from the first sample, so she decided to take yet another sample of
That's enough of this! I think you can probably see where we are going with this example. It is very clear that the values of the sample mean
- probability distribution (called a "sampling distribution"),
- mean, and
- variance.
We are still in the hunt for all three of these items. The next theorem will help move us closer towards finding the mean and variance of the sample mean
Suppose \(X_1, X_2, \ldots, X_n\) are \(n\) independent random variables with means \(\mu_1, \mu_2, \ldots, \mu_n\) and variances \(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\).
Then, the mean and variance of the linear combination \(Y = \sum_{i=1}^{n} a_i X_i\), where \(a_1, a_2, \ldots, a_n\) are real constants, are:

\(\mu_Y = \sum_{i=1}^{n} a_i \mu_i\)

and:

\(\sigma_Y^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2\)

respectively.
Proof
Let's start with the proof for the mean first:
Now for the proof for the variance. Starting with the definition of the variance of
Now, substituting what we know about
Because the summation signs have the same index (
And, we can factor out the constants
Now, let's rewrite the squared term as the product of two terms. In doing so, use an index of
Now, let's pull the summation signs together:
Then, by the linear operator property of expectation, we can distribute the expectation:
Now, let's rewrite the variance of
Simplifying then, we get:
And, simplifying yet more using variance notation:
Finally, we have:
as was to be proved.
Example 24-3
Let
Solution
The mean of the sum is:
and the variance of the sum is:
What is the mean and variance of
Solution
The mean of the difference is:
and the variance of the difference is:
That is, the variance of the difference in the two random variables is the same as the variance of the sum of the two random variables.
What is the mean and variance of
Solution
The mean of the linear combination is:
and the variance of the linear combination is:
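The example's specific numbers were lost above, so the values below are placeholders; the sketch simply shows the general pattern \(E(aX + bY) = a\mu_X + b\mu_Y\) and, for independent \(X\) and \(Y\), \(\text{Var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2\):

```python
# Placeholder parameters (the example's actual numbers were lost in extraction)
mu_x, var_x = 2.0, 4.0
mu_y, var_y = 3.0, 9.0
a, b = 2.0, -1.0                    # the linear combination a*X + b*Y

mean = a * mu_x + b * mu_y          # E(aX + bY) = a*mu_X + b*mu_Y
var = a**2 * var_x + b**2 * var_y   # Var(aX + bY) for independent X and Y
print(mean, var)                    # 1.0 25.0
```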
24.4 - Mean and Variance of Sample Mean
We'll finally accomplish what we set out to do in this lesson, namely to determine the theoretical mean and variance of the continuous random variable \(\bar{X}\).
Let
Solution
Starting with the definition of the sample mean, we have:
Then, using the linear operator property of expectation, we get:
Now, the
Now, because there are
We have shown that the mean (or expected value, if you prefer) of the sample mean
Let
Solution
Starting with the definition of the sample mean, we have:
Rewriting the term on the right so that it is clear that we have a linear combination of
Then, applying the theorem on the last page, we get:
Now, the
Now, because there are
Our result indicates that as the sample size \(n\) increases, the variance of the sample mean, \(\sigma^2/n\), decreases. That's good news: it means that the sample mean from a larger sample tends to fall closer to the population mean \(\mu\). A simulation sketch follows.
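The normal population and the particular parameter values below are illustrative choices, not part of the lesson; the point is that the empirical variance of the sample mean tracks \(\sigma^2/n\):

```python
import numpy as np

rng = np.random.default_rng(24)
mu, sigma = 100, 16   # illustrative population parameters

for n in (4, 16, 64):
    xbar = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(n, xbar.var(), sigma**2 / n)  # empirical vs. theoretical sigma^2/n
```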
24.5 - More Examples
On this page, we'll just take a look at a few examples that use the material and methods we learned about in this lesson.
Example 24-4
If
for
Solution
The fact that
Example 24-5
Let
for
Solution
The only way that the maximum of the
Now, because
The first equality comes from the independence of the
Therefore, the probability that the maximum of the
Lesson 25: The Moment-Generating Function Technique
Overview
In the previous lesson, we learned that the expected value of the sample mean
Objectives
- To refresh our memory of the uniqueness property of moment-generating functions.
- To learn how to calculate the moment-generating function of a linear combination of
independent random variables. - To learn how to calculate the moment-generating function of a linear combination of
independent and identically distributed random variables. - To learn the additive property of independent chi-square random variables.
- To use the moment-generating function technique to prove the additive property of independent chi-square random variables.
- To understand the steps involved in each of the proofs in the lesson.
- To be able to apply the methods learned in the lesson to new problems.
25.1 - Uniqueness Property of M.G.F.s
Recall that the moment generating function:
uniquely defines the distribution of a random variable. That is, if you can show that the moment generating function of
- To find the moment-generating function of the function of random variables
- To compare the calculated moment-generating function to known moment-generating functions
- If the calculated moment-generating function is the same as some known moment-generating function of
, then the function of the random variables follows the same probability distribution as
Example 25-1

In the previous lesson, we looked at an example that involved tossing a penny three times and letting \(X\) denote the number of heads, then tossing the penny two more times and letting \(Y\) denote the number of heads in those two tosses. Now let \(W = X + Y\)
denote the number of heads in five tosses. What is the probability distribution of \(W\)?
Solution
We know that:
- \(X\) is a binomial random variable with \(n = 3\) and \(p = \frac{1}{2}\)
- \(Y\) is a binomial random variable with \(n = 2\) and \(p = \frac{1}{2}\)
Therefore, based on what we know of the moment-generating function of a binomial random variable, the moment-generating function of \(X\) is:

\(M_X(t) = \left(\frac{1}{2} + \frac{1}{2}e^t\right)^3\)

And, similarly, the moment-generating function of \(Y\) is:

\(M_Y(t) = \left(\frac{1}{2} + \frac{1}{2}e^t\right)^2\)

Now, because \(X\) and \(Y\) are independent, the moment-generating function of \(W = X + Y\) is:

\(M_W(t) = M_X(t) \cdot M_Y(t) = \left(\frac{1}{2} + \frac{1}{2}e^t\right)^5\)
The first equality comes from the definition of the moment-generating function of the random variable
That is, \(M_W(t)\) is the moment-generating function of a binomial random variable with \(n = 5\) and \(p = \frac{1}{2}\), so by the uniqueness property of moment-generating functions, \(W\) must follow the binomial(5, \(\frac{1}{2}\)) distribution.
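We can sanity-check the uniqueness argument numerically: the product \(M_X(t) M_Y(t)\) should equal the binomial(5, \(\frac{1}{2}\)) moment-generating function at every \(t\):

```python
import numpy as np

t = np.linspace(-1, 1, 5)
m_x = (0.5 + 0.5 * np.exp(t)) ** 3  # MGF of X ~ binomial(3, 1/2)
m_y = (0.5 + 0.5 * np.exp(t)) ** 2  # MGF of Y ~ binomial(2, 1/2)
m_w = (0.5 + 0.5 * np.exp(t)) ** 5  # MGF of a binomial(5, 1/2) random variable

print(np.allclose(m_x * m_y, m_w))  # True: M_W(t) = M_X(t) * M_Y(t)
```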
It seems that we could generalize the way in which we calculated, in the above example, the moment-generating function of
25.2 - M.G.F.s of Linear Combinations
Theorem
If \(X_1, X_2, \ldots, X_n\) are \(n\) independent random variables with moment-generating functions \(M_{X_i}(t)\), then the moment-generating function of the linear combination \(Y = \sum_{i=1}^{n} a_i X_i\)
is:

\(M_Y(t) = \prod_{i=1}^{n} M_{X_i}(a_i t)\)
Proof
The proof is very similar to the calculation we made in the example on the previous page. That is:
The first equality comes from the definition of the moment-generating function of the random variable
While the theorem is useful in its own right, the following corollary is perhaps even more useful when dealing not just with independent random variables, but also random variables that are identically distributed — two characteristics that we get, of course, when we take a random sample.
Corollary
If \(X_1, X_2, \ldots, X_n\) are independent and identically distributed random variables with common moment-generating function \(M_X(t)\), then:

- The moment generating function of the linear combination \(Y = \sum_{i=1}^{n} X_i\) is \(M_Y(t) = \left[M_X(t)\right]^n\).
- The moment generating function of the sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\) is \(M_{\bar{X}}(t) = \left[M_X\!\left(\frac{t}{n}\right)\right]^n\).
Proof
- use the preceding theorem with
for - use the preceding theorem with
for
Example 25-2
Let
What is the distribution of
Solution
The moment-generating function of a gamma random variable
for
for
What is the distribution of the sample mean
Solution
Again, the moment-generating function of a gamma random variable
for
for
25.3 - Sums of Chi-Square Random Variables
We'll now turn our attention towards applying the theorem and corollary of the previous page to the case in which we have a function involving a sum of independent chi-square random variables. The following theorem is often referred to as the "additive property of independent chi-squares."
Theorem
Let \(X_1, X_2, \ldots, X_n\) be independent chi-square random variables with \(r_1, r_2, \ldots, r_n\) degrees of freedom, respectively.
Then, the sum of the random variables:

\(Y = X_1 + X_2 + \cdots + X_n\)

follows a chi-square distribution with \(r_1 + r_2 + \cdots + r_n\) degrees of freedom.
Proof
https://www.youtube.com/watch/Cb3b5gFqLRU
We have shown that
as was to be shown.
Theorem
Let
follows a
Proof
Recall that if
That is,
Corollary
If
for
Proof
Recall that:
Therefore:
as was to be proved.
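The additive property is easy to check by simulation. Here, assuming independent chi-square variables with 2, 3, and 5 degrees of freedom (illustrative choices), the sum should behave like a chi-square(10) random variable:

```python
import numpy as np
from scipy import stats

# Sum independent chi-square(2), chi-square(3), and chi-square(5) draws
# (illustrative degrees of freedom) and compare with chi-square(10).
rng = np.random.default_rng(25)
total = sum(rng.chisquare(df, 100_000) for df in (2, 3, 5))

for q in (5.0, 10.0, 15.0):
    print(q, (total <= q).mean(), stats.chi2.cdf(q, df=10))
```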
Lesson 26: Random Functions Associated with Normal Distributions
Overview
In the previous lessons, we've been working our way up towards fully defining the probability distribution of the sample mean
Objectives
- To learn the probability distribution of a linear combination of independent normal random variables
. - To learn how to find the probability that a linear combination of independent normal random variables
takes on a certain interval of values. - To learn the sampling distribution of the sample mean when
are a random sample from a normal population with mean and variance . - To use simulation to get a feel for the shape of a probability distribution.
- To learn the sampling distribution of the sample variance when
are a random sample from a normal population with mean and variance . - To learn the formal definition of a
random variable. - To learn the characteristics of Student's
distribution. - To learn how to read a
-table to find -values and probabilities associated with -values. - To understand each of the steps in the proofs in the lesson.
- To be able to apply the methods learned in this lesson to new problems.
26.1 - Sums of Independent Normal Random Variables
Well, we know that one of our goals for this lesson is to find the probability distribution of the sample mean when a random sample is taken from a population whose measurements are normally distributed. Then, let's just get right to the punch line! Well, first we'll work on the probability distribution of a linear combination of independent normal random variables
If \(X_1, X_2, \ldots, X_n\) are mutually independent normal random variables with means \(\mu_1, \mu_2, \ldots, \mu_n\) and variances \(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\), then the linear combination \(Y = \sum_{i=1}^{n} c_i X_i\)
follows the normal distribution:

\(N\!\left(\sum_{i=1}^{n} c_i \mu_i,\; \sum_{i=1}^{n} c_i^2 \sigma_i^2\right)\)
Proof
We'll use the moment-generating function technique to find the distribution of
Now, recall that if
Therefore, the moment-generating function of
Evaluating the product at each index
Again, using what we know about exponents, and rewriting what we have using summation notation, we get:
Ahaaa! We have just shown that the moment-generating function of
and variance:
Therefore, by the uniqueness property of moment-generating functions,
Example 26-1
Let
Solution
The previous theorem tells us that
What is the distribution of the linear combination
Solution
The previous theorem tells us that
Example 26-2

History suggests that scores on the Math portion of the Scholastic Aptitude Test (SAT) are normally distributed with a mean of 529 and a variance of 5732. History also suggests that scores on the Verbal portion of the SAT are normally distributed with a mean of 474 and a variance of 6368. Select two students at random. Let
Solution
We can find the requested probability by noting that \(X - Y\) is a linear combination of independent normal random variables, and is therefore itself normally distributed with mean \(529 - 474 = 55\) and variance \(5732 + 6368 = 12100\), so that its standard deviation is \(\sqrt{12100} = 110\).
Then, finding the probability that \(X\) is greater than \(Y\) reduces to a standard normal calculation:

\(P(X > Y) = P(X - Y > 0) = P\!\left(Z > \dfrac{0 - 55}{110}\right) = P(Z > -0.5) = 0.6915\)
That is, the probability that the first student's Math score is greater than the second student's Verbal score is 0.6915.
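The same answer in code, using the derived distribution \(X - Y \sim N(55, 12100)\):

```python
from scipy.stats import norm

mean_diff = 529 - 474             # mean of X - Y
sd_diff = (5732 + 6368) ** 0.5    # standard deviation of X - Y (= 110)

print(norm.sf(0, loc=mean_diff, scale=sd_diff))  # P(X - Y > 0) = 0.6915
```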
Example 26-3

Let
Now, let
Selecting bags at random, what is the probability that the sum of three one-pound bags exceeds the weight of one three-pound bag?
Solution
Because the bags are selected at random, we can assume that
That is,
Therefore, finding the probability that
That is, the probability that the sum of three one-pound bags exceeds the weight of one three-pound bag is 0.9830. Hey, if you want more bang for your buck, it looks like you should buy multiple one-pound bags of carrots, as opposed to one three-pound bag!
26.2 - Sampling Distribution of Sample Mean
Okay, we finally tackle the probability distribution (also known as the "sampling distribution") of the sample mean when
If \(X_1, X_2, \ldots, X_n\) are a random sample of size \(n\) from a normal distribution with mean \(\mu\) and variance \(\sigma^2\), then the sample mean \(\bar{X}\)
is normally distributed with mean \(\mu\) and variance \(\sigma^2/n\). That is, \(\bar{X} \sim N(\mu, \sigma^2/n)\).
Proof
The result follows directly from the previous theorem. All we need to do is recognize that the sample mean:
is a linear combination of independent normal random variables:
with
The first equality comes from the theorem on the previous page, about the distribution of a linear combination of independent normal random variables. The second equality comes from simply replacing
The first equality comes from pulling the constants depending on
That is the same as the moment generating function of a normal random variable with mean
Example 26-4
Let
Solution
In general, the variance of the sample mean is: \(\sigma^2/n\).
Therefore, the variance of the sample mean of the first sample is: \(\sigma^2_{\bar{X}_4} = \dfrac{256}{4} = 64\)
(The subscript 4 is there just to remind us that the sample mean is based on a sample of size 4.) And, the variance of the sample mean of the second sample is: \(\sigma^2_{\bar{X}_8} = \dfrac{256}{8} = 32\)
(The subscript 8 is there just to remind us that the sample mean is based on a sample of size 8.) Now, the corollary therefore tells us that the sample mean of the first sample is normally distributed with mean 100 and variance 64. That is:
And, the sample mean of the second sample is normally distributed with mean 100 and variance 32. That is:
So, we have two, no actually, three normal random variables with the same mean, but different variances:
- We have \(X\), an IQ of a random individual. It is normally distributed with mean 100 and variance 256.
- We have \(\bar{X}_4\), the average IQ of 4 random individuals. It is normally distributed with mean 100 and variance 64.
- We have \(\bar{X}_8\), the average IQ of 8 random individuals. It is normally distributed with mean 100 and variance 32.
It is quite informative to graph these three distributions on the same plot. Doing so, we get:
As the plot suggests, an individual
All the work that we have done so far concerning this example has been theoretical in nature. That is, what we have learned is based on probability theory. Would we see the same kind of result if we were to take a large number of samples, say 1000, of size 4 and 8, and calculate the sample mean of each sample? That is, would the distribution of the 1000 sample means based on a sample of size 4 look like a normal distribution with mean 100 and variance 64? And would the distribution of the 1000 sample means based on a sample of size 8 look like a normal distribution with mean 100 and variance 32? Well, the only way to answer these questions is to try it out!
I did just that for us. I used Minitab to generate 1000 samples of eight random numbers from a normal distribution with mean 100 and variance 256. Here's a subset of the resulting random numbers:
ROW | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | Mean 4 | Mean 8 |
1 | 87 | 68 | 98 | 114 | 59 | 111 | 114 | 86 | 91.75 | 92.125 |
2 | 102 | 81 | 74 | 110 | 112 | 106 | 105 | 99 | 91.75 | 98.625 |
3 | 96 | 87 | 50 | 88 | 69 | 107 | 94 | 83 | 80.25 | 84.250 |
4 | 83 | 134 | 122 | 80 | 117 | 110 | 115 | 158 | 104.75 | 114.875 |
5 | 92 | 87 | 120 | 93 | 90 | 111 | 95 | 92 | 98.00 | 97.500 |
6 | 139 | 102 | 100 | 103 | 111 | 62 | 78 | 73 | 111.00 | 96.000 |
7 | 134 | 121 | 99 | 118 | 108 | 106 | 103 | 91 | 118.00 | 110.000 |
8 | 126 | 92 | 148 | 131 | 99 | 106 | 143 | 128 | 124.25 | 121.625 |
9 | 98 | 109 | 119 | 110 | 124 | 99 | 119 | 82 | 109.00 | 107.500 |
10 | 85 | 93 | 82 | 106 | 93 | 109 | 100 | 95 | 91.50 | 95.375 |
11 | 121 | 103 | 108 | 96 | 112 | 117 | 93 | 112 | 107.00 | 107.750 |
12 | 118 | 91 | 106 | 108 | 128 | 96 | 65 | 85 | 105.75 | 99.625 |
13 | 92 | 87 | 96 | 81 | 86 | 105 | 91 | 104 | 89.00 | 92.750 |
14 | 94 | 115 | 59 | 105 | 101 | 122 | 97 | 103 | 93.25 | 99.500 |
...and so on... | ||||||||||
975 | 108 | 139 | 130 | 97 | 138 | 88 | 104 | 87 | 118.50 | 111.375 |
976 | 99 | 122 | 93 | 107 | 98 | 62 | 102 | 115 | 105.25 | 99.750 |
977 | 99 | 127 | 91 | 101 | 127 | 79 | 81 | 121 | 104.50 | 103.250 |
978 | 120 | 108 | 101 | 104 | 90 | 90 | 191 | 104 | 108.25 | 101.000 |
979 | 101 | 93 | 106 | 113 | 115 | 82 | 96 | 97 | 103.25 | 100.375 |
980 | 118 | 86 | 74 | 95 | 109 | 111 | 90 | 83 | 93.25 | 95.750 |
981 | 118 | 95 | 121 | 124 | 111 | 90 | 105 | 112 | 114.50 | 109.500 |
982 | 110 | 121 | 85 | 117 | 91 | 84 | 84 | 108 | 108.25 | 100.000 |
983 | 95 | 109 | 118 | 112 | 121 | 105 | 84 | 115 | 108.50 | 107.375 |
984 | 102 | 105 | 127 | 104 | 95 | 101 | 106 | 103 | 109.50 | 105.375 |
985 | 116 | 93 | 112 | 102 | 67 | 92 | 103 | 114 | 105.75 | 99.875 |
986 | 106 | 97 | 114 | 82 | 82 | 108 | 113 | 81 | 99.75 | 97.875 |
987 | 107 | 93 | 78 | 91 | 83 | 81 | 115 | 102 | 92.25 | 93.750 |
988 | 106 | 115 | 105 | 74 | 86 | 124 | 97 | 116 | 100.00 | 102.875 |
989 | 117 | 84 | 131 | 102 | 92 | 118 | 90 | 90 | 108.50 | 103.000 |
990 | 100 | 69 | 108 | 128 | 111 | 110 | 94 | 95 | 101.25 | 101.875 |
991 | 86 | 85 | 123 | 94 | 104 | 89 | 76 | 97 | 97.00 | 94.250 |
992 | 94 | 90 | 72 | 121 | 105 | 150 | 72 | 88 | 94.25 | 99.000 |
993 | 70 | 109 | 104 | 114 | 93 | 103 | 126 | 99 | 99.25 | 102.250 |
994 | 102 | 110 | 98 | 93 | 64 | 131 | 91 | 95 | 100.75 | 98.000 |
995 | 80 | 135 | 120 | 92 | 118 | 119 | 66 | 117 | 106.75 | 105.875 |
996 | 81 | 102 | 88 | 98 | 113 | 81 | 95 | 110 | 92.25 | 96.000 |
997 | 85 | 146 | 73 | 133 | 111 | 88 | 92 | 74 | 109.25 | 100.250 |
998 | 94 | 109 | 110 | 115 | 95 | 93 | 90 | 103 | 107.00 | 101.125 |
999 | 84 | 84 | 97 | 125 | 92 | 89 | 95 | 124 | 97.50 | 98.750 |
1000 | 77 | 60 | 113 | 106 | 107 | 109 | 110 | 103 | 89.00 | 98.125 |
As you can see, the second-to-last column, titled Mean4, is the average of the first four columns X1, X2, X3, and X4. The last column, titled Mean8, is the average of the first eight columns X1, X2, X3, X4, X5, X6, X7, and X8. Now, all we have to do is create a histogram of the sample means appearing in the Mean4 column:
Ahhhh! The histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. Now, recall that the Empirical Rule tells us that we should expect, if the sample means are normally distributed, that almost all of the sample means would fall within three standard deviations of the population mean. That is, in the case of Mean4, we should expect almost all of the data to fall between 76 (from 100−3(8)) and 124 (from 100+3(8)). It sure looks like that's the case!
Let's do the same thing for the Mean8 column. That is, let's create a histogram of the sample means appearing in the Mean8 column. Doing so, we get:
Again, the histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. In this case, the Empirical Rule tells us that, in the case of Mean8, we should expect almost all of the data to fall between 83 (from 100−3(square root of 32)) and 117 (from 100+3(square root of 32)). It too looks pretty good on both sides, although it seems that there were two really extreme sample means of size 8. (If you look back at the data, you can see one of them in the eighth row.)
In summary, the whole point of this exercise was to use the theory to help us derive the distribution of the sample mean of IQs, and then to use real simulated normal data to see if our theory worked in practice. I think we can conclude that it does!
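For readers without Minitab, the same experiment takes a few lines of NumPy. This is a sketch of the simulation just described, not the original code:

```python
import numpy as np

rng = np.random.default_rng(26)
data = rng.normal(100, 16, size=(1000, 8))  # 1000 samples; sigma = sqrt(256)

mean4 = data[:, :4].mean(axis=1)  # sample means of the first four columns
mean8 = data.mean(axis=1)         # sample means of all eight columns

print(mean4.var(), 256 / 4)       # empirical vs. theoretical variance, 64
print(mean8.var(), 256 / 8)       # empirical vs. theoretical variance, 32
```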
26.3 - Sampling Distribution of Sample Variance
Now that we've got the sampling distribution of the sample mean down, let's turn our attention to finding the sampling distribution of the sample variance. The following theorem will do the trick for us!
If \(X_1, X_2, \ldots, X_n\) are observations of a random sample of size \(n\) from the normal distribution \(N(\mu, \sigma^2)\), \(\bar{X}\) is the sample mean of the \(n\) observations, and \(S^2\) is the sample variance of the \(n\) observations.
Then:
1. \(\bar{X}\) and \(S^2\) are independent
2. \(\dfrac{(n-1)S^2}{\sigma^2}\) follows a chi-square distribution with \(n-1\) degrees of freedom
Proof
The proof of number 1 is quite easy. Errr, actually not! It is quite easy in this course, because it is beyond the scope of the course. So, we'll just have to state it without proof.
Now for proving number 2. This is one of those proofs that you might have to read through twice... perhaps reading it the first time just to see where we're going with it, and then, if necessary, reading it again to capture the details. We're going to start with a function which we'll call
Now, we can take
As you can see, we added 0 by adding and subtracting the sample mean to the quantity in the numerator. Now, let's square the term. Doing just that, and distributing the summation, we get:
But the last term is 0:
so,
We can do a bit more with the first term of
and multiply both sides by
So, the numerator in the first term of
Okay, let's take a break here to see what we have. We've taken the quantity on the left side of the above equation, added 0 to it, and showed that it equals the quantity on the right side. Now, what can we say about each of the terms. Well, the term on the left side of the equation:
is a sum of
follows a standard normal distribution. Now, recall that if we square a standard normal random variable, we get a chi-square random variable with 1 degree of freedom. So, again:
is a sum of
for
is a chi-square(1) random variable. That's because the sample mean is normally distributed with mean
is a standard normal random variable. So, if we square
And therefore the moment-generating function of
for
Now, let's use the uniqueness property of moment-generating functions. By definition, the moment-generating function of
Using what we know about exponents, we can rewrite the term in the expectation as a product of two exponent terms:
The last equality in the above equation comes from the independence between
Now, let's solve for the moment-generating function of
Adding the exponents, we get:
for
as was to be proved! And, to just think that this was the easier of the two proofs!
Before we take a look at an example involving simulation, it is worth noting that in the last proof, we proved that, when sampling from a normal distribution:
but:
The only difference between these two summations is that in the first case, we are summing the squared differences from the population mean
Example 26-5
Let's return to our example concerning the IQs of randomly selected individuals. Let
Solution
Because the sample size is \(n = 8\), the above theorem tells us that:

\(\dfrac{(8-1)S^2}{\sigma^2} = \dfrac{7S^2}{256}\)

follows a chi-square distribution with 7 degrees of freedom. Here's what the theoretical density function would look like:
Again, all the work that we have done so far concerning this example has been theoretical in nature. That is, what we have learned is based on probability theory. Would we see the same kind of result if we were to take a large number of samples, say 1000, of size 8, and calculate:
for each sample? That is, would the distribution of the 1000 resulting values of the above function look like a chi-square(7) distribution? Again, the only way to answer this question is to try it out! I did just that for us. I used Minitab to generate 1000 samples of eight random numbers from a normal distribution with mean 100 and variance 256. Here's a subset of the resulting random numbers:
As you can see, the last column, titled FnofSsq (for function of sums of squares), contains the calculated value of:
based on the random numbers generated in columns X1, X2, X3, X4, X5, X6, X7, and X8. For example, given that the average of the eight numbers in the first row is 98.625, the value of FnofSsq in the first row is:
Now, all we have to do is create a histogram of the values appearing in the FnofSsq column. Doing so, we get:
Hmm! The histogram sure looks eerily similar to that of the density curve of a chi-square random variable with 7 degrees of freedom. It looks like the practice is meshing with the theory!
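Again, for readers who want to reproduce the experiment without Minitab, here is a sketch: compute \((n-1)S^2/\sigma^2\) for 1000 samples of size 8 and compare with the chi-square(7) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(27)
data = rng.normal(100, 16, size=(1000, 8))

# (n-1)S^2 / sigma^2 with n = 8 and sigma^2 = 256, for each of 1000 samples
fn_of_ssq = 7 * data.var(axis=1, ddof=1) / 256

for q in (3.0, 7.0, 12.0):
    print(q, (fn_of_ssq <= q).mean(), stats.chi2.cdf(q, df=7))
```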
26.4 - Student's t Distribution
We have just one more topic to tackle in this lesson, namely, Student's t distribution. Let's just jump right in and define it!
Definition. If \(Z \sim N(0, 1)\) and \(U \sim \chi^2(r)\) are independent, then the random variable:

\(T = \dfrac{Z}{\sqrt{U/r}}\)

follows a \(t\)-distribution with \(r\) degrees of freedom, whose probability density function is:

\(f(t) = \dfrac{\Gamma\!\left(\frac{r+1}{2}\right)}{\sqrt{\pi r}\,\Gamma\!\left(\frac{r}{2}\right)}\left(1 + \dfrac{t^2}{r}\right)^{-(r+1)/2}\)

for \(-\infty < t < \infty\).
By the way, the
History aside, the above definition is probably not particularly enlightening. Let's try to get a feel for the
and create a histogram of the 1000 resulting
ROW | Z | CHISQ (3) | T(3) |
---|---|---|---|
1 | -2.60481 | 10.2497 | -1.4092 |
2 | 2.92321 | 1.6517 | 3.9396 |
3 | -0.48633 | 0.1757 | -2.0099 |
4 | -0.48212 | 3.8283 | -0.4268 |
5 | -0.04150 | 0.2422 | -0.1461 |
6 | -0.84225 | 0.0903 | -4.8544 |
7 | -0.31205 | 1.6326 | -0.4230 |
8 | 1.33068 | 5.2224 | 1.0086 |
9 | -0.64104 | 0.9401 | -1.1451 |
10 | -0.05110 | 2.2632 | -0.0588 |
11 | 1.61601 | 4.6566 | 1.2971 |
12 | 0.81522 | 2.1738 | 0.9577 |
13 | 0.38501 | 1.8404 | 0.4916 |
14 | -1.63426 | 1.1265 | -2.6669 |
...and so on... | |||
994 | -0.18942 | 3.5202 | -0.1749 |
995 | 0.43078 | 3.3585 | 0.4071 |
996 | -0.14068 | 0.6236 | -0.3085 |
997 | -1.76357 | 2.6188 | -1.8876 |
998 | -1.02310 | 3.2470 | -0.9843 |
999 | -0.93777 | 1.4991 | -1.3266 |
1000 | -0.37665 | 2.1231 | -0.4477 |
Note, for example, in the first row:
Here's what the resulting histogram of the 1000 randomly generated
Hmmm. The
In fact, it looks as if, as the degrees of freedom
- The support appears to be \((-\infty, \infty)\). (It is!)
- The probability distribution appears to be symmetric about \(t = 0\). (It is!)
- The probability distribution appears to be bell-shaped. (It is!)
- The density curve looks like a standard normal curve, but the tails of the \(t\)-distribution are "heavier" than the tails of the normal distribution. That is, we are more likely to get extreme \(t\)-values than extreme \(z\)-values.
- As the degrees of freedom \(r\) increases, the \(t\)-distribution appears to approach the standard normal \(z\)-distribution. (It does!)
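The construction used in the table, a standard normal divided by the square root of an independent chi-square(3) over its degrees of freedom, is easy to reproduce, and it makes the "heavier tails" point concrete:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(28)
z = rng.normal(size=100_000)
u = rng.chisquare(3, size=100_000)
t3 = z / np.sqrt(u / 3)  # T = Z / sqrt(chi-square(3) / 3)

# Heavier tails than the standard normal: compare tail probabilities
print((np.abs(t3) > 2).mean())  # empirical P(|T| > 2), about 0.14
print(2 * stats.t.sf(2, df=3))  # theoretical t(3) tail, about 0.1393
print(2 * stats.norm.sf(2))     # standard normal tail, about 0.0455
```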
As you'll soon see, we'll need to look up
The Table
If you take a look at Table VI in the back of your textbook, you'll find what looks like a typical \(t\)-table:
P(T≤ t) | |||||||
0.60 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 | |
r | t0.40(r) | t0.25(r) | t0.10(r) | t0.05(r) | t0.025(r) | t0.01(r) | t0.005(r) |
1 | 0.325 | 1.000 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
2 | 0.289 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
3 | 0.277 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
4 | 0.271 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
5 | 0.267 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
6 | 0.265 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
7 | 0.263 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
8 | 0.262 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
9 | 0.261 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
10 | 0.260 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
The
Let's use the
Let's take a look at a few more examples.
Example 26-6
Let
Solution
The probability calculation is quite similar to a calculation we'd have to make for a normal random variable. First, rewriting the probability in terms of
Then, we have to rewrite the probability in terms of cumulative probabilities that we can actually find, that is:
Pictorially, the probability we are looking for looks something like this:
But the
P(T≤ t) | |||||||
0.60 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 | |
r | t0.40(r) | t0.25(r) | t0.10(r) | t0.05(r) | t0.025(r) | t0.01(r) | t0.005(r) |
1 | 0.325 | 1.000 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
2 | 0.289 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
3 | 0.277 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
4 | 0.271 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
5 | 0.267 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
6 | 0.265 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
7 | 0.263 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
8 | 0.262 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
9 | 0.261 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
10 | 0.260 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
The
What is
Solution
The value
Can you find the value
P(T≤ t) | |||||||
0.60 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 | |
r | t0.40(r) | t0.25(r) | t0.10(r) | t0.05(r) | t0.025(r) | t0.01(r) | t0.005(r) |
1 | 0.325 | 1.000 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
2 | 0.289 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
3 | 0.277 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
4 | 0.271 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
5 | 0.267 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
6 | 0.265 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
7 | 0.263 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
8 | 0.262 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
9 | 0.261 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
10 | 0.260 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
We have determined that the probability that a
Why will we encounter a \(t\) random variable?
Given a random sample
Earlier in this lesson, we learned that:
follows a chi-square distribution with \(n - 1\) degrees of freedom.
It is the resulting quantity, that is:
that will help us, in Stat 415, to use a mean from a random sample, that is
Lesson 27: The Central Limit Theorem
Introduction
In the previous lesson, we investigated the probability distribution ("sampling distribution") of the sample mean when the random sample
But what happens if the
Objectives
- To learn the Central Limit Theorem.
- To get an intuitive feeling for the Central Limit Theorem.
- To use the Central Limit Theorem to find probabilities concerning the sample mean.
- To be able to apply the methods learned in this lesson to new problems.
27.1 - The Theorem
Central Limit Theorem
We don't have the tools yet to prove the Central Limit Theorem, so we'll just go ahead and state it without proof.
Let \(X_1, X_2, \ldots, X_n\) be a random sample from a distribution (any distribution!) with (finite) mean \(\mu\) and (finite) variance \(\sigma^2\). Then, if the sample size \(n\) is sufficiently large:

- the sample mean \(\bar{X}\) follows an approximate normal distribution
- with mean \(E(\bar{X}) = \mu\)
- and variance \(\text{Var}(\bar{X}) = \dfrac{\sigma^2}{n}\)

We write:

\(\bar{X} \overset{\cdot}{\sim} N\!\left(\mu, \dfrac{\sigma^2}{n}\right)\)

or:

\(Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1)\)
So, in a nutshell, the Central Limit Theorem (CLT) tells us that the sampling distribution of the sample mean is, at least approximately, normally distributed, regardless of the distribution of the underlying random sample. In fact, the CLT applies regardless of whether the distribution of the \(X_i\) is discrete or continuous.
You might be wondering why "sufficiently large" appears in quotes in the theorem. Well, that's because the necessary sample size
- If the distribution of the \(X_i\) is symmetric, unimodal, or continuous, then a sample size as small as 4 or 5 yields an adequate approximation.
- If the distribution of the \(X_i\) is skewed, then a sample size of at least 25 or 30 yields an adequate approximation.
- If the distribution of the \(X_i\) is extremely skewed, then you may need an even larger \(n\).
We'll spend the rest of the lesson trying to get an intuitive feel for the theorem, as well as applying the theorem so that we can calculate probabilities concerning the sample mean.
27.2 - Implications in Practice
As stated on the previous page, we don't yet have the tools to prove the Central Limit Theorem. And, we won't actually get to proving it until late in Stat 415. It would be good though to get an intuitive feel now for how the CLT works in practice. On this page, we'll explore two examples to get a feel for how:
- the skewness (or symmetry!) of the underlying distribution of the \(X_i\), and
- the sample size \(n\)

affect how well the normal distribution approximates the actual ("exact") distribution of the sample mean \(\bar{X}\).
Example 27-1
Consider taking random samples of various sizes
Solution
Our previous work on the continuous Uniform(0, 1) random variable tells us that the mean of a Uniform(0,1) random variable is:

\(\mu = \dfrac{1}{2}\)

while the variance of a Uniform(0,1) random variable is:

\(\sigma^2 = \dfrac{1}{12}\)

The Central Limit Theorem, therefore, tells us that the sample mean \(\bar{X}\) is approximately normally distributed with mean:

\(\mu_{\bar{X}} = \dfrac{1}{2}\)

and variance:

\(\sigma^2_{\bar{X}} = \dfrac{1}{12n}\)
Now, our end goal is to compare the normal distribution, as defined by the CLT, to the actual distribution of the sample mean. Now, we could do a lot of theoretical work to find the exact distribution of
- Specify the sample size
. - Randomly generate 1000 samples of size
from the Uniform (0,1) distribution. - Use the 1000 generated samples to calculate 1000 sample means from the Uniform (0,1) distribution.
- Create a histogram of the 1000 sample means.
- Compare the histogram to the normal distribution, as defined by the Central Limit Theorem, in order to see how well the Central Limit Theorem works for the given sample size \(n\). (A code sketch of this strategy follows the list.)
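Here is a sketch of that strategy in code for a single value of \(n\); re-running it with \(n = 1, 2, 4, 8, \ldots\) reproduces the progression described below:

```python
import numpy as np

def clt_demo(n, reps=1000, seed=27):
    """Steps 1-3: generate `reps` samples of size n from Uniform(0,1)
    and return the `reps` sample means (ready to histogram in step 4)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

means = clt_demo(n=8)
print(means.mean(), means.var())  # compare with CLT: mean 1/2, variance 1/(12*8)
```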
Let's start with a sample size of
Okay, now let's tackle the more interesting sample sizes. Let
It can actually be shown that the exact distribution of the sample mean of 2 numbers drawn from the Uniform(0, 1) distribution is the triangular distribution. The histogram does look a bit triangular, doesn't it? The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
As you can see, already at
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
Again, at
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
And not surprisingly, at
Well, just for the heck of it, let's increase our sample size one more time to
The blue curve overlaid on the histogram is the normal distribution with mean:
and variance:
Again, at
- If the underlying distribution is symmetric, then you don't need a very large sample size for the normal distribution, as defined by the Central Limit Theorem, to do a decent job of approximating the probability distribution of the sample mean.
- The larger the sample size
, the smaller the variance of the sample mean.
Example 27-2
Now consider taking random samples of various sizes
Solution
We are going to do exactly what we did in the previous example. The only difference is that our underlying distribution here, that is, the chi-square(3) distribution, is highly skewed. Now, our previous work on the chi-square distribution tells us that the mean of a chi-square random variable with three degrees of freedom is:

\(\mu = r = 3\)

while the variance of a chi-square random variable with three degrees of freedom is:

\(\sigma^2 = 2r = 6\)

The Central Limit Theorem, therefore, tells us that the sample mean \(\bar{X}\) is approximately normally distributed with mean:

\(\mu_{\bar{X}} = 3\)

and variance:

\(\sigma^2_{\bar{X}} = \dfrac{6}{n}\)
Again, we'll follow a strategy similar to that in the above example, namely:
- Specify the sample size
. - Randomly generate 1000 samples of size
from the chi-square(3) distribution. - Use the 1000 generated samples to calculate 1000 sample means from the chi-square(3) distribution.
- Create a histogram of the 1000 sample means.
- Compare the histogram to the normal distribution, as defined by the Central Limit Theorem, in order to see how well the Central Limit Theorem works for the given sample size
.
Again, starting with a sample size of
Now, let's consider samples of size
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
As you can see, at
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
Although, at
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
We're getting closer, but let's really jump up the sample size to, say,
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
- Okay, now we're talking! There's still just a teeny tiny bit of skewness in the sampling distribution. Let's increase the sample size just one more time to, say,
. Generating 1000 samples of size , calculating the 1000 sample means, and creating a histogram of the 1000 sample means, we get:
The blue curve overlaid on the histogram is the normal distribution, as defined by the Central Limit Theorem. That is, the blue curve is the normal distribution with mean:
and variance:
Okay, now, I'm perfectly happy! It appears that, at
- Again, the larger the sample size \(n\), the smaller the variance of the sample mean. Nothing new there.
- If the underlying distribution is skewed, then you need a larger sample size, typically \(n \ge 30\), for the normal distribution, as defined by the Central Limit Theorem, to do a decent job of approximating the probability distribution of the sample mean.
27.3 - Applications in Practice
Now that we have an intuitive feel for the Central Limit Theorem, let's use it in two different examples. In the first example, we use the Central Limit Theorem to describe how the sample mean behaves, and then use that behavior to calculate a probability. In the second example, we take a look at the most common use of the CLT, namely to use the theorem to test a claim.
Example 27-3

Take a random sample of size \(n = 15\) from a distribution whose probability density function is:

\(f(x) = \dfrac{3}{2}x^2\)

for \(-1 < x < 1\). What is the probability that the sample mean falls between \(-\tfrac{2}{5}\) and \(\tfrac{1}{5}\)?
Solution
The expected value of the random variable \(X\) is:

\(\mu = E(X) = \displaystyle\int_{-1}^{1} x \cdot \dfrac{3}{2}x^2 \, dx = 0\)

The variance of the random variable \(X\) is:

\(\sigma^2 = E(X^2) - \mu^2 = \displaystyle\int_{-1}^{1} x^2 \cdot \dfrac{3}{2}x^2 \, dx = \dfrac{3}{5}\)

Therefore, the CLT tells us that the sample mean \(\bar{X}\) is approximately normally distributed with mean:

\(\mu_{\bar{X}} = \mu = 0\)

and variance:

\(\sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n} = \dfrac{3/5}{15} = \dfrac{1}{25}\)

Therefore, the standard deviation of \(\bar{X}\) is \(\tfrac{1}{5}\). Standardizing the endpoints of the interval,

we see that:

\(P\!\left(-\dfrac{2}{5} \le \bar{X} \le \dfrac{1}{5}\right) = P\!\left(\dfrac{-2/5 - 0}{1/5} \le Z \le \dfrac{1/5 - 0}{1/5}\right) = P(-2 \le Z \le 1)\)

Therefore, using the standard normal table, we get:

\(P(-2 \le Z \le 1) = 0.8413 - 0.0228 = 0.8185\)

That is, there is an 81.85% chance that a random sample of size 15 from the given distribution will yield a sample mean between \(-\tfrac{2}{5}\) and \(\tfrac{1}{5}\).
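If you'd rather not interpolate in the standard normal table, the same calculation takes a few lines; a minimal sketch, with scipy's `norm.cdf` standing in for the table:

```python
# Verify Example 27-3's CLT approximation numerically.
import numpy as np
from scipy import stats
from scipy.integrate import quad

f = lambda x: 1.5 * x**2                                  # f(x) = (3/2)x^2 on (-1, 1)
mu = quad(lambda x: x * f(x), -1, 1)[0]                   # population mean: 0
sigma2 = quad(lambda x: x**2 * f(x), -1, 1)[0] - mu**2    # population variance: 3/5

n = 15
sd_xbar = np.sqrt(sigma2 / n)                             # standard deviation of Xbar: 1/5

# P(-2/5 <= Xbar <= 1/5) under the normal approximation
p = stats.norm.cdf(0.2, mu, sd_xbar) - stats.norm.cdf(-0.4, mu, sd_xbar)
print(round(p, 4))  # 0.8186 (the table-based answer rounds to 0.8185)
```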
Example 27-4

Let \(X_1, X_2, \ldots, X_n\) denote the times associated with \(n\) randomly selected occurrences of the process under study, and suppose an assistant claims that the process has a particular mean time \(\theta\). A manager collects the sample in order to evaluate the assistant's claim.
Solution
It is reasonable to assume that the \(X_i\) are independent exponential random variables with mean \(\theta\). Therefore, knowing what we know about exponential random variables, the variance of each \(X_i\) is \(\theta^2\). Now, we need to know: if the mean really is the claimed value of \(\theta\), how is the sample mean \(\bar{X}\) distributed? The Central Limit Theorem tells us that \(\bar{X}\) is approximately normally distributed with mean:

\(\mu_{\bar{X}} = \theta\)

and variance:

\(\sigma^2_{\bar{X}} = \dfrac{\theta^2}{n}\)
Here's a picture, then, of the normal probability that we need to determine: the area under the normal curve for \(\bar{X}\) that lies beyond the observed sample mean.

That is:

\(P(\bar{X} \ge \bar{x}_{\text{obs}}) = P\!\left(Z \ge \dfrac{\bar{x}_{\text{obs}} - \theta}{\theta/\sqrt{n}}\right)\)

The probability can then be read from the standard normal table. That is, if the population mean really were the value the assistant claimed, this probability tells the manager how likely a sample mean as extreme as the observed one would be; the smaller the probability, the more reason the manager has to doubt the assistant's claim.
By the way, this is the kind of example that we'll see when we study hypothesis testing in Stat 415. In general, in the process of performing a hypothesis test, someone makes a claim (the assistant, in this case), and someone collects and uses the data (the manager, in this case) to make a decision about the validity of the claim. It just so happens that we used the CLT in this example to help us make a decision about the assistant's claim.
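To make the manager's calculation concrete, here is a minimal sketch with entirely made-up numbers: the claimed mean \(\theta = 5\), the sample size \(n = 50\), and the observed sample mean 6.2 are hypothetical stand-ins, not values from the example:

```python
# Hypothetical version of the manager's check: how likely is a sample mean
# this large if the assistant's claimed exponential mean were correct?
import math
from scipy import stats

theta = 5.0       # claimed exponential mean (hypothetical)
n = 50            # sample size (hypothetical)
xbar_obs = 6.2    # observed sample mean (hypothetical)

# Under the claim, the CLT says Xbar is approximately N(theta, theta^2 / n)
sd_xbar = theta / math.sqrt(n)
p = 1 - stats.norm.cdf(xbar_obs, loc=theta, scale=sd_xbar)
print(round(p, 4))  # about 0.045 with these made-up numbers
```

A probability that small would give the manager good reason to question the claim.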
Lesson 28: Approximations for Discrete Distributions
Overview
In the previous lesson, we explored the Central Limit Theorem, which states that if \(X_1, X_2, \ldots, X_n\) is a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\), then, for sufficiently large \(n\), the sample mean \(\bar{X}\) is approximately normally distributed with mean \(\mu\) and variance \(\sigma^2/n\).
In that lesson, all of the examples concerned continuous random variables. In this lesson, our focus will be on applying the Central Limit Theorem to discrete random variables. In particular, we will investigate how to use the normal distribution to approximate binomial probabilities and Poisson probabilities.
Objectives
- To learn how to use the normal distribution to approximate binomial probabilities.
- To learn how to use the normal distribution to approximate Poisson probabilities.
- To be able to apply the methods learned in this lesson to new problems.
28.1 - Normal Approximation to Binomial
As the title of this page suggests, we will now focus on using the normal distribution to approximate binomial probabilities. The Central Limit Theorem is the tool that allows us to do so. As usual, we'll use an example to motivate the material.
Example 28-1

Let \(X_i\) denote whether or not a randomly selected person approves of the job the President is doing. That is:

- Let \(X_i = 1\), if the person approves of the job the President is doing, with probability \(p\)
- Let \(X_i = 0\), if the person does not approve of the job the President is doing, with probability \(1 - p\)
Then, recall that \(X_i\) is a Bernoulli random variable with mean:

\(\mu = p\)

and variance:

\(\sigma^2 = p(1 - p)\)
Now, take a random sample of \(n\) people, and let \(Y = X_1 + X_2 + \cdots + X_n\) denote the number of people sampled who approve of the job the President is doing. Then \(Y\) is a binomial random variable with mean:

\(\mu = np\)

and variance:

\(\sigma^2 = np(1 - p)\)
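As a quick sanity check on those formulas, here is a short simulation sketch, plugging in the values \(n = 10\) and \(p = 0.5\) used in the example below:

```python
# Check that Y = X_1 + ... + X_n has mean np and variance np(1 - p).
import numpy as np

rng = np.random.default_rng(28)
n, p = 10, 0.5
Y = rng.binomial(n, p, size=100_000)          # 100,000 simulated values of Y
print(round(Y.mean(), 3), round(Y.var(), 3))  # close to np = 5 and np(1-p) = 2.5
```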
Now, suppose that \(p = 0.5\) and that we take a random sample of \(n = 10\) people. What is the probability that exactly 5 of the 10 people sampled approve of the job the President is doing?
Solution
There is really nothing new here. We can calculate the exact probability using the binomial table in the back of the book with \(n = 10\) and \(p = 0.5\):

\(P(Y = 5) = P(Y \le 5) - P(Y \le 4) = 0.6230 - 0.3770 = 0.2460\)
That is, there is a 24.6% chance that exactly five of the ten people selected approve of the job the President is doing.
Note, however, that \(Y\) is a sum of independent and identically distributed Bernoulli random variables, so the Central Limit Theorem tells us that, for \(n\) sufficiently large, \(Y\) is approximately normally distributed with mean \(np\) and variance \(np(1 - p)\).

Let's use the normal distribution, then, to approximate some probabilities for \(Y\). First, what is the (approximate) probability that exactly 5 of the 10 people sampled approve of the job the President is doing?
Solution
First, recognize in our case that the mean is:

\(\mu = np = 10(0.5) = 5\)

and the variance is:

\(\sigma^2 = np(1 - p) = 10(0.5)(0.5) = 2.5\)
Now, if we look at a graph of the binomial distribution with the rectangle corresponding to \(Y = 5\) shaded, we should see that we would benefit from making some kind of correction for the fact that we are using a continuous distribution to approximate a discrete distribution. Specifically, it seems that the rectangle for \(Y = 5\) really spans the interval from 4.5 to 5.5 on the horizontal axis, suggesting that we approximate \(P(Y = 5)\) by the normal probability \(P(4.5 < Y < 5.5)\).

Such an adjustment is called a "continuity correction." Once we've made the continuity correction, the calculation reduces to a normal probability calculation:

\(P(Y = 5) \approx P(4.5 < Y < 5.5) = P\!\left(\dfrac{4.5 - 5}{\sqrt{2.5}} < Z < \dfrac{5.5 - 5}{\sqrt{2.5}}\right) = P(-0.32 < Z < 0.32) = 0.6255 - 0.3745 = 0.2510\)
Now, recall that we previously used the binomial distribution to determine that the probability that exactly 5 of the 10 people approve is exactly 0.2460. Our normal approximation of 0.2510 is certainly quite close.
Let's try a few more approximations. What is the probability that more than 7, but at most 9, of the ten people sampled approve of the job the President is doing?
Solution
If we look at a graph of the binomial distribution with the area corresponding to \(Y = 8\) and \(Y = 9\) shaded, we should see that we'll want to make the following continuity correction:

\(P(7 < Y \le 9) = P(8 \le Y \le 9) \approx P(7.5 < Y < 9.5)\)

Now again, once we've made the continuity correction, the calculation reduces to a normal probability calculation:

\(P(7.5 < Y < 9.5) = P\!\left(\dfrac{7.5 - 5}{\sqrt{2.5}} < Z < \dfrac{9.5 - 5}{\sqrt{2.5}}\right) = P(1.58 < Z < 2.85) = 0.9978 - 0.9429 = 0.0549\)
By the way, you might find it interesting to note that the approximate normal probability is quite close to the exact binomial probability. We showed that the approximate probability is 0.0549, whereas the following calculation shows that the exact probability (using the binomial table with \(n = 10\) and \(p = 0.5\)) is 0.0537:

\(P(8 \le Y \le 9) = P(Y \le 9) - P(Y \le 7) = 0.9990 - 0.9453 = 0.0537\)
Let's try one more approximation. What is the probability that at least 2, but less than 4, of the ten people sampled approve of the job the President is doing?
Solution
If we look at a graph of the binomial distribution with the area corresponding to \(Y = 2\) and \(Y = 3\) shaded, we should see that we'll want to make the following continuity correction:

\(P(2 \le Y < 4) = P(2 \le Y \le 3) \approx P(1.5 < Y < 3.5)\)

Again, once we've made the continuity correction, the calculation reduces to a normal probability calculation:

\(P(1.5 < Y < 3.5) = P\!\left(\dfrac{1.5 - 5}{\sqrt{2.5}} < Z < \dfrac{3.5 - 5}{\sqrt{2.5}}\right) = P(-2.21 < Z < -0.95) = 0.1711 - 0.0136 = 0.1575\)
By the way, the exact binomial probability is 0.1612, as the following calculation illustrates:

\(P(2 \le Y \le 3) = P(Y \le 3) - P(Y \le 1) = 0.1719 - 0.0107 = 0.1612\)
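All three approximations above follow the same recipe, which is easy to package up; a minimal sketch, using scipy in place of the tables (so the \(z\)-values are not rounded to two decimals):

```python
# Normal approximation, with continuity correction, vs. exact binomial
# probabilities for the n = 10, p = 0.5 example.
from scipy import stats

n, p = 10, 0.5
mu = n * p                      # np = 5
sd = (n * p * (1 - p)) ** 0.5   # sqrt(np(1-p)) = sqrt(2.5)

def normal_approx(a, b):
    """Approximate P(a <= Y <= b) for integers a <= b via P(a - 0.5 < W < b + 0.5)."""
    return stats.norm.cdf(b + 0.5, mu, sd) - stats.norm.cdf(a - 0.5, mu, sd)

for a, b in [(5, 5), (8, 9), (2, 3)]:
    exact = stats.binom.cdf(b, n, p) - stats.binom.cdf(a - 1, n, p)
    print(f"P({a} <= Y <= {b}): approx {normal_approx(a, b):.4f}, exact {exact:.4f}")
```

The unrounded \(z\)-values make these approximations differ slightly from the table-based 0.2510, 0.0549, and 0.1575 above.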
Just a couple of comments before we close our discussion of the normal approximation to the binomial.
(1) First, we have not yet discussed what "sufficiently large" means in terms of when it is appropriate to use the normal approximation to the binomial. The general rule of thumb is that the sample size \(n\) is "sufficiently large" if:

\(np \ge 5\) and \(n(1 - p) \ge 5\)
For example, in the above example, in which \(p = 0.5\), the two conditions are:

\(np = n(0.5) \ge 5\) and \(n(1 - p) = n(0.5) \ge 5\)

Now, both conditions are true if:

\(n \ge \dfrac{5}{0.5} = 10\)

Because our sample size was at least 10 (well, barely!), we now see why our approximations were quite close to the exact probabilities. In general, the farther \(p\) is from 0.5, the larger the sample size \(n\) needs to be. Suppose, say, that \(p = 0.1\).

Now, the first condition is met if:

\(np = n(0.1) \ge 5\), that is, if \(n \ge 50\)

And, the second condition is met if:

\(n(1 - p) = n(0.9) \ge 5\), that is, if \(n \ge 5.6\)

That is, the only way both conditions are met, when \(p = 0.1\), is if the sample size \(n\) is at least 50.
(2) In truth, if you have the available tools, such as a binomial table or a statistical package, you'll probably want to calculate exact probabilities instead of approximate probabilities. Does that mean all of our discussion here is for naught? No, not at all! In reality, we'll most often use the Central Limit Theorem as applied to the sum of independent Bernoulli random variables to help us draw conclusions about a true population proportion \(p\). In that case, the result tells us that:

\(Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}\)

follows, at least approximately, the standard normal \(N(0, 1)\) distribution. The quantity:

\(\hat{p} = \dfrac{Y}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}\)

that appears in the numerator is the "sample proportion," that is, the proportion in the sample meeting the condition of interest (approving of the President's job, for example). In Stat 415, we'll use the sample proportion in conjunction with the above result to draw conclusions about the unknown population proportion \(p\). You'll definitely be seeing much more of this in Stat 415!
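For instance, here's a minimal sketch of the standardization, with made-up survey numbers (58 approvals out of \(n = 100\), and a hypothesized \(p = 0.5\); all three values are hypothetical stand-ins):

```python
# Standardize a sample proportion using the result above.
import math

n, p = 100, 0.5    # sample size and hypothesized proportion (hypothetical)
p_hat = 58 / n     # sample proportion (hypothetical: 58 of 100 approve)
z = (p_hat - p) / math.sqrt(p * (1 - p) / n)
print(round(z, 2))  # 1.6 with these made-up numbers
```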
28.2 - Normal Approximation to Poisson
Just as the Central Limit Theorem can be applied to the sum of independent Bernoulli random variables, it can be applied to the sum of independent Poisson random variables. Suppose \(Y\) is a Poisson random variable with mean \(\lambda\). Because \(Y\) can be viewed as the sum of independent Poisson random variables, the Central Limit Theorem implies that, for large \(\lambda\):

\(Z = \dfrac{Y - \lambda}{\sqrt{\lambda}}\)

follows, at least approximately, the standard normal \(N(0, 1)\) distribution.
We'll use this result to approximate Poisson probabilities using the normal distribution.
Example 28-2

The annual number of earthquakes registering at least 2.5 on the Richter Scale and having an epicenter within 40 miles of downtown Memphis follows a Poisson distribution with mean 6.5. What is the probability that at least 9 such earthquakes will strike next year? (Adapted from An Introduction to Mathematical Statistics, by Richard J. Larsen and Morris L. Marx.)
Solution
We can, of course, use the Poisson distribution to calculate the exact probability. Using the Poisson table with \(\lambda = 6.5\), we see that:

\(P(Y \ge 9) = 1 - P(Y \le 8) = 1 - 0.792 = 0.208\)
Now, let's use the normal approximation to the Poisson to calculate an approximate probability. First, we have to make a continuity correction. Doing so, we get:

\(P(Y \ge 9) = P(Y \ge 8.5)\)

Once we've made the continuity correction, the calculation again reduces to a normal probability calculation:

\(P(Y \ge 8.5) = P\!\left(Z \ge \dfrac{8.5 - 6.5}{\sqrt{6.5}}\right) = P(Z \ge 0.78) = 1 - 0.7823 = 0.2177\)
So, in summary, we used the Poisson distribution to determine that the probability that at least 9 such earthquakes will strike next year is (exactly) 0.208, and we used the normal distribution to determine that the probability is (approximately) 0.2177. The two probabilities are reasonably close, given that the mean \(\lambda = 6.5\) is not particularly large.
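The whole comparison fits in a few lines of code; a minimal sketch:

```python
# Exact Poisson probability vs. normal approximation (with continuity
# correction) for the earthquake example: P(Y >= 9) when lambda = 6.5.
from scipy import stats

lam = 6.5
exact = 1 - stats.poisson.cdf(8, lam)                       # P(Y >= 9)
approx = 1 - stats.norm.cdf(8.5, loc=lam, scale=lam**0.5)   # P(Y >= 8.5)
print(round(exact, 4), round(approx, 4))  # about 0.2084 and 0.2164
```

The small differences from the table-based 0.208 and 0.2177 above come from the tables' rounding of the cumulative probabilities and \(z\)-values.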