26.2 - Sampling Distribution of Sample Mean

Okay, we finally tackle the probability distribution (also known as the "sampling distribution") of the sample mean when \(X_1, X_2, \ldots, X_n\) are a random sample from a normal population with mean \(\mu\) and variance \(\sigma^2\). The word "tackle" is probably not the right choice of word, because the result follows quite easily from the previous theorem, as stated in the following corollary.

Corollary

If \(X_1, X_2, \ldots, X_n\) are observations of a random sample of size \(n\) from a \(N(\mu, \sigma^2)\) population, then the sample mean:

\(\bar{X}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i\)

is normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). That is, the probability distribution of the sample mean is:

\(N(\mu,\sigma^2/n)\)

Proof

The result follows directly from the previous theorem. All we need to do is recognize that the sample mean:

\(\bar{X}=\dfrac{X_1+X_2+\cdots+X_n}{n}\)

is a linear combination of independent normal random variables:

\(\bar{X}=\dfrac{1}{n} X_1+\dfrac{1}{n} X_2+\cdots+\dfrac{1}{n} X_n\)

with \(c_i=\frac{1}{n}\), the mean \(\mu_i=\mu\) and the variance \(\sigma^2_i=\sigma^2\). That is, the moment generating function of the sample mean is then:

\(M_{\bar{X}}(t)=\text{exp}\left[t\left(\sum\limits_{i=1}^n c_i \mu_i\right)+\dfrac{t^2}{2}\left(\sum\limits_{i=1}^n c^2_i \sigma^2_i\right)\right]=\text{exp}\left[t\left(\sum\limits_{i=1}^n \dfrac{1}{n}\mu\right)+\dfrac{t^2}{2}\left(\sum\limits_{i=1}^n \left(\dfrac{1}{n}\right)^2\sigma^2\right)\right]\)

The first equality comes from the theorem on the previous page, about the distribution of a linear combination of independent normal random variables. The second equality comes from simply replacing \(c_i\) with \(\frac{1}{n}\), the mean \(\mu_i\) with \(\mu\) and the variance \(\sigma^2_i\) with \(\sigma^2\). Now, working on the summations, the moment generating function of the sample mean reduces to:

\(M_{\bar{X}}(t)=\text{exp}\left[t\left(\dfrac{1}{n} \sum\limits_{i=1}^n \mu\right)+\dfrac{t^2}{2}\left(\dfrac{1}{n^2}\sum\limits_{i=1}^n \sigma^2\right)\right]=\text{exp}\left[t\left(\dfrac{1}{n}(n\mu)\right)+\dfrac{t^2}{2}\left(\dfrac{1}{n^2}(n\sigma^2)\right)\right]=\text{exp}\left[\mu t +\dfrac{t^2}{2} \left(\dfrac{\sigma^2}{n}\right)\right]\)

The first equality comes from pulling the constants depending on \(n\) through the summation signs. The second equality comes from adding \(\mu\) up \(n\) times to get \(n\mu\), and adding \(\sigma^2\) up \(n\) times to get \(n\sigma^2\). The last equality comes from simplifying a bit more. In summary, we have shown that the moment generating function of the sample mean of \(n\) independent normal random variables with mean \(\mu\) and variance \(\sigma^2\) is:

\(M_{\bar{X}}(t)=\text{exp}\left[\mu t +\dfrac{t^2}{2} \left(\dfrac{\sigma^2}{n}\right)\right]\)

That is the same as the moment generating function of a normal random variable with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). Therefore, the uniqueness property of moment-generating functions tells us that the sample mean must be normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). Our proof is complete.
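
If you'd like to double-check that algebra, here is a minimal sketch using Python with the sympy library (my choice of tool; the course itself doesn't use Python). It writes down the moment generating function of a single \(N(\mu, \sigma^2)\) observation, evaluates it at \(t/n\), raises the result to the \(n\)th power by working with the logarithm, and confirms that the exponent simplifies to \(\mu t + \frac{\sigma^2 t^2}{2n}\):

```python
import sympy as sp

# population mean, mgf argument, population standard deviation, sample size
mu, t = sp.symbols('mu t', real=True)
sigma = sp.symbols('sigma', positive=True)
n = sp.symbols('n', positive=True, integer=True)

# mgf of a single N(mu, sigma^2) observation
M_X = sp.exp(mu * t + sigma**2 * t**2 / 2)

# The sample mean is (1/n) * (X_1 + ... + X_n), so its mgf is the product of
# n copies of M_X, each evaluated at t/n.  Working with the log keeps the
# algebra simple:  log M_Xbar(t) = n * log M_X(t/n).
log_M_Xbar = sp.expand(n * sp.log(M_X.subs(t, t / n)))

print(log_M_Xbar)          # prints mu*t + sigma**2*t**2/(2*n)
print(sp.exp(log_M_Xbar))  # the mgf of the sample mean itself
```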

Example 26-4

Let \(X_i\) denote the Stanford-Binet Intelligence Quotient (IQ) of a randomly selected individual, \(i=1, \ldots, 4\) (one sample). Let \(Y_i\) denote the IQ of a randomly selected individual, \(i=1, \ldots, 8\) (a second sample). Recalling that IQs are normally distributed with mean \(\mu=100\) and variance \(\sigma^2=16^2\), what is the distribution of \(\bar{X}\)? And, what is the distribution of \(\bar{Y}\)?

Answer

In general, the variance of the sample mean is:

\(Var(\bar{X})=\dfrac{\sigma^2}{n}\)

Therefore, the variance of the sample mean of the first sample is:

\(Var(\bar{X}_4)=\dfrac{16^2}{4}=64\)

(The subscript 4 is there just to remind us that the sample mean is based on a sample of size 4.) And, the variance of the sample mean of the second sample is:

\(Var(\bar{Y}_8)=\dfrac{16^2}{8}=32\)

(The subscript 8 is there just to remind us that the sample mean is based on a sample of size 8.) Now, the corollary therefore tells us that the sample mean of the first sample is normally distributed with mean 100 and variance 64. That is:

\(\bar{X}_4 \sim N(100,64)\)

And, the sample mean of the second sample is normally distributed with mean 100 and variance 32. That is:

\(\bar{Y}_8 \sim N(100,32)\)
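
(If you'd like to check these two variance calculations numerically, here is a minimal sketch in Python using scipy.stats; that's my tooling choice, not something the course uses.)

```python
from scipy.stats import norm

mu, sigma = 100, 16           # population mean and standard deviation of IQs

var_xbar4 = sigma**2 / 4      # variance of the sample mean when n = 4 -> 64.0
var_ybar8 = sigma**2 / 8      # variance of the sample mean when n = 8 -> 32.0

# scipy parameterizes the normal by its standard deviation (scale), not its variance
xbar4 = norm(loc=mu, scale=var_xbar4 ** 0.5)   # N(100, 64)
ybar8 = norm(loc=mu, scale=var_ybar8 ** 0.5)   # N(100, 32)

print(var_xbar4, var_ybar8)        # 64.0 32.0
print(xbar4.std(), ybar8.std())    # 8.0 5.656854249492381
```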

So, we have two, no actually, three normal random variables with the same mean, but different variances:

  • We have \(X_i\), an IQ of a random individual. It is normally distributed with mean 100 and variance 256.
  • We have \(\bar{X}_4\), the average IQ of 4 random individuals. It is normally distributed with mean 100 and variance 64.
  • We have \(\bar{Y}_8\), the average IQ of 8 random individuals. It is normally distributed with mean 100 and variance 32.

It is quite informative to graph these three distributions on the same plot. Doing so, we get:

[Figure: three normal density curves, one panel each for n = 1, n = 4, and n = 8; vertical axis: normal density, horizontal axis: IQ.]
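
If you'd like to reproduce a plot along these lines yourself, here is a minimal sketch using Python with numpy, scipy, and matplotlib (my tooling choices; the course's figure was produced elsewhere). It draws the three normal densities, one panel per sample size:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma = 100, 16
iq = np.linspace(40, 160, 500)

# one panel per sample size, as in the figure above
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6, 6))
for ax, n in zip(axes, [1, 4, 8]):
    # the sample mean of n IQs has standard deviation sigma / sqrt(n)
    density = norm.pdf(iq, loc=mu, scale=sigma / np.sqrt(n))
    ax.plot(iq, density)
    ax.set_title(f"n = {n}")
    ax.set_ylabel("Normal density")
axes[-1].set_xlabel("IQ")
plt.tight_layout()
plt.show()
```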

As the plot suggests, an individual \(X_i\), the mean \(\bar{X}_4\), and the mean \(\bar{Y}_8\) all provide valid, "unbiased" estimates of the population mean \(\mu\). But our intuition coincides with reality... that is, the sample mean \(\bar{Y}_8\), having the smallest variance, will be the most precise estimate of \(\mu\).

All the work that we have done so far concerning this example has been theoretical in nature. That is, what we have learned is based on probability theory. Would we see the same kind of result if we were to take a large number of samples, say 1000, of size 4 and size 8, and calculate the sample mean of each sample? That is, would the distribution of the 1000 sample means based on samples of size 4 look like a normal distribution with mean 100 and variance 64? And would the distribution of the 1000 sample means based on samples of size 8 look like a normal distribution with mean 100 and variance 32? Well, the only way to answer these questions is to try it out!

I did just that for us. I used Minitab to generate 1000 samples of eight random numbers from a normal distribution with mean 100 and variance 256. Here's a subset of the resulting random numbers:

ROW X1 X2 X3 X4 X5 X6 X7 X8 Mean4 Mean8
1 87 68 98 114 59 111 114 86 91.75 92.125
2 102 81 74 110 112 106 105 99 91.75 98.625
3 96 87 50 88 69 107 94 83 80.25 84.250
4 83 134 122 80 117 110 115 158 104.75 114.875
5 92 87 120 93 90 111 95 92 98.00 97.500
6 139 102 100 103 111 62 78 73 111.00 96.000
7 134 121 99 118 108 106 103 91 118.00 110.000
8 126 92 148 131 99 106 143 128 124.25 121.625
9 98 109 119 110 124 99 119 82 109.00 107.500
10 85 93 82 106 93 109 100 95 91.50 95.375
11 121 103 108 96 112 117 93 112 107.00 107.750
12 118 91 106 108 128 96 65 85 105.75 99.625
13 92 87 96 81 86 105 91 104 89.00 92.750
14 94 115 59 105 101 122 97 103 93.25 99.500
 
 ...and so on... 
 
975 108 139 130 97 138 88 104 87 118.50 111.375
976 99 122 93 107 98 62 102 115 105.25 99.750
977 99 127 91 101 127 79 81 121 104.50 103.250
978 120 108 101 104 90 90 191 104 108.25 101.000
979 101 93 106 113 115 82 96 97 103.25 100.375
980 118 86 74 95 109 111 90 83 93.25 95.750
981 118 95 121 124 111 90 105 112 114.50 109.500
982 110 121 85 117 91 84 84 108 108.25 100.000
983 95 109 118 112 121 105 84 115 108.50 107.375
984 102 105 127 104 95 101 106 103 109.50 105.375
985 116 93 112 102 67 92 103 114 105.75 99.875
986 106 97 114 82 82 108 113 81 99.75 97.875
987 107 93 78 91 83 81 115 102 92.25 93.750
988 106 115 105 74 86 124 97 116 100.00 102.875
989 117 84 131 102 92 118 90 90 108.50 103.000
990 100 69 108 128 111 110 94 95 101.25 101.875
991 86 85 123 94 104 89 76 97 97.00 94.250
992 94 90 72 121 105 150 72 88 94.25 99.000
993 70 109 104 114 93 103 126 99 99.25 102.250
994 102 110 98 93 64 131 91 95 100.75 98.000
995 80 135 120 92 118 119 66 117 106.75 105.875
996 81 102 88 98 113 81 95 110 92.25 96.000
997 85 146 73 133 111 88 92 74 109.25 100.250
998 94 109 110 115 95 93 90 103 107.00 101.125
999 84 84 97 125 92 89 95 124 97.50 98.750
1000 77 60 113 106 107 109 110 103 89.00 98.125

As you can see, the second-to-last column, titled Mean4, is the average of the first four columns X1, X2, X3, and X4. The last column, titled Mean8, is the average of all eight columns X1, X2, X3, X4, X5, X6, X7, and X8. Now, all we have to do is create a histogram of the sample means appearing in the Mean4 column:

[Histogram: the 1000 sample means in the Mean4 column, labeled "Mean of X-bar (with n=4)"; horizontal axis running from about 70 to 130, vertical axis: frequency.]

Ahhhh! The histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. Now, recall that the Empirical Rule tells us that we should expect, if the sample means are normally distributed, that almost all of the sample means would fall within three standard deviations of the population mean. That is, in the case of Mean4, whose standard deviation is \(\sqrt{64}=8\), we should expect almost all of the data to fall between 76 (from \(100-3(8)\)) and 124 (from \(100+3(8)\)). It sure looks like that's the case!

Let's do the same thing for the Mean8 column. That is, let's create a histogram of the sample means appearing in the Mean8 column. Doing so, we get:

[Histogram: the 1000 sample means in the Mean8 column, labeled "Mean of X-bar (with n=8)"; horizontal axis running from about 70 to 130, vertical axis: frequency.]

Again, the histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. In this case, the Empirical Rule tells us that, for Mean8, whose standard deviation is \(\sqrt{32}\approx 5.7\), we should expect almost all of the data to fall between 83 (from \(100-3\sqrt{32}\)) and 117 (from \(100+3\sqrt{32}\)). It too looks pretty good on both sides, although it seems that there were two really extreme sample means of size 8. (If you look back at the data, you can see one of them in the eighth row.)
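
If you'd rather rerun this experiment without Minitab, here is a minimal sketch using Python's numpy and matplotlib (my substitution for Minitab, so the details are an assumption rather than the course's actual code). It generates 1000 samples of eight \(N(100, 16^2)\) random numbers, computes the Mean4 and Mean8 columns, draws the two histograms, and checks the Empirical Rule bounds:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=414)   # any seed works; 414 is arbitrary

mu, sigma = 100, 16
samples = rng.normal(loc=mu, scale=sigma, size=(1000, 8))   # 1000 samples of size 8

mean4 = samples[:, :4].mean(axis=1)   # sample means based on the first 4 columns
mean8 = samples.mean(axis=1)          # sample means based on all 8 columns

# histograms of the two sets of sample means
fig, (ax4, ax8) = plt.subplots(1, 2, sharex=True, figsize=(10, 4))
ax4.hist(mean4, bins=20)
ax4.set_title("Mean of X-bar (with n=4)")
ax8.hist(mean8, bins=20)
ax8.set_title("Mean of X-bar (with n=8)")
plt.show()

# Empirical Rule check: nearly all sample means should fall within three
# standard deviations of 100, i.e. 100 +/- 3*sqrt(64) for n=4 and
# 100 +/- 3*sqrt(32) for n=8
print(np.mean(np.abs(mean4 - 100) <= 3 * np.sqrt(64)))   # close to 1
print(np.mean(np.abs(mean8 - 100) <= 3 * np.sqrt(32)))   # close to 1
```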

In summary, the whole point of this exercise was to use the theory to help us derive the distribution of the sample mean of IQs, and then to use real simulated normal data to see if our theory worked in practice. I think we can conclude that it does!