26.2 - Sampling Distribution of Sample Mean

Okay, we finally tackle the probability distribution (also known as the "sampling distribution") of the sample mean when \(X_1, X_2, \ldots, X_n\) are a random sample from a normal population with mean \(\mu\) and variance \(\sigma^2\). "Tackle" is probably not the right word, because the result follows quite easily from the previous theorem, as stated in the following corollary.

Corollary

If \(X_1, X_2, \ldots, X_n\) are observations of a random sample of size \(n\) from a \(N(\mu, \sigma^2)\) population, then the sample mean:

\(\bar{X}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i\)

is normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). That is, the probability distribution of the sample mean is:

\(N(\mu,\sigma^2/n)\)
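
As a quick sanity check on the two parameters, linearity of expectation and independence of the \(X_i\) are enough. This confirms only the mean and the variance; it says nothing about normality, which is what the proof establishes:

\(E(\bar{X})=\dfrac{1}{n}\sum\limits_{i=1}^n E(X_i)=\dfrac{1}{n}(n\mu)=\mu\)

\(Var(\bar{X})=\dfrac{1}{n^2}\sum\limits_{i=1}^n Var(X_i)=\dfrac{1}{n^2}(n\sigma^2)=\dfrac{\sigma^2}{n}\)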

Proof

The result follows directly from the previous theorem. All we need to do is recognize that the sample mean:

\(\bar{X}=\dfrac{X_1+X_2+\cdots+X_n}{n}\)

is a linear combination of independent normal random variables:

\(\bar{X}=\dfrac{1}{n} X_1+\dfrac{1}{n} X_2+\cdots+\dfrac{1}{n} X_n\)

with \(c_i=\frac{1}{n}\), the mean \(\mu_i=\mu\) and the variance \(\sigma^2_i=\sigma^2\). The moment generating function of the sample mean is therefore:

\(M_{\bar{X}}(t)=\text{exp}\left[t\left(\sum\limits_{i=1}^n c_i \mu_i\right)+\dfrac{t^2}{2}\left(\sum\limits_{i=1}^n c^2_i \sigma^2_i\right)\right]=\text{exp}\left[t\left(\sum\limits_{i=1}^n \dfrac{1}{n}\mu\right)+\dfrac{t^2}{2}\left(\sum\limits_{i=1}^n \left(\dfrac{1}{n}\right)^2\sigma^2\right)\right]\)

The first equality comes from the theorem on the previous page, about the distribution of a linear combination of independent normal random variables. The second equality comes from simply replacing \(c_i\) with \(\frac{1}{n}\), the mean \(\mu_i\) with \(\mu\) and the variance \(\sigma^2_i\) with \(\sigma^2\). Now, working on the summations, the moment generating function of the sample mean reduces to:

\(M_{\bar{X}}(t)=\text{exp}\left[t\left(\dfrac{1}{n} \sum\limits_{i=1}^n \mu\right)+\dfrac{t^2}{2}\left(\dfrac{1}{n^2}\sum\limits_{i=1}^n \sigma^2\right)\right]=\text{exp}\left[t\left(\dfrac{1}{n}(n\mu)\right)+\dfrac{t^2}{2}\left(\dfrac{1}{n^2}(n\sigma^2)\right)\right]=\text{exp}\left[\mu t +\dfrac{t^2}{2} \left(\dfrac{\sigma^2}{n}\right)\right]\)

The first equality comes from pulling the constants depending on \(n\) through the summation signs. The second equality comes from adding \(\mu\) up \(n\) times to get \(n\mu\), and adding \(\sigma^2\) up \(n\) times to get \(n\sigma^2\). The last equality comes from simplifying a bit more. In summary, we have shown that the moment generating function of the sample mean of \(n\) independent normal random variables with mean \(\mu\) and variance \(\sigma^2\) is:

\(M_{\bar{X}}(t)=\text{exp}\left[\mu t +\dfrac{t^2}{2} \left(\dfrac{\sigma^2}{n}\right)\right]\)

That is the same as the moment generating function of a normal random variable with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). Therefore, the uniqueness property of moment-generating functions tells us that the sample mean must be normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). Our proof is complete.
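
The algebra above can also be checked numerically. The following is a minimal Python sketch, not part of the original lesson: it uses the fact that, for independent and identically distributed \(X_i\), the moment generating function of \(\bar{X}\) is \([M_X(t/n)]^n\), and compares that with the \(N(\mu, \sigma^2/n)\) moment generating function. The function names and the values of `mu`, `var`, `n` and `t` are illustrative assumptions.

```python
import math

def normal_mgf(t, mu, var):
    """MGF of a N(mu, var) random variable: exp(mu*t + var*t^2/2)."""
    return math.exp(mu * t + var * t * t / 2.0)

def sample_mean_mgf(t, mu, var, n):
    """MGF of the mean of n independent N(mu, var) random variables.

    By independence, M_Xbar(t) is the product of M_{X_i}(t/n),
    which equals [M_X(t/n)]^n when the X_i share one distribution.
    """
    return normal_mgf(t / n, mu, var) ** n

# Illustrative values only (assumptions, not taken from the lesson).
mu, var, n = 2.0, 9.0, 5
for t in (-0.5, 0.1, 0.4):
    lhs = sample_mean_mgf(t, mu, var, n)   # [M_X(t/n)]^n
    rhs = normal_mgf(t, mu, var / n)       # MGF of N(mu, var/n)
    print(t, lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-9))
```

Agreement at a handful of points is only a spot check of the simplification, not a proof; the uniqueness argument above is what settles the distribution.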

Example 26-4

Let \(X_i\) denote the Stanford-Binet Intelligence Quotient (IQ) of a randomly selected individual, \(i=1, \ldots, 4\) (one sample). Let \(Y_i\) denote the IQ of a randomly selected individual, \(i=1, \ldots, 8\) (a second sample). Recalling that IQs are normally distributed with mean \(\mu=100\) and variance \(\sigma^2=16^2\), what is the distribution of \(\bar{X}\)? And, what is the distribution of \(\bar{Y}\)?

Answer

In general, the variance of the sample mean is:

\(Var(\bar{X})=\dfrac{\sigma^2}{n}\)

Therefore, the variance of the sample mean of the first sample is:

\(Var(\bar{X}_4)=\dfrac{16^2}{4}=64\)

(The subscript 4 is there just to remind us that the sample mean is based on a sample of size 4.) And, the variance of the sample mean of the second sample is:

\(Var(\bar{Y}_8)=\dfrac{16^2}{8}=32\)

(The subscript 8 is there just to remind us that the sample mean is based on a sample of size 8.) The corollary therefore tells us that the sample mean of the first sample is normally distributed with mean 100 and variance 64. That is:

\(\bar{X}_4 \sim N(100,64)\)

And, the sample mean of the second sample is normally distributed with mean 100 and variance 32. That is:

\(\bar{Y}_8 \sim N(100,32)\)

So, we have two, no actually, three normal random variables with the same mean, but different variances:

We have \(X_i\), an IQ of a random individual. It is normally distributed with mean 100 and variance 256.
We have \(\bar{X}_4\), the average IQ of 4 random individuals. It is normally distributed with mean 100 and variance 64.
We have \(\bar{Y}_8\), the average IQ of 8 random individuals. It is normally distributed with mean 100 and variance 32.

It is quite informative to graph these three distributions on the same plot. Doing so, we get:

As the plot suggests, an individual \(X_i\), the mean \(\bar{X}_4\) and the mean \(\bar{Y}_8\) all provide valid, "unbiased" estimates of the population mean \(\mu\). But, our intuition coincides with reality... that is, the sample mean \(\bar{Y}_8\) will be the most precise estimate of \(\mu\).
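
One way to put a number on "most precise" is to ask how likely each estimate is to land close to \(\mu\). The sketch below, which is not part of the original lesson, uses Python's standard-library `statistics.NormalDist`; the half-width of 5 IQ points and the helper names are arbitrary choices made for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 100.0, 16.0  # population mean and standard deviation from the example

def sample_mean_dist(n):
    """Distribution of the mean of n independent N(MU, SIGMA^2) draws: N(MU, SIGMA^2/n)."""
    return NormalDist(mu=MU, sigma=SIGMA / n ** 0.5)

def prob_within(dist, half_width):
    """Probability that the estimate falls within half_width of MU."""
    return dist.cdf(MU + half_width) - dist.cdf(MU - half_width)

# half_width = 5 is an assumed tolerance, chosen only for illustration.
for label, n in (("X_i", 1), ("Xbar_4", 4), ("Ybar_8", 8)):
    d = sample_mean_dist(n)
    print(f"{label}: variance = {d.variance:.0f}, "
          f"P(within 5 of mu) = {prob_within(d, 5):.3f}")
```

All three distributions are centered at 100, so none of the estimates is biased. The probability of landing near \(\mu\) grows with the sample size because the variance \(\sigma^2/n\) shrinks.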

All the work that we have done so far concerning this example has been theoretical in nature. That is, what we have learned is based on probability theory. Would we see the same kind of result if we were to take a large number of samples, say 1000, of size 4 and 8, and calculate the sample mean of each sample? That is, would the distribution of the 1000 sample means based on a sample of size 4 look like a normal distribution with mean 100 and variance 64? And would the distribution of the 1000 sample means based on a sample of size 8 look like a normal distribution with mean 100 and variance 32? Well, the only way to answer these questions is to try it out!

I did just that for us. I used Minitab to generate 1000 samples of eight random numbers from a normal distribution with mean 100 and variance 256. Here's a subset of the resulting random numbers:

ROW	X1	X2	X3	X4	X5	X6	X7	X8	Mean4	Mean8
1	87	68	98	114	59	111	114	86	91.75	92.125
2	102	81	74	110	112	106	105	99	91.75	98.625
3	96	87	50	88	69	107	94	83	80.25	84.250
4	83	134	122	80	117	110	115	158	104.75	114.875
5	92	87	120	93	90	111	95	92	98.00	97.500
6	139	102	100	103	111	62	78	73	111.00	96.000
7	134	121	99	118	108	106	103	91	118.00	110.000
8	126	92	148	131	99	106	143	128	124.25	121.625
9	98	109	119	110	124	99	119	82	109.00	107.500
10	85	93	82	106	93	109	100	95	91.50	95.375
11	121	103	108	96	112	117	93	112	107.00	107.750
12	118	91	106	108	128	96	65	85	105.75	99.625
13	92	87	96	81	86	105	91	104	89.00	92.750
14	94	115	59	105	101	122	97	103	93.25	99.500

...and so on...

975	108	139	130	97	138	88	104	87	118.50	111.375
976	99	122	93	107	98	62	102	115	105.25	99.750
977	99	127	91	101	127	79	81	121	104.50	103.250
978	120	108	101	104	90	90	91	104	108.25	101.000
979	101	93	106	113	115	82	96	97	103.25	100.375
980	118	86	74	95	109	111	90	83	93.25	95.750
981	118	95	121	124	111	90	105	112	114.50	109.500
982	110	121	85	117	91	84	84	108	108.25	100.000
983	95	109	118	112	121	105	84	115	108.50	107.375
984	102	105	127	104	95	101	106	103	109.50	105.375
985	116	93	112	102	67	92	103	114	105.75	99.875
986	106	97	114	82	82	108	113	81	99.75	97.875
987	107	93	78	91	83	81	115	102	92.25	93.750
988	106	115	105	74	86	124	97	116	100.00	102.875
989	117	84	131	102	92	118	90	90	108.50	103.000
990	100	69	108	128	111	110	94	95	101.25	101.875
991	86	85	123	94	104	89	76	97	97.00	94.250
992	94	90	72	121	105	150	72	88	94.25	99.000
993	70	109	104	114	93	103	126	99	99.25	102.250
994	102	110	98	93	64	131	91	95	100.75	98.000
995	80	135	120	92	118	119	66	117	106.75	105.875
996	81	102	88	98	113	81	95	110	92.25	96.000
997	85	146	73	133	111	88	92	74	109.25	100.250
998	94	109	110	115	95	93	90	103	107.00	101.125
999	84	84	97	125	92	89	95	124	97.50	98.750
1000	77	60	113	106	107	109	110	103	89.00	98.125

As you can see, the second-to-last column, titled Mean4, is the average of the first four columns X1, X2, X3, and X4. The last column, titled Mean8, is the average of all eight columns X1, X2, X3, X4, X5, X6, X7, and X8. Now, all we have to do is create a histogram of the sample means appearing in the Mean4 column:

Ahhhh! The histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. Now, recall that the Empirical Rule tells us that we should expect, if the sample means are normally distributed, that almost all of the sample means would fall within three standard deviations of the population mean. That is, in the case of Mean4, we should expect almost all of the data to fall between 76 (from 100−3(8)) and 124 (from 100+3(8)). It sure looks like that's the case!

Let's do the same thing for the Mean8 column. That is, let's create a histogram of the sample means appearing in the Mean8 column. Doing so, we get:

Again, the histogram sure looks fairly bell-shaped, making the normal distribution a real possibility. For Mean8, the Empirical Rule tells us that we should expect almost all of the data to fall between about 83 (from \(100-3\sqrt{32}\)) and 117 (from \(100+3\sqrt{32}\)). It too looks pretty good on both sides, although it seems that there were two really extreme sample means based on samples of size 8. (If you look back at the data, you can see one of them in the eighth row.)
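
The same experiment can be repeated without Minitab. The following Python sketch is an assumption-laden stand-in for the Minitab run, not a reproduction of it: it draws 1000 rows of eight \(N(100, 16^2)\) values with the standard library's `random.gauss`, forms Mean4 and Mean8 for each row, and compares their sample variances with the theoretical 64 and 32. The seed is arbitrary, and unlike the table above the simulated values are not rounded to whole numbers.

```python
import random
import statistics

random.seed(414)  # arbitrary seed so that the run is repeatable

MU, SIGMA, N_SAMPLES = 100.0, 16.0, 1000

mean4, mean8 = [], []
for _ in range(N_SAMPLES):
    row = [random.gauss(MU, SIGMA) for _ in range(8)]  # X1, ..., X8
    mean4.append(sum(row[:4]) / 4)  # average of X1..X4
    mean8.append(sum(row) / 8)      # average of X1..X8

def frac_within_3sd(means, n):
    """Fraction of sample means within 3 standard deviations (3*SIGMA/sqrt(n)) of MU."""
    sd = SIGMA / n ** 0.5
    return sum(MU - 3 * sd <= m <= MU + 3 * sd for m in means) / len(means)

for label, means, n in (("Mean4", mean4, 4), ("Mean8", mean8, 8)):
    print(f"{label}: mean = {statistics.mean(means):.2f}, "
          f"variance = {statistics.variance(means):.2f} "
          f"(theory {SIGMA ** 2 / n:.0f}), "
          f"within 3 SD = {frac_within_3sd(means, n):.3f}")
```

If the corollary holds, the printed means should sit near 100, the variances near 64 and 32, and nearly all of the sample means should fall inside the Empirical Rule bounds, with the exact figures varying from seed to seed.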

In summary, the whole point of this exercise was to use the theory to help us derive the distribution of the sample mean of IQs, and then to use simulated normal data to see if our theory worked in practice. I think we can conclude that it does!
