26.4 - Student's t Distribution

We have just one more topic to tackle in this lesson, namely, Student's t distribution. Let's just jump right in and define it!

Definition. If $$Z\sim N(0,1)$$ and $$U\sim \chi^2(r)$$ are independent, then the random variable:

$$T=\dfrac{Z}{\sqrt{U/r}}$$

follows a $$t$$-distribution with $$r$$ degrees of freedom. We write $$T\sim t(r)$$. The p.d.f. of T is:

$$f(t)=\dfrac{\Gamma((r+1)/2)}{\sqrt{\pi r} \Gamma(r/2)} \cdot \dfrac{1}{(1+t^2/r)^{(r+1)/2}}$$

for $$-\infty<t<\infty$$.
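To make the formula a little more concrete, here is a minimal Python sketch that evaluates the p.d.f. directly. The function names `t_pdf` and `integrate` are just illustrative choices, and the integration limits of $$\pm 200$$ are an assumption that works for the $$r=5$$ case shown, not a general rule. It checks two things: that the formula with $$r=1$$ gives $$1/\pi$$ at $$t=0$$, and that the density integrates to (nearly) 1.

```python
import math

def t_pdf(t, r):
    """Density of the t distribution with r degrees of freedom (formula above)."""
    # Use log-gamma so that large r does not overflow math.gamma.
    log_const = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                 - 0.5 * math.log(math.pi * r))
    return math.exp(log_const - ((r + 1) / 2) * math.log1p(t * t / r))

def integrate(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# With r = 1, the formula gives 1/pi at t = 0.
print(t_pdf(0.0, 1), 1 / math.pi)

# The density should integrate to (nearly) 1 over a wide interval.
print(integrate(lambda t: t_pdf(t, 5), -200, 200))
```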

By the way, the $$t$$ distribution was first discovered by a man named W.S. Gosset. He discovered the distribution when working for an Irish brewery. Because he published under the pseudonym Student, the $$t$$ distribution is often called Student's $$t$$ distribution.

History aside, the above definition is probably not particularly enlightening. Let's try to get a feel for the $$t$$ distribution by way of simulation. Let's randomly generate 1000 standard normal values ($$Z$$) and 1000 chi-square(3) values ($$U$$). Then, the above definition tells us that, if we take those randomly generated values, calculate:

$$T=\dfrac{Z}{\sqrt{U/3}}$$

and create a histogram of the 1000 resulting $$T$$ values, we should get a histogram that looks like a $$t$$ distribution with 3 degrees of freedom. Well, here's a subset of the resulting values from one such simulation:

ROW Z CHISQ (3) T(3)
1 -2.60481 10.2497 -1.4092
2 2.92321 1.6517 3.9396
3 -0.48633 0.1757 -2.0099
4 -0.48212 3.8283 -0.4268
5 -0.04150 0.2422 -0.1461
6 -0.84225 0.0903 -4.8544
7 -0.31205 1.6326 -0.4230
8 1.33068 5.2224 1.0086
9 -0.64104 0.9401 -1.1451
10 -0.05110 2.2632 -0.0588
11 1.61601 4.6566 1.2971
12 0.81522 2.1738 0.9577
13 0.38501 1.8404 0.4916
14 -1.63426 1.1265 -2.6669

...and so on...

994 -0.18942 3.5202 -0.1749
995 0.43078 3.3585 0.4071
996 -0.14068 0.6236 -0.3085
997 -1.76357 2.6188 -1.8876
998 -1.02310 3.2470 -0.9843
999 -0.93777 1.4991 -1.3266
1000 -0.37665 2.1231 -0.4477

Note, for example, in the first row:

$$T(3)=\dfrac{-2.60481}{\sqrt{10.2497/3}}=-1.4092$$
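If you'd like to run a simulation like this yourself, here is one possible sketch in Python. It is not the software used to produce the values above. The names `t_from` and `simulate_t` are made up for illustration, and the chi-square values are generated as the sum of $$r$$ squared independent standard normals, which is one of several ways to do it.

```python
import math
import random

def t_from(z, u, r):
    """T = Z / sqrt(U / r), as in the definition."""
    return z / math.sqrt(u / r)

def simulate_t(r, n, seed=0):
    """Generate n values of T by drawing Z ~ N(0,1) and U ~ chi-square(r)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # chi-square(r) as a sum of r squared independent standard normals
        u = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r))
        values.append(t_from(z, u, r))
    return values

# Check the arithmetic for the first row of the table (compare with -1.4092).
print(round(t_from(-2.60481, 10.2497, 3), 4))

# Simulate 1000 values of T with 3 degrees of freedom.
sample = simulate_t(r=3, n=1000)
print(len(sample), min(sample), max(sample))
```

Your simulated values won't match the ones in the table, of course, since they depend on the random number generator and the seed.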

Here's what the resulting histogram of the 1000 randomly generated $$T(3)$$ values looks like, with a standard $$N(0,1)$$ curve superimposed:

Hmmm. The $$t$$-distribution seems to be quite similar to the standard normal distribution. Using the formula given above for the p.d.f. of $$T$$, we can plot the density curve of various $$t$$ random variables, say when $$r=1, r=4$$, and $$r=7$$, to see that that is indeed the case:

In fact, it looks as if, as the degrees of freedom $$r$$ increases, the $$t$$ density curve gets closer and closer to the standard normal curve. Let's summarize what we've learned in our little investigation about the characteristics of the t distribution:

1. The support appears to be $$-\infty<t<\infty$$. (It is!)
2. The probability distribution appears to be symmetric about $$t=0$$. (It is!)
3. The probability distribution appears to be bell-shaped. (It is!)
4. The density curve looks like a standard normal curve, but the tails of the $$t$$-distribution are "heavier" than the tails of the normal distribution. That is, we are more likely to get extreme $$t$$-values than extreme $$z$$-values.
5. As the degrees of freedom $$r$$ increases, the $$t$$-distribution appears to approach the standard normal $$z$$-distribution. (It does!)
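We can back up the last two observations with a few numbers. The sketch below, written under the assumption that evaluating the densities at just two points ($$t=0$$ and $$t=3$$) is enough to illustrate the idea, compares the $$t$$ density with the standard normal density for several values of $$r$$. The function names are illustrative only.

```python
import math

def t_pdf(t, r):
    """Density of the t distribution with r degrees of freedom."""
    log_const = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                 - 0.5 * math.log(math.pi * r))
    return math.exp(log_const - ((r + 1) / 2) * math.log1p(t * t / r))

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Height at the center (t = 0) and out in the tail (t = 3).
for r in (1, 4, 7, 30, 1000):
    print(r, round(t_pdf(0.0, r), 4), round(t_pdf(3.0, r), 5))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 5))
```

For each $$r$$, the $$t$$ density is lower than the normal density at the center and higher in the tail, and the gap shrinks as $$r$$ grows.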

As you'll soon see, we'll need to look up $$t$$-values, as well as probabilities concerning $$T$$ random variables, quite often in Stat 415. Therefore, we'd better make sure we know how to read a $$t$$ table.

The $$t$$ Table

If you take a look at Table VI in the back of your textbook, you'll find what looks like a typical $$t$$ table. Here's what the top of Table VI looks like:

$$P(T \leq t)=\int_{-\infty}^{t} \dfrac{\Gamma[(r+1)/2]}{\sqrt{\pi r}\, \Gamma(r/2)\left(1+w^{2}/r\right)^{(r+1)/2}}\, dw$$

$$P(T \leq -t)=1-P(T \leq t)$$

P(T≤t) 0.60 0.75 0.90 0.95 0.975 0.99 0.995
r t0.40(r) t0.25(r) t0.10(r) t0.05(r) t0.025(r) t0.01(r) t0.005(r)
1 0.325 1.000 3.078 6.314 12.706 31.821 63.657
2 0.289 0.816 1.886 2.920 4.303 6.965 9.925
3 0.277 0.765 1.638 2.353 3.182 4.541 5.841
4 0.271 0.741 1.533 2.132 2.776 3.747 4.604
5 0.267 0.727 1.476 2.015 2.571 3.365 4.032
6 0.265 0.718 1.440 1.943 2.447 3.143 3.707
7 0.263 0.711 1.415 1.895 2.365 2.998 3.499
8 0.262 0.706 1.397 1.860 2.306 2.896 3.355
9 0.261 0.703 1.383 1.833 2.262 2.821 3.250
10 0.260 0.700 1.372 1.812 2.228 2.764 3.169

The $$t$$-table is similar to the chi-square table in that the inside of the $$t$$-table contains the $$t$$-values for various cumulative probabilities (listed in the top row), such as 0.60, 0.75, 0.90, 0.95, 0.975, 0.99, and 0.995, and for various $$t$$ distributions with $$r$$ degrees of freedom (listed in the leftmost column). The second header row indicates the upper $$\alpha$$ probability that corresponds to the $$1-\alpha$$ cumulative probability. For example, if you're interested in either a cumulative probability of 0.60, or an upper probability of 0.40, you'll want to look for the $$t$$-value in the first column of $$t$$-values.

Let's use the $$t$$-table to find a few probabilities and $$t$$-values by working through a couple of examples.

Example 26-6

Let $$T$$ follow a $$t$$-distribution with $$r=8$$ df. What is the probability that the absolute value of $$T$$ is less than 2.306?

Solution

The probability calculation is quite similar to a calculation we'd have to make for a normal random variable. First, rewriting the probability in terms of $$T$$ instead of the absolute value of $$T$$, we get:

$$P(|T|<2.306)=P(-2.306<T<2.306)$$

Then, we have to rewrite the probability in terms of cumulative probabilities that we can actually find, that is:

$$P(|T|<2.306)=P(T<2.306)-P(T<-2.306)$$

Pictorially, the probability we are looking for looks something like this:

But the $$t$$-table doesn't contain negative $$t$$-values, so we'll have to take advantage of the symmetry of the $$T$$ distribution. That is:

$$P(|T|<2.306)=P(T<2.306)-P(T>2.306)$$

Can you find the necessary $$t$$-values on the $$t$$-table?

The $$t$$-table tells us that $$P(T<2.306)=0.975$$ and $$P(T>2.306)=0.025$$.  Therefore:

$$P(|T|<2.306)=0.975-0.025=0.95$$
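As a sanity check on the table lookup, we can also get $$P(|T|<2.306)$$ for $$r=8$$ by numerically integrating the p.d.f. from $$-2.306$$ to $$2.306$$. The sketch below uses Simpson's rule, which is my choice of method here rather than anything the table itself relies on. The names `t_pdf` and `simpson` are illustrative.

```python
import math

def t_pdf(t, r):
    """Density of the t distribution with r degrees of freedom."""
    log_const = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                 - 0.5 * math.log(math.pi * r))
    return math.exp(log_const - ((r + 1) / 2) * math.log1p(t * t / r))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# P(|T| < 2.306) for r = 8, as the area under the density between -2.306 and 2.306.
prob = simpson(lambda w: t_pdf(w, 8), -2.306, 2.306)
print(round(prob, 3))
```

The result should agree with the answer from the table to about three decimal places.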

What is $$t_{0.05}(8)$$?

Solution

The value $$t_{0.05}(8)$$ is the value such that the probability that a $$T$$ random variable with 8 degrees of freedom exceeds it is 0.05. That is:

$$P(T>t_{0.05}(8))=0.05$$

Can you find the value $$t_{0.05}$$ on the $$t$$-table?


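Looking in the row for $$r=8$$ and the column headed $$t_{0.05}(r)$$, we find that $$t_{0.05}(8)=1.860$$. That is, we have determined that the probability that a $$T$$ random variable with 8 degrees of freedom is greater than the value 1.860 is 0.05.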
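If you don't have a $$t$$-table handy, a value such as $$t_{0.05}(8)$$ can be approximated numerically. Here is a rough sketch that builds the cumulative probability from the p.d.f. and then searches for the $$t$$-value by bisection. The function names, the search interval of 0 to 100, and the restriction to $$\alpha<0.5$$ are all assumptions made to keep the example short.

```python
import math

def t_pdf(t, r):
    """Density of the t distribution with r degrees of freedom."""
    log_const = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                 - 0.5 * math.log(math.pi * r))
    return math.exp(log_const - ((r + 1) / 2) * math.log1p(t * t / r))

def t_cdf(t, r, n=2000):
    """P(T <= t) for t >= 0: by symmetry, 0.5 plus the area from 0 to t."""
    h = t / n
    total = t_pdf(0.0, r) + t_pdf(t, r)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, r)
    return 0.5 + total * h / 3  # composite Simpson's rule

def t_upper_quantile(alpha, r, lo=0.0, hi=100.0):
    """Approximate t_alpha(r), the value with upper-tail probability alpha < 0.5."""
    target = 1 - alpha
    for _ in range(60):  # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if t_cdf(mid, r) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_upper_quantile(0.05, 8), 3))   # compare with the r = 8 row
print(round(t_upper_quantile(0.025, 8), 3))
```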

Why will we encounter a $$T$$ random variable?

Given a random sample $$X_1, X_2, \ldots, X_n$$ from a normal distribution, we know that:

$$Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$$

Earlier in this lesson, we learned that:

$$U=\dfrac{(n-1)S^2}{\sigma^2}$$

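follows a chi-square distribution with $$n-1$$ degrees of freedom. We also learned that $$Z$$ and $$U$$ are independent. Therefore, using the definition of a $$T$$ random variable with $$r=n-1$$, we get:

$$T=\dfrac{Z}{\sqrt{U/(n-1)}}=\dfrac{\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}}=\dfrac{\bar{X}-\mu}{S/\sqrt{n}}\sim t(n-1)$$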

It is the resulting quantity, that is:

$$T=\dfrac{\bar{X}-\mu}{S/\sqrt{n}}$$

that will help us, in Stat 415, to use a mean from a random sample, that is $$\bar{X}$$, to learn, with confidence, something about the population mean $$\mu$$.
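Here is one way to see this result in action by simulation. The sketch below repeatedly draws samples of size $$n=9$$ from a normal distribution, computes $$T$$ for each sample, and reports the fraction of samples with $$|T|<2.306$$. If $$T$$ really follows a $$t$$ distribution with $$n-1=8$$ degrees of freedom, that fraction should be close to 0.95. The values of $$\mu$$ and $$\sigma$$, the number of repetitions, and the function names are arbitrary choices made for illustration.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """T = (sample mean - mu) / (S / sqrt(n)), with S the sample standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

def fraction_within(n, mu, sigma, cutoff, reps, seed=0):
    """Fraction of simulated normal samples of size n with |T| < cutoff."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) < cutoff:
            hits += 1
    return hits / reps

# n = 9 gives 8 degrees of freedom, and 2.306 is t_{0.025}(8) from the table.
print(fraction_within(n=9, mu=10.0, sigma=2.0, cutoff=2.306, reps=10000))
```

Notice that the code never uses $$\sigma$$ when computing $$T$$. It only uses $$\sigma$$ to generate the data.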
