In the next lesson, we'll learn how to compare the means of two *independent* populations, but there may be occasions in which we are interested in comparing the means of two *dependent* populations. For example, suppose a researcher is interested in determining whether the mean IQ of the population of first-born twins differs from the mean IQ of the population of second-born twins. She identifies a random sample of \(n\) pairs of twins, and measures \(X\), the IQ of the first-born twin, and \(Y\), the IQ of the second-born twin. In that case, she's interested in determining whether:

\(\mu_X=\mu_Y\)

or equivalently if:

\(\mu_X-\mu_Y=0\)

Now, the population of first-born twins is not independent of the population of second-born twins. Since all of our distributional theory requires the independence of measurements, we're rather stuck. There's a way out though... we can "remove" the dependence between \(X\) and \(Y\) by subtracting the two measurements \(X_i\) and \(Y_i\) for each pair of twins \(i\), that is, by considering the independent measurements

\(D_i=X_i-Y_i\)

Then, our null hypothesis involves just a single mean, which we'll denote \(\mu_D\), the mean of the differences:

\(H_0:\mu_D=\mu_X-\mu_Y=0\)

and then our hard work is done! We can just use the \(t\)-test for a mean for conducting the hypothesis test... it's just that, in this situation, our measurements are differences \(d_i\) whose mean is \(\bar{d}\) and standard deviation is \(s_D\). That is, when testing the null hypothesis \(H_0:\mu_D=\mu_0\) against any of the alternative hypotheses \(H_A:\mu_D \neq \mu_0\), \(H_A:\mu_D<\mu_0\), and \(H_A:\mu_D>\mu_0\), we compare the test statistic:

\(t=\dfrac{\bar{d}-\mu_0}{s_D/\sqrt{n}}\)

to a \(t\)-distribution with \(n-1\) degrees of freedom. Let's take a look at an example!
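Before looking at the example, here's a minimal sketch in Python (standard library only) of how the differences \(d_i\) and the paired \(t\)-statistic might be computed. The data and the function name `paired_t` are made up for illustration:

```python
import math
from statistics import mean, stdev

def paired_t(x, y, mu0=0.0):
    """Compute the paired t statistic and its degrees of freedom.

    x, y -- equal-length sequences of paired measurements
    mu0  -- hypothesized mean difference under H0 (default 0)
    """
    d = [xi - yi for xi, yi in zip(x, y)]  # differences d_i = x_i - y_i
    n = len(d)
    t = (mean(d) - mu0) / (stdev(d) / math.sqrt(n))
    return t, n - 1  # compare t to a t-distribution with n - 1 df

# Hypothetical paired data (two measurements on the same five subjects)
x = [120, 115, 130, 125, 110]
y = [118, 112, 128, 127, 109]
t, df = paired_t(x, y)
```

Note that `statistics.stdev` computes the *sample* standard deviation (dividing by \(n-1\)), which is exactly the \(s_D\) the formula calls for.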

## Example 10-3

Blood samples from \(n=10\) people were sent to each of two laboratories (Lab 1 and Lab 2) for cholesterol determinations. The resulting data are summarized here:

Subject | Lab 1 | Lab 2 | Diff |
---|---|---|---|
1 | 296 | 318 | -22 |
2 | 268 | 287 | -19 |
⋮ | ⋮ | ⋮ | ⋮ |
10 | 262 | 285 | -23 |
 | \(\bar{x}_{1}=260.6\) | \(\bar{x}_{2}=275\) | \(\bar{d}=-14.4\), \(s_{d}=6.77\) |

Is there a statistically significant difference at the \(\alpha=0.01\) level, say, in the (population) mean cholesterol levels reported by Lab 1 and Lab 2?

### Answer

The null hypothesis is \(H_0:\mu_D=0\), and the alternative hypothesis is \(H_A:\mu_D\ne 0\). The value of the test statistic is:

\(t=\dfrac{-14.4-0}{6.77/\sqrt{10}}=-6.73\)

The critical region approach tells us to reject the null hypothesis at the \(\alpha=0.01\) level if \(t>t_{0.005, 9}=3.25\) or if \(t<-t_{0.005, 9}=-3.25\). Because \(t=-6.73<-3.25\), the test statistic falls in the rejection region, and so we reject the null hypothesis.
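As a quick sanity check, the test statistic and the rejection decision can be reproduced from the summary statistics alone. The critical value 3.25 is taken from a \(t\)-table, as in the text:

```python
import math

# Summary statistics from the example
d_bar, s_d, n = -14.4, 6.77, 10

t = (d_bar - 0) / (s_d / math.sqrt(n))  # test statistic
t_crit = 3.25                           # t_{0.005, 9} from a t-table

reject = abs(t) > t_crit                # two-sided test at alpha = 0.01
```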

Again, we draw the same conclusion when using the \(p\)-value approach. In this case, the \(p\)-value is:

\(p\text{-value}=2\times P(T_9<-6.73)\le 2\times 0.005=0.01\)

As expected, we reject the null hypothesis because \(p\)-value \(\le 0.01=\alpha\).

And, the Minitab output for this example looks like this:

N | Mean | StDev | SE Mean | 95% CI | T | P |
---|---|---|---|---|---|---|
10 | -14.4000 | 6.7700 | 2.1409 | (-19.2430, -9.5570) | -6.73 | 0.000 |
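The confidence interval Minitab reports can be reconstructed from the summary statistics as \(\bar{d}\pm t_{0.025,9}\,s_d/\sqrt{n}\). A sketch, where the critical value 2.2622 is taken from a \(t\)-table:

```python
import math

# Summary statistics from the example
d_bar, s_d, n = -14.4, 6.77, 10

se = s_d / math.sqrt(n)   # standard error of the mean difference
t_crit = 2.2622           # t_{0.025, 9} from a t-table
lo, hi = d_bar - t_crit * se, d_bar + t_crit * se
# (lo, hi) matches Minitab's 95% CI up to table rounding
```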