# 7.5 - Matched Pairs for Means

#### Paired Data

Simply put, paired data involves taking two measurements on the same subjects, called repeated sampling. Think of studying the effectiveness of a diet plan. You would weigh yourself before starting the diet and again after some time on the diet. Based on how much weight you lost, you would determine whether the diet was effective. Now do this for several people, not just yourself. What you might be interested in is estimating the true mean difference between the original weight and the weight after a certain period on the diet. If the plan were effective, you would expect the confidence interval for the mean of these differences (original weight minus later weight) to lie entirely above zero.

**NOTE: one exception to this repeated sampling on the same subjects is when the two subjects in a pair are very closely related. For instance, studies involving spouses or twins are often treated as paired data.**

The test statistic for examining hypotheses about **one population mean difference (i.e. paired data)**:

\(t=\frac{\bar{x}_d-\mu_0}{\frac{s_d}{\sqrt{n}}}\)

where \(\bar{x}_d\) = the observed sample mean difference, \(\mu_0\) = the value specified in the null hypothesis, \(s_d\) = the standard deviation of the differences in the sample measurements, and *n* = sample size (the number of pairs). For instance, if we wanted to test for a difference in mean SAT Math and mean SAT Verbal scores, we would randomly sample *n* subjects, record their SATM and SATV scores in two separate columns, then create a third column that contains the differences between these scores. The sample mean and sample standard deviation would then be those calculated on this column of differences.

Notice that the top part of the statistic is the difference between the sample mean difference and the value specified in the null hypothesis. The bottom part of the calculation is the standard error of the mean difference.
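The column-of-differences calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the scores below are hypothetical values made up for the sketch (they are not data from this lesson), and the variable names are assumptions chosen for readability.

```python
import math
import statistics

# Hypothetical SAT scores for five students (illustrative values only,
# not data from this lesson).
sat_math = [600, 550, 640, 580, 610]
sat_verbal = [580, 560, 600, 570, 590]

# Third "column": the within-subject differences (SATM minus SATV).
differences = [m - v for m, v in zip(sat_math, sat_verbal)]

n = len(differences)                      # number of pairs
mean_diff = statistics.mean(differences)  # sample mean difference
sd_diff = statistics.stdev(differences)   # sample SD of the differences (n - 1 divisor)
mu_0 = 0                                  # value specified in the null hypothesis

# Top of the statistic: mean difference minus the null value.
# Bottom of the statistic: standard error of the mean difference.
standard_error = sd_diff / math.sqrt(n)
t_stat = (mean_diff - mu_0) / standard_error

print(f"differences = {differences}")
print(f"mean = {mean_diff}, sd = {sd_diff:.3f}, t = {t_stat:.3f}")
```

Once the differences are formed, the original two columns are no longer needed: everything in the statistic is computed from the single column of differences, which is why the procedure behaves like a one-sample test.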

It is a convention that a test using a *t*-statistic is called a *t*-test. That is, a hypothesis test using the statistic above would be referred to as a "paired *t*-test".

#### Example 4 – Paired Data

The average weekly loss of study hours due to consuming too much alcohol on the weekend is studied for 10 students before and after a certain alcohol awareness program is put into operation. Do the data provide evidence that the program was effective?

\(H_0: \mu_d = 0\) versus \(H_a: \mu_d > 0\)

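With a sample mean difference of \(\bar{x}_d = 0.52\), a standard deviation of the differences of \(s_d = 0.408\), and *n* = 10 students, the test statistic is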

\(t=\frac{\bar{x}_d-\mu_0}{\frac{s_d}{\sqrt{n}}}=\frac{0.52-0}{\frac{0.408}{\sqrt{10}}}=4.03\)

The *p*-value is the probability that a *t*-value would be greater than (to the right of) 4.03. From Minitab we get 0.001. If using the T-Table, we would look at DF = 9, and since *t* = 4.03 > 3.00, our *p*-value from the table would be *p* < 0.007.
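As a rough check on the numbers in this example, the following sketch recomputes the test statistic from the summary values (0.52, 0.408, *n* = 10) and approximates the right-tail *p*-value using only the Python standard library. The helper functions `t_pdf` and `upper_tail_p` are hypothetical names written for this sketch; they integrate the *t* density numerically with Simpson's rule rather than reproducing how Minitab or the T-Table obtains its value.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_value: float, df: int, steps: int = 10_000) -> float:
    """Approximate P(T > t_value) for t_value >= 0.

    Integrates the density on [0, t_value] with Simpson's rule
    (steps must be even) and subtracts the area from 0.5, using the
    symmetry of the t distribution about zero.
    """
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t

# Summary values from the example.
n = 10
mean_diff = 0.52
sd_diff = 0.408
mu_0 = 0.0

t_stat = (mean_diff - mu_0) / (sd_diff / math.sqrt(n))
p_value = upper_tail_p(t_stat, df=n - 1)  # right tail, because H_a is mu_d > 0

print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.t.sf(t_stat, n - 1)` should give the same right-tail probability without the hand-written integration.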

**Interpretation**:

Since *p* < 0.05, we would reject the null hypothesis and conclude that the mean difference in the population is greater than 0, meaning that we would claim that the alcohol awareness program is effective.