7.3.1.1  Pooled Variances
Confidence Intervals for \(\boldsymbol{\mu_1-\mu_2}\): Pooled Variances
When we have good reason to believe that the variance for population 1 is equal to that of population 2, we can estimate the common variance by pooling information from samples from population 1 and population 2.
An informal check for this is to compare the ratio of the two sample standard deviations. If the two are equal, the ratio would be 1, i.e. \(\frac{s_1}{s_2}=1\). However, since these are samples and therefore involve sampling error, we cannot expect the ratio to be exactly 1. When the sample sizes are nearly equal (admittedly "nearly equal" is somewhat ambiguous, so when sample sizes are small one often requires that they be equal), a good rule of thumb is to check whether the ratio falls between 0.5 and 2. That is, neither sample standard deviation is more than twice the other.
If this rule of thumb is satisfied, we can assume the variances are equal. Later in this lesson, we will examine a more formal test for equality of variances.
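The rule of thumb can be sketched in a few lines of Python. The function name `sds_roughly_equal` and its `low`/`high` arguments are our own illustrative choices, not part of any library, and the inputs are hypothetical.

```python
def sds_roughly_equal(s1, s2, low=0.5, high=2.0):
    """Informal rule-of-thumb check: is s1/s2 between low and high (inclusive)?"""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sample standard deviations must be positive")
    ratio = s1 / s2
    return low <= ratio <= high


# Hypothetical sample standard deviations, chosen only for illustration.
print(sds_roughly_equal(1.2, 1.5))  # ratio is 0.8
print(sds_roughly_equal(0.4, 1.0))  # ratio is 0.4
```

This is only the informal screen described above; it does not replace a formal test for equality of variances.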
 Let \(n_1\) be the sample size from population 1 and let \(s_1\) be the standard deviation of the sample from population 1.
 Let \(n_2\) be the sample size from population 2 and let \(s_2\) be the standard deviation of the sample from population 2.
Then the common standard deviation can be estimated by the pooled standard deviation:
\(s_p=\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s^2_2}{n_1+n_2-2}}\)
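As a minimal sketch, the pooled standard deviation can be computed directly from the two sample sizes and sample standard deviations. The function name `pooled_sd` is hypothetical, and the inputs in the usage line are made up for illustration.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation from two sample sizes and sample SDs."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # Weighted average of the two sample variances, weights n_i - 1.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


# Hypothetical inputs: when both sample SDs are equal, pooling returns that value.
print(pooled_sd(12, 2.0, 15, 2.0))
```

Because each variance is weighted by its degrees of freedom, the larger sample has more influence on the pooled estimate.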
If we can assume the populations are independent, that each population is normal or has a large sample size, and that the population variances are the same, then it can be shown that...
\(t=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\)
follows a t-distribution with \(n_1+n_2-2\) degrees of freedom.
Now, we can construct a confidence interval for the difference of two means, \(\mu_1-\mu_2\).
 \(\boldsymbol{(1-\alpha)100\%}\) Confidence interval for \(\boldsymbol{\mu_1-\mu_2}\) for Pooled Variances
 \(\bar{x}_1-\bar{x}_2\pm t_{\alpha/2}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)

where \(t_{\alpha/2}\) comes from a t-distribution with \(n_1+n_2-2\) degrees of freedom.
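The interval formula translates into a short Python sketch. The function name `pooled_ci` is hypothetical, and the multiplier \(t_{\alpha/2}\) is assumed to be looked up by the caller (for example, from a t-table) rather than computed here. The numbers in the usage lines are invented for illustration.

```python
import math


def pooled_ci(xbar1, xbar2, s_p, n1, n2, t_mult):
    """Return (lower, upper) for mu1 - mu2 using the pooled-variance formula.

    t_mult is t_{alpha/2} with n1 + n2 - 2 degrees of freedom, supplied
    by the caller.
    """
    margin = t_mult * s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical summary statistics: n1 = n2 = 8, so df = 14,
# and t_{0.025} with 14 degrees of freedom is 2.145 (95% interval).
lower, upper = pooled_ci(10.0, 8.0, 2.0, 8, 8, 2.145)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the multiplier in as an argument keeps the sketch free of any distribution library; the trade-off is that the caller must match the multiplier to the confidence level and degrees of freedom.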
Hypothesis Tests for \(\boldsymbol{\mu_1-\mu_2}\): The Pooled t-test
Now let's consider the hypothesis test for the mean differences with pooled variances.
\(H_0\colon\mu_1-\mu_2=0\)
\(H_a\colon \mu_1-\mu_2\ne0\)
\(H_a\colon \mu_1-\mu_2>0\)
\(H_a\colon \mu_1-\mu_2<0\)
The assumptions/conditions are:
 The populations are independent
 The population variances are equal
 Each population is either normal or the sample size is large.
The test statistic is...
\(t^*=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\)
And \(t^*\) follows a t-distribution with degrees of freedom equal to \(df=n_1+n_2-2\).
The p-value, critical value, rejection region, and conclusion are found similarly to what we have done before.
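The test statistic and the rejection-region decision can be sketched as follows. Both function names and the `alternative` labels are our own assumptions, and the critical value is taken as a positive number read from a t-table with \(n_1+n_2-2\) degrees of freedom. The usage values are hypothetical.

```python
import math


def pooled_t_statistic(xbar1, xbar2, s_p, n1, n2, null_diff=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2 - null_diff) / se


def reject_null(t_star, t_crit, alternative):
    """Rejection-region decision; t_crit is the positive table value.

    alternative: "two-sided" (t_crit = t_{alpha/2}),
                 "greater" or "less" (t_crit = t_{alpha}).
    """
    if alternative == "two-sided":
        return abs(t_star) > t_crit
    if alternative == "greater":
        return t_star > t_crit
    if alternative == "less":
        return t_star < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# Hypothetical summary statistics: n1 = n2 = 8, so df = 14.
t_star = pooled_t_statistic(10.0, 8.0, 2.0, 8, 8)
# Two-sided test at the 5% level: t_{0.025} with 14 df is 2.145.
print(t_star, reject_null(t_star, 2.145, "two-sided"))
```

Note that for a left-tailed alternative the rejection region is \(t^*<-t_\alpha\), which is why the sketch negates the table value in the "less" branch.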
Example 7-4: Comparing Packing Machines
In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on the average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, (machine.txt), in seconds, are shown in the tables.
New machine
42.1  41.3  42.4  43.2  41.8
41.0  41.8  42.8  42.3  42.7
\(\bar{x}_1=42.14, s_1=0.683\)
Old machine
42.7  43.8  42.5  43.1  44.0
43.6  43.3  43.5  41.7  44.1
\(\bar{x}_2=43.23, s_2=0.750\)
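The summary statistics can be checked from the raw times with Python's standard `statistics` module. This is a small sketch; the variable names are our own, and the first table is taken to be the new machine, which matches the means reported in the Minitab output for this example.

```python
import statistics

# Packing times in seconds, copied from the two tables.
new_machine = [42.1, 41.3, 42.4, 43.2, 41.8, 41.0, 41.8, 42.8, 42.3, 42.7]
old_machine = [42.7, 43.8, 42.5, 43.1, 44.0, 43.6, 43.3, 43.5, 41.7, 44.1]

for label, times in (("new", new_machine), ("old", old_machine)):
    xbar = statistics.mean(times)
    s = statistics.stdev(times)  # sample SD, divides by n - 1
    print(f"{label}: n={len(times)}, mean={xbar:.2f}, sd={s:.3f}")
```

`statistics.stdev` is the sample standard deviation (denominator \(n-1\)), which is the quantity the pooled formula expects; `statistics.pstdev` would divide by \(n\) instead.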
Do the data provide sufficient evidence to conclude that, on the average, the new machine packs faster?
Are these independent samples? Yes, since the samples from the two machines are not related.
Are these large samples or a normal population?
We have \(n_1\lt 30\) and \(n_2\lt 30\). We do not have large enough samples, and thus we need to check the normality assumption from both populations. Let's take a look at the normality plots for this data:
From the normal probability plots, we conclude that both populations may come from normal distributions. Remember that the plots do not show that the data DO come from a normal distribution; they only show whether there are clear violations. We should proceed with caution.
Do the populations have equal variance? No information allows us to assume they are equal. We can use our rule of thumb to see if they are “close.” They are not that different as \(\dfrac{s_1}{s_2}=\dfrac{0.683}{0.750}=0.91\) is quite close to 1. This assumption does not seem to be violated.
We can thus proceed with the pooled t-test.
Let \(\mu_1\) denote the mean for the new machine and \(\mu_2\) denote the mean for the old machine.
The null hypothesis is that there is no difference in the two population means, i.e.
\(H_0\colon \mu_1-\mu_2=0\)
The alternative is that the new machine is faster, i.e.
\(H_a\colon \mu_1-\mu_2<0\)
The significance level is 5%. Since we may assume the population variances are equal, we first have to calculate the pooled standard deviation:
\begin{align} s_p&=\sqrt{\frac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}}\\ &=\sqrt{\frac{(10-1)(0.683)^2+(10-1)(0.750)^2}{10+10-2}}\\ &=\sqrt{\dfrac{9.261}{18}}\\ &=0.7173 \end{align}
The test statistic is:
\begin{align} t^*&=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\\ &=\dfrac{42.14-43.23}{0.7173\sqrt{\frac{1}{10}+\frac{1}{10}}}\\&=-3.398 \end{align}
The alternative is left-tailed, so the critical value is the value \(a\) such that \(P(T<a)=0.05\), with \(10+10-2=18\) degrees of freedom. The critical value is -1.7341. The rejection region is \(t^*<-1.7341\).
Our test statistic, -3.398, is in the rejection region, so we reject the null hypothesis. At the 5% significance level, there is enough evidence to conclude that the new machine packs faster than the old machine.
To find a 99% confidence interval for the difference in means, we need all of the pieces. We calculated all but one when we conducted the hypothesis test. We only need the multiplier. For a 99% confidence interval, the multiplier is \(t_{0.01/2}\) with degrees of freedom equal to 18. This value is 2.878.
The interval is:
\(\bar{x}_1-\bar{x}_2\pm t_{\alpha/2}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)
\((42.14-43.23)\pm 2.878(0.7173)\sqrt{\frac{1}{10}+\frac{1}{10}}\)
\(-1.09\pm 0.9232\)
The 99% confidence interval is (-2.013, -0.167).
We are 99% confident that the difference between the two population mean times (new machine minus old machine) is between -2.013 and -0.167 seconds.
Minitab: 2-Sample t-test - Pooled
The following steps are used to conduct a 2-sample t-test for pooled variances in Minitab.
 Choose Stat > Basic Statistics > 2-Sample t.
 The following dialog boxes will then be displayed.
Note! When entering values into the samples in different columns input boxes, Minitab always subtracts the second value (column entered second) from the first value (column entered first).
 Select the Options button and enter the desired 'confidence level', 'null hypothesis value' (again for our class this will be 0), and select the correct 'alternative hypothesis' from the dropdown menu. Finally, check the box for 'assume equal variances'. This latter selection should only be done when we have verified the two variances can be assumed equal.
The Minitab output for the packing time example:
Two-Sample T-Test and CI: New Machine, Old Machine
Method
μ_{1}: mean of New Machine
μ_{2}: mean of Old Machine
Difference: μ_{1} - μ_{2}
Equal variances are assumed for this analysis.
Descriptive Statistics
Sample       N   Mean    StDev  SE Mean
New Machine  10  42.140  0.683  0.22
Old Machine  10  43.230  0.750  0.24
Estimation for Difference
Difference  Pooled StDev  95% Upper Bound for Difference
-1.090      0.717         -0.534
Test
Alternative hypothesis  H_{1}: μ_{1} - μ_{2} < 0
T-Value  DF  P-Value
-3.40    18  0.002
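The p-value in the output can be approximated without any statistics package. The sketch below integrates the t density numerically with Simpson's rule; the function names `t_pdf` and `t_cdf` and the step count are our own choices, and in practice a statistics library would normally supply the t-distribution CDF. This is not how Minitab computes the value.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via composite Simpson's rule on [0, |x|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


t_star, df = -3.40, 18        # T-Value and DF from the Minitab output
p_value = t_cdf(t_star, df)   # left-tailed alternative: P(T < t*)
print(f"p-value = {p_value:.3f}")
```

Because the alternative is left-tailed, the p-value is the area to the left of \(t^*\); for a two-tailed alternative one would double the smaller tail area instead.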