5.6.1 - Inference for Independent Means

As with comparing two population proportions, when we compare two population means from independent populations, the interest is in the difference between the two means. In other words, if \(\mu_1\) is the population mean from population 1 and \(\mu_2\) is the population mean from population 2, then the difference is \(\mu_1-\mu_2\). If \(\mu_1-\mu_2=0\) then there is no difference between the two population parameters.

If each population is normal, then the sampling distribution of \(\bar{x}_i\) is normal with mean \(\mu_i\) and standard error \(\dfrac{\sigma_i}{\sqrt{n_i}}\), which is estimated by \(\dfrac{s_i}{\sqrt{n_i}}\), for \(i=1, 2\).

If a population is not normal, the Central Limit Theorem tells us that, with a large sample, the sampling distribution of \(\bar{x}_i\) is still approximately normal.

The theorem presented in this Lesson says that if either of the above is true for each population, and the two samples are independent, then \(\bar{x}_1-\bar{x}_2\) is approximately normal with mean \(\mu_1-\mu_2\) and standard error \(\sqrt{\dfrac{\sigma^2_1}{n_1}+\dfrac{\sigma^2_2}{n_2}}\).
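To see the standard error formula at work, the sketch below simulates many pairs of independent normal samples and compares the spread of the simulated \(\bar{x}_1-\bar{x}_2\) values with \(\sqrt{\sigma^2_1/n_1+\sigma^2_2/n_2}\). It is a minimal illustration assuming normal populations; the population settings and the helper names se_diff and simulate_diffs are hypothetical choices, not part of the lesson.

```python
import math
import random
import statistics

def se_diff(sigma1, n1, sigma2, n2):
    """Theoretical standard error of xbar1 - xbar2 for independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def simulate_diffs(mu1, sigma1, n1, mu2, sigma2, n2, reps=10000, seed=1):
    """Draw many pairs of independent normal samples and record xbar1 - xbar2."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(reps):
        x1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        x2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        diffs.append(statistics.fmean(x1) - statistics.fmean(x2))
    return diffs

# Hypothetical population settings, chosen only for illustration.
mu1, sigma1, n1 = 10.0, 2.0, 15
mu2, sigma2, n2 = 8.0, 3.0, 20

diffs = simulate_diffs(mu1, sigma1, n1, mu2, sigma2, n2)
print("theoretical mean:", mu1 - mu2)
print("simulated mean:  ", round(statistics.fmean(diffs), 3))
print("theoretical SE:  ", round(se_diff(sigma1, n1, sigma2, n2), 3))
print("simulated SD:    ", round(statistics.stdev(diffs), 3))
```

Increasing reps should bring the simulated standard deviation of the differences closer to the theoretical standard error.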

That all sounds great; however, in most cases \(\sigma_1\) and \(\sigma_2\) are unknown and have to be estimated. It seems natural to estimate \(\sigma_1\) by \(s_1\) and \(\sigma_2\) by \(s_2\). When the sample sizes are small, though, these estimates may not be very accurate. If the two population standard deviations are not very different, we may get a better estimate of the common standard deviation by pooling the data from both samples. If the standard deviations are different, then we want to account for that difference in our test.

Given this, there are two options for estimating the variances for the independent samples:

  • Using pooled variances
  • Using unpooled (or unequal) variances

When to use which? First, the nice thing is that many software packages calculate the variances "behind the curtain" and will show you the most appropriate output. However, if you are NOT sure, you can always use the unpooled method. The cost of the unpooled method is that the test is slightly more conservative, making it marginally more difficult to reject the null hypothesis. The cost of using pooled variances when the population variances actually differ, however, is an incorrect model.


5.6.1.1 - Pooled Variances

Hypothesis Tests for \(\mu_1-\mu_2\): The Pooled t-test

Now let's consider the hypothesis test for the difference in means using pooled variances.

Null:

\(H_0\colon\mu_1-\mu_2=0\)

Conditions:

The assumptions/conditions are:

  • The populations are independent
  • The population variances are equal
  • Each population is either normal or the sample size is large

Test Statistic:

The test statistic is...

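\(t^*=\dfrac{\bar{x}_1-\bar{x}_2-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\)

where \(s_p\) is the pooled standard deviation,

\(s_p=\sqrt{\dfrac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}}\)

Under \(H_0\), \(t^*\) follows a t-distribution with degrees of freedom equal to \(df=n_1+n_2-2\).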
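As a sketch of how the pooled test statistic is assembled, the function below computes \(s_p\), \(t^*\), and \(df=n_1+n_2-2\) from two raw samples using only Python's standard library. The function name pooled_t and the sample values are hypothetical, chosen for illustration; the p-value is omitted because the standard library has no t-distribution CDF.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0.

    Returns (t_star, s_p, df), where df = n1 + n2 - 2.
    """
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)  # sample variance (divisor n1 - 1)
    s2_sq = statistics.variance(x2)  # sample variance (divisor n2 - 1)
    df = n1 + n2 - 2
    # Pooled SD: square root of the df-weighted average of the two variances.
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    t_star = (statistics.fmean(x1) - statistics.fmean(x2) - 0) / se
    return t_star, s_p, df

# Hypothetical data, for illustration only.
sample1 = [1.0, 2.0, 3.0]
sample2 = [4.0, 5.0, 6.0]
t_star, s_p, df = pooled_t(sample1, sample2)
print(f"t* = {t_star:.4f}, s_p = {s_p:.4f}, df = {df}")
```

If SciPy is installed, scipy.stats.ttest_ind(sample1, sample2, equal_var=True) should report the same t statistic along with a two-sided p-value.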

The p-value, critical value, and conclusion are found in the same way as before.


5.6.1.2 - Unpooled Variances

When the assumption of equal variances is not valid, we need to use separate, or unpooled, variances. The mathematics and theory are complicated for this case and we intentionally leave out the details.

Hypothesis Tests for \(\mu_1-\mu_2\): The Unpooled t-test

Null:

\(H_0\colon\mu_1-\mu_2=0\)

Conditions:

We still have the following assumptions:

  • The populations are independent
  • Each population is either normal or the sample size is large

Test Statistic:

If the assumptions are satisfied, then

\(t^*=\dfrac{\bar{x}_1-\bar{x}_2-0}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\)

will approximately follow a t-distribution with degrees of freedom

\(df=\dfrac{(n_1-1)(n_2-1)}{(n_2-1)C^2+(1-C)^2(n_1-1)}\)

where \(C=\dfrac{\frac{s^2_1}{n_1}}{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\).
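The following sketch computes the unpooled \(t^*\) and its degrees of freedom from summary statistics, using the \(C\) form given above, and cross-checks the result against the algebraically equivalent Welch-Satterthwaite form \(\dfrac{(v_1+v_2)^2}{v_1^2/(n_1-1)+v_2^2/(n_2-1)}\) with \(v_i=s^2_i/n_i\). It also prints the range \(\min(n_1-1,n_2-1)\le df\le n_1+n_2-2\) that this df always falls in. The function names and the summary statistics are hypothetical, for illustration only.

```python
import math

def unpooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled t statistic and df for H0: mu1 - mu2 = 0, from summary statistics."""
    v1 = s1**2 / n1  # estimated variance of xbar1
    v2 = s2**2 / n2  # estimated variance of xbar2
    t_star = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
    c = v1 / (v1 + v2)  # the quantity C defined in the text
    df = (n1 - 1) * (n2 - 1) / ((n2 - 1) * c**2 + (1 - c)**2 * (n1 - 1))
    return t_star, df

def welch_satterthwaite_df(s1, n1, s2, n2):
    """The same df in its more common algebraic form, used as a cross-check."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
t_star, df = unpooled_t(xbar1=12.0, s1=1.5, n1=10, xbar2=10.0, s2=4.0, n2=25)
print(f"t* = {t_star:.4f}, df = {df:.2f}")
print(f"cross-check df = {welch_satterthwaite_df(1.5, 10, 4.0, 25):.2f}")
print("range:", min(10 - 1, 25 - 1), "<= df <=", 10 + 25 - 2)
```

The lower end of that range, \(\min(n_1-1,n_2-1)\), never exceeds the formula's df, which is why using it in place of the formula is conservative. When \(s_1=s_2\) and \(n_1=n_2\), \(C=1/2\) and the formula reduces to \(n_1+n_2-2\), the pooled df.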

Note! This degrees-of-freedom calculation is cumbersome and is typically done by software. A simpler, conservative alternative is to use the smaller of \(n_1-1\) and \(n_2-1\) as the degrees of freedom (fewer degrees of freedom give a larger critical value, so intervals are wider and the null is harder to reject).

\((1-\alpha)100\%\) Confidence Interval for \(\mu_1-\mu_2\) for Unpooled Variances

\(\bar{x}_1-\bar{x}_2\pm t_{\alpha/2} \sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\)

where \(t_{\alpha/2}\) is the critical value from the t-distribution with the degrees of freedom given above.
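A minimal sketch of the unpooled confidence interval, assuming the critical value \(t_{\alpha/2}\) has already been looked up for the chosen degrees of freedom. The function name unpooled_ci, the summary statistics, and the critical value 2.0 are hypothetical placeholders for illustration.

```python
import math

def unpooled_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """(1 - alpha)100% CI for mu1 - mu2 using unpooled variances.

    t_crit is t_{alpha/2} for the chosen df. It is passed in because Python's
    standard library has no t-distribution quantile function.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # estimated standard error
    diff = xbar1 - xbar2                     # point estimate of mu1 - mu2
    margin = t_crit * se                     # margin of error
    return diff - margin, diff + margin

# Hypothetical summary statistics and critical value, for illustration only.
lower, upper = unpooled_ci(xbar1=5.0, s1=3.0, n1=9, xbar2=2.0, s2=4.0, n2=16, t_crit=2.0)
print(f"CI: ({lower:.3f}, {upper:.3f})")
```

With SciPy available, the critical value could be obtained as scipy.stats.t.ppf(1 - alpha / 2, df), using either the formula df or the conservative \(\min(n_1-1,n_2-1)\).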

Minitab®

Unpooled t-test

To perform a separate-variances (unpooled) two-sample t-procedure, use the same commands as for the pooled procedure, EXCEPT do NOT check the box for 'Use Equal Variances.'

  1. Choose Stat > Basic Statistics > 2-sample t
  2. Select the Options box and enter the desired 'Confidence level,' 'Null hypothesis value' (again for our class this will be 0), and select the correct 'Alternative hypothesis' from the drop-down menu.
  3. Choose OK.

In some examples, the pooled t-procedure and the separate-variances (unpooled) t-procedure give results that are close to each other. However, when the sample standard deviations are very different from each other and the sample sizes also differ, the separate-variances two-sample t-procedure is more reliable.
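The contrast described above can be seen by computing the two standard errors side by side. The sketch below uses hypothetical summary statistics and hypothetical helper names (pooled_se, unpooled_se): with similar standard deviations and sample sizes the two standard errors nearly agree, while when the larger sample has the much smaller standard deviation, the pooled standard error comes out less than half the unpooled one.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Standard error of xbar1 - xbar2 using the pooled variance."""
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

def unpooled_se(s1, n1, s2, n2):
    """Standard error of xbar1 - xbar2 using separate (unpooled) variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical (s1, n1, s2, n2) summary statistics, for illustration only.
similar = (4.0, 20, 4.2, 22)    # similar SDs and similar sample sizes
different = (1.0, 40, 6.0, 8)   # very different SDs and sample sizes
for label, stats in [("similar", similar), ("different", different)]:
    print(label, round(pooled_se(*stats), 3), round(unpooled_se(*stats), 3))
```

In the second case the pooled variance is dominated by the larger sample, which has the small standard deviation, so it understates the variability of \(\bar{x}_2\); the pooled \(t^*\) would be inflated and the test would reject too easily.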
