9.2  Two Independent Means
Let's explore how we can compare the means of two independent groups. If the populations are known to be approximately normally distributed, or if both sample sizes are at least 30, then the sampling distribution can be approximated using the \(t\) distribution. If neither condition is met, then simulation methods (i.e., bootstrapping or randomization) may be used.
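The paragraph above mentions bootstrapping. As an illustration of that simulation route, here is a minimal percentile-bootstrap sketch in Python. It builds a confidence interval for \(\mu_1 - \mu_2\). The function name, the 10,000 resamples, the fixed seed, and the toy data are all our own assumptions. They are not part of the course materials or of Minitab.

```python
import random
import statistics

def bootstrap_diff_ci(group1, group2, level=0.95, reps=10_000, seed=1):
    """Percentile bootstrap CI for mean(group1) - mean(group2).

    Each group is resampled with replacement on its own, so the two
    bootstrap samples stay independent, mirroring the study design.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    diffs = []
    for _ in range(reps):
        boot1 = rng.choices(group1, k=len(group1))
        boot2 = rng.choices(group2, k=len(group2))
        diffs.append(statistics.fmean(boot1) - statistics.fmean(boot2))
    diffs.sort()
    tail = (1 - level) / 2
    # Nearest-rank percentiles of the sorted bootstrap differences
    lower = diffs[round(tail * (reps - 1))]
    upper = diffs[round((1 - tail) * (reps - 1))]
    return lower, upper


# Hypothetical toy data, for illustration only
print(bootstrap_diff_ci([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

Resampling each group separately, rather than pooling the groups, is what makes this a two-independent-samples bootstrap.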
9.2.1  Confidence Intervals
If the populations are known to be normally distributed, or if both sample sizes are at least 30, then the sampling distribution can be approximated using the \(t\) distribution and the formulas below may be used. Here you will be introduced to the formulas for constructing a confidence interval using the \(t\) distribution. Minitab will do all of these calculations for you; however, it uses a more sophisticated method to compute the degrees of freedom, so answers may vary slightly, particularly with smaller sample sizes.
 General Form of a Confidence Interval
 \(point \;estimate \pm (multiplier) (standard \;error)\)
Here, the point estimate is the difference between the two sample means, \(\bar{x}_1 - \bar{x}_2\).
 Standard Error
 \(\sqrt{\dfrac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\)
 Confidence Interval for Two Independent Means
 \((\bar{x}_1-\bar{x}_2) \pm t^\ast{ \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\)
The degrees of freedom can be approximated as the smallest sample size minus one.
 Estimated Degrees of Freedom

\(df=smallest\;n-1\)
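The formulas above can be sketched in Python as follows. This is a minimal illustration that assumes the summary statistics are already available. It also assumes the multiplier \(t^*\) is looked up separately, from a t table or software, using the conservative degrees of freedom. The function names are hypothetical.

```python
import math

def conservative_df(n1, n2):
    """Hand-calculation degrees of freedom: smallest sample size minus one."""
    return min(n1, n2) - 1

def two_mean_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 from summary statistics.

    t_star is the multiplier, looked up separately (t table or software)
    for the desired confidence level and degrees of freedom.
    """
    point_estimate = xbar1 - xbar2
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin_of_error = t_star * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Exam-score example from this section (t* = 1.97743 for df = 137)
print(conservative_df(239, 138))
low, high = two_mean_ci(41.48, 6.03, 239, 40.79, 6.79, 138, t_star=1.97743)
print(round(low, 3), round(high, 3))
```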
Example: Exam Scores by Learner Type
A STAT 200 instructor wants to know how traditional students and adult learners differ in terms of their final exam scores. She collected the following data from a sample of students:
Statistic  Traditional Students  Adult Learners
\(\overline x\)  41.48  40.79
\(s\)  6.03  6.79
\(n\)  239  138
She wants to construct a 95% confidence interval to estimate the mean difference.
The point estimate, or "best estimate," is the difference in sample means:
\(\overline x_1 - \overline x_2 = 41.48-40.79=0.69\)
The standard error can be computed next:
\(\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}=\sqrt{\dfrac{6.03^2}{239}+\dfrac{6.79^2}{138}}=0.697\)
To find the multiplier, we construct a \(t\) distribution with \(df=smaller\;n-1=138-1=137\). We then find the \(t\) scores that separate the middle 95% of the distribution from the outer 5%:
\(t^*=1.97743\)
Now, we can combine all of these values to construct our confidence interval:
\(point \;estimate \pm (multiplier) (standard \;error)\)
\(0.69 \pm 1.97743 (0.697)\)
\(0.69 \pm 1.379\). The margin of error is 1.379.
\([-0.689, 2.069]\)
We are 95% confident that the difference between the mean final exam scores of traditional students and adult learners (traditional minus adult) is between -0.689 points and +2.069 points.
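The multiplier \(t^*=1.97743\) above was obtained from a \(t\) distribution with \(df=137\). If neither Minitab nor a t table is at hand, one way to approximate it with only Python's standard library is a two-step process. First, integrate the t density numerically. Then invert the resulting CDF by bisection. This is a sketch under our own assumptions: Simpson's rule with 2,000 subintervals, and a search range of -50 to 50. A dedicated statistics library would be faster and more precise.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df, lo=-50.0, hi=50.0, iterations=60):
    """Quantile of the t distribution, found by bisection on t_cdf."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-sided 95% multiplier for the exam example (df = 137)
print(round(t_ppf(0.975, 137), 4))
```

For a two-sided 95% interval, the multiplier is the 0.975 quantile, because 2.5% is left in each tail.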
9.2.1.1  Minitab: Confidence Interval Between 2 Independent Means
Minitab can be used to construct a confidence interval for the difference between two independent means. Note that the confidence intervals given in the Minitab output assume that either the populations are normally distributed or that both sample sizes are at least 30.
Minitab® – Confidence Interval Between 2 Independent Means
Let's estimate the difference between the mean weight (in pounds) of females and the mean weight of males. Both sample sizes are at least 30, so the sampling distribution can be approximated using the \(t\) distribution.
 Open the Minitab file: class_survey.mpx
 Select Stat > Basic Statistics > 2 Sample t
 Select Both samples are in one column from the dropdown
 Double click the variable Weight in the box on the left to insert the variable into the Samples box
 Double click the variable Biological Sex in the box on the left to insert the variable into the Sample IDs box
 Click OK
This should result in the following output:
Method
\(\mu_1\): mean of Weight when Biological Sex = Female
\(\mu_2\): mean of Weight when Biological Sex = Male
Difference: \(\mu_1-\mu_2\)
Equal variances are not assumed for this analysis.
Descriptive Statistics: Weight
Gender  N  Mean  StDev  SE Mean 

Female  126  136.7  23.4  2.1 
Male  99  172.7  27.3  2.7 
Estimation for Difference
Difference  95% CI for Difference 

-35.99  (-42.79, -29.20) 
Interpret
I am 95% confident that in the population the mean weight of females is between 29.202 pounds and 42.787 pounds less than the mean weight of males.
9.2.2  Hypothesis Testing
The formula for the test statistic follows the same general format as the others that we have seen this week:
 Test Statistic
 \(test\; statistic = \dfrac{sample \; statistic - null\;parameter}{standard \;error}\)
Minitab will compute the test statistic for you! You will just need to determine if equal variances should be assumed or not. There is one example below walking through these procedures by hand, but you are strongly encouraged to use Minitab whenever possible.
There are two assumptions: (1) the two samples are independent and (2) both populations are normally distributed or \(n_1 \geq 30\) and \(n_2 \geq 30\). If the second assumption is not met then you can conduct a randomization test.
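When the second assumption fails, the paragraph above points to a randomization test. Below is a minimal sketch of one common randomization scheme, shuffling the group labels, written in Python. The function name, the 10,000 shuffles, the fixed seed, and the toy data are our own assumptions. This is not the procedure of any particular software.

```python
import random
import statistics

def randomization_test(group1, group2, alternative="two-sided",
                       reps=10_000, seed=1):
    """Approximate p-value for H0: mu1 = mu2 by reshuffling group labels.

    alternative: "two-sided", "greater" (mu1 > mu2), or "less" (mu1 < mu2).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = statistics.fmean(group1) - statistics.fmean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if alternative == "greater":
            is_extreme = diff >= observed
        elif alternative == "less":
            is_extreme = diff <= observed
        else:
            is_extreme = abs(diff) >= abs(observed)
        if is_extreme:
            extreme += 1
    return extreme / reps


# Hypothetical toy data, for illustration only
print(randomization_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15], "less"))
```

Shuffling imitates the null hypothesis. If group membership makes no difference, any relabeling of the pooled observations is as plausible as the one observed.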
Below are the possible null and alternative hypothesis pairs:
Research Question  Are the means of group 1 and group 2 different?  Is the mean of group 1 greater than the mean of group 2?  Is the mean of group 1 less than the mean of group 2? 

Null Hypothesis, \(H_{0}\)  \(\mu_1 = \mu_2\)  \(\mu_1 = \mu_2\)  \(\mu_1 = \mu_2\) 
Alternative Hypothesis, \(H_{a}\)  \(\mu_1 \neq \mu_2\)  \(\mu_1 > \mu_2\)  \(\mu_1 < \mu_2\) 
Type of Hypothesis Test  Two-tailed, non-directional  Right-tailed, directional  Left-tailed, directional 
Standard Error
\(\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}\)
Test Statistic for Independent Means
\(t=\dfrac{\bar{x}_1-\bar{x}_2}{ \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\)
Estimated Degrees of Freedom
\(df=smallest\;n - 1\)
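The test statistic and the estimated degrees of freedom above can be computed from summary statistics with a short Python function. This is a sketch: the function name and the `null_difference` parameter are hypothetical, and the df is the conservative hand-calculation value. Minitab reports a different DF for the same data because it uses a different formula.

```python
import math

def two_mean_t(xbar1, s1, n1, xbar2, s2, n2, null_difference=0.0):
    """t = (sample statistic - null parameter) / standard error,
    with the conservative hand-calculation df (smallest n minus one)."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((xbar1 - xbar2) - null_difference) / standard_error
    df = min(n1, n2) - 1
    return t, df


# Weight-by-treatment summary statistics used later in this section
t, df = two_mean_t(140, 20, 45, 150, 25, 40)
print(round(t, 2), df)
```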
The \(t\) test statistic found in Step 2 is used to determine the p-value.
If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis.
Based on your decision in Step 4, write a conclusion in terms of the original research question.
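Steps 3 and 4 can be sketched in Python using only the standard library. This is an illustrative approximation. It assumes the t CDF is computed by Simpson's rule, which is our choice and not a documented Minitab method. The function names and the alternative labels ("two-sided", "greater", "less") are hypothetical and correspond to the three columns of the hypothesis table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def p_value(t, df, alternative="two-sided"):
    """Step 3: p-value for the chosen alternative hypothesis."""
    if alternative == "less":       # left-tailed: H_a is mu1 < mu2
        return t_cdf(t, df)
    if alternative == "greater":    # right-tailed: H_a is mu1 > mu2
        return 1 - t_cdf(t, df)
    return 2 * t_cdf(-abs(t), df)   # two-tailed: H_a is mu1 != mu2

def decision(p, alpha=0.05):
    """Step 4: compare the p-value with alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"


# t and df values as reported in this section's SAT-Math output
p = p_value(1.58, 95, "two-sided")
print(round(p, 3), decision(p))
```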
9.2.2.1  Minitab: Independent Means t Test
Here we will use Minitab to conduct an independent means t-test. Note that Minitab uses a more complicated formula for computing the degrees of freedom for this test.
Within Minitab, the procedure for obtaining the test statistic and confidence interval for independent means is identical.
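The Minitab output does not spell out its degrees-of-freedom formula. A common choice when equal variances are not assumed is the Welch-Satterthwaite approximation, and the sketch below assumes that is the method. Applied to the rounded summary statistics and rounded down, it reproduces the DF values in this section's Minitab output (95, 74, and 202). That match suggests, but does not prove, that this is the method.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (unequal variances)."""
    v1 = s1**2 / n1  # squared standard error of sample 1's mean
    v2 = s2**2 / n2  # squared standard error of sample 2's mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Summary statistics from this section's examples, rounded down to integers
print(math.floor(welch_df(86.9, 163, 79.2, 53)))  # SAT-Math by Ever_Cheat
print(math.floor(welch_df(20, 45, 25, 40)))       # weight by treatment
print(math.floor(welch_df(6.53, 126, 3.63, 99)))  # height by sex
```

The result always lies between the conservative value (smallest \(n\) minus one) and \(n_1 + n_2 - 2\). This is why the hand calculation gives a slightly wider interval than Minitab.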
Minitab® – Conducting an Independent Means t Test
Let's compare the mean SAT-Math scores of students who have and have not ever cheated. Both sample sizes are at least 30, so the sampling distribution can be approximated using the \(t\) distribution.
 Open the Minitab file: class_survey.mpx
 Select Stat > Basic Statistics > 2 Sample t...
 Enter the variable SATM into the Samples box
 Enter the variable Ever_Cheat into the Sample IDs box
 Click OK
This should result in the following output:
Method
\(\mu_1\): mean of SATM when Ever_Cheat = No
\(\mu_2\): mean of SATM when Ever_Cheat = Yes
Difference: \(\mu_1-\mu_2\)
Equal variances are not assumed for this analysis.
Descriptive Statistics: SATM
Ever_Cheat  N  Mean  StDev  SE Mean 

No  163  604.0  86.9  6.8 
Yes  53  583.7  79.2  11 
Estimation of Difference
Difference  95% CI for Difference 

20.3  (-5.2, 45.8) 
Test
Null hypothesis  \(H_0\): \(\mu_1-\mu_2=0\) 

Alternative hypothesis  \(H_1\): \(\mu_1-\mu_2\neq0\) 
TValue  DF  PValue 

1.58  95  0.117 
The result of our two independent means t test is \(t(95) = 1.58, p = 0.117\). Our p-value is greater than the standard alpha level of 0.05, so we fail to reject the null hypothesis. There is not enough evidence to state that the mean SAT-Math scores of students who have and have not ever cheated are different.
Note that we could also interpret the confidence interval in this output. We are 95% confident that the mean difference in the population (never cheated minus ever cheated) is between -5.16 and 45.78 points.
The example above uses a dataset. The following examples show how you can conduct this type of test using summarized data.
9.2.2.1.1  Example: Summarized Data
Example: Weight by Treatment
Research question: Do patients who receive our treatment weigh less than participants who do not receive our treatment?
Participants were randomly assigned to the treatment condition or a control group. After our intervention, their weights were measured in pounds. Weight is a quantitative variable, so we are going to be comparing means in this example. If assumptions are met, we’ll be conducting a two independent means t test.
Our treatment group has a sample size of 45, mean of 140 pounds, and standard deviation of 20 pounds. Our control group has a sample size of 40, sample mean of 150 pounds, and standard deviation of 25 pounds.
Follow the five-step hypothesis testing procedure to analyze this data in Minitab.
There are two assumptions: (1) the two samples are independent and (2) both populations are normally distributed or \(n_1 \geq 30\) and \(n_2 \geq 30\). The participants were randomly assigned to one of the two groups. They are in no way matched or paired, so they are independent. Both groups have sample sizes of at least 30.
Our hypotheses are based on the research question "Do patients who receive our treatment weigh less than participants who do not receive our treatment?" This indicates a left-tailed test. (T = treatment group, C = control group)
\(H_0\): \(\mu_T = \mu_C\)
\(H_a\): \(\mu_T < \mu_C\)
Use Minitab to perform the t-test.
2-Sample independent t-test using summarized data
 Open Minitab
 Select Stat > Basic Statistics > 2 Sample t...
 Select Summarized data in the dropdown at the top
 Enter the summary statistics in the table with the treatment group as Sample 1 and the control group as Sample 2.
Summary statistic  Sample 1  Sample 2
Sample size  45  40
Sample mean  140  150
Standard deviation  20  25
 Select the Options button
 For the Alternative hypothesis choose Difference < hypothesized difference
 OK and OK
And we get the following output:
Method
\(\mu_1\): population mean of Sample 1
\(\mu_2\): population mean of Sample 2
Difference: \(\mu_1-\mu_2\)
Equal variances are not assumed for this analysis.
Descriptive Statistics
Sample  N  Mean  StDev  SE Mean 

Sample 1  45  140.0  20.0  3.0 
Sample 2  40  150.0  25.0  4.0 
Estimation of Difference
Difference  95% Upper Bound for Difference 

-10.00  -1.75 
Test
Null hypothesis  \(H_0\): \(\mu_1-\mu_2=0\) 

Alternative hypothesis  \(H_1\): \(\mu_1-\mu_2\lt0\) 
TValue  DF  PValue 

-2.02  74  0.024 
The t-value is -2.02.
The p-value is 0.024.
\(p \leq \alpha\), so we reject the null hypothesis.
There is evidence that patients who receive our treatment weigh less than participants who do not receive our treatment in the population.
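The numbers in this Minitab output can be checked by hand from the summary statistics. The sketch below is our own illustration. It recomputes the SE Mean column, the difference, and the T-Value. For the DF column, it assumes the Welch-Satterthwaite approximation, rounded down.

```python
import math

# Summary statistics from the Weight by Treatment example
n1, xbar1, s1 = 45, 140.0, 20.0   # Sample 1: treatment group
n2, xbar2, s2 = 40, 150.0, 25.0   # Sample 2: control group

se1 = s1 / math.sqrt(n1)          # "SE Mean" for Sample 1
se2 = s2 / math.sqrt(n2)          # "SE Mean" for Sample 2
difference = xbar1 - xbar2
standard_error = math.sqrt(se1**2 + se2**2)  # same as sqrt(s1^2/n1 + s2^2/n2)
t_value = difference / standard_error

# Welch-Satterthwaite df (assumed to be what Minitab reports, rounded down)
df = (se1**2 + se2**2) ** 2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))

print(f"SE Mean: {se1:.1f}, {se2:.1f}")
print(f"Difference: {difference:.2f}")
print(f"T-Value: {t_value:.2f}, DF: {math.floor(df)}")
```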
9.2.2.1.3  Example: Height by Sex
Research Question: In the population of all college students, is the mean height of females less than the mean height of males?
Data concerning height (in inches) were collected from 126 females and 99 males.
This example uses the following Minitab file: class_survey.csv
We have two independent groups: females and males. Height in inches is a quantitative variable. This means that we will be comparing the means of two independent groups.
There are 126 females and 99 males in our sample. The sampling distribution will be approximately normally distributed because both sample sizes are at least 30.
This is a left-tailed test because we want to know if the mean for females is less than the mean for males.
(Note: Minitab will arrange the levels of the explanatory variable in alphabetical order. This is why "females" are listed before "males" in this example.)
\(H_{0}:\mu_f = \mu_m \)
\(H_{a}: \mu_f < \mu_m \)
 Open the file and select Stat > Basic Statistics > 2 Sample t...
 Enter variable Height into the Samples box
 Enter the variable Biological Sex into the Sample IDs box
 Choose Options and select 'Difference < Hypothesized difference' for the alternative hypothesis.
 Click OK
This should result in the following output:
Method
\(\mu_1\): mean of Height when Biological Sex = Female
\(\mu_2\): mean of Height when Biological Sex = Male
Difference: \(\mu_1-\mu_2\)
Equal variances are not assumed for this analysis.
Descriptive Statistics: Height
Gender  N  Mean  StDev  SE Mean 

Female  126  65.62  6.53  0.58 
Male  99  70.24  3.63  0.37 
Estimation for Difference
Difference  95% Upper Bound for Difference 

-4.623  -3.488 
Test
Null hypothesis  \(H_0\): \(\mu_1-\mu_2=0\) 

Alternative hypothesis  \(H_1\): \(\mu_1-\mu_2<0\) 
TValue  DF  PValue 

-6.73  202  0.000 
The test statistic is t = -6.73.
From the output given in Step 2, the p-value is 0.000 (that is, less than 0.001).
\(p\leq.05\), therefore we reject the null hypothesis.
There is evidence that the mean height of female students is less than the mean height of male students in the population.