# 7.1 - Comparing Two Groups

Previously we discussed testing means from one sample or from paired data. But what about situations where the data are not paired, such as comparing exam results between males and females, or the percentage of smokers among teenagers and among adults? In such instances we use inference to compare the responses in two groups, each from a distinct population (e.g. Males/Females, White/Non-White, and Dean's List/Not Dean's List).

This is called a **two-sample** situation and is one of the most common settings in statistical applications. Responses in each group must be **independent** of those in the other, meaning that different, unrelated, unpaired individuals make up the two samples. Sample sizes, however, may vary between the two groups.

#### Examine Difference Between Population Parameters

To compare the two groups, we examine the difference between the corresponding parameters of the two populations involved.

- For categorical data, we compare the proportions with a characteristic of interest (say, proportions successfully treated with two different treatments).
- For quantitative data, we compare means (say, mean GPAs for males and females).

**SPECIAL NOTE:** When comparing two means for independent samples, the first question is how to calculate the standard error. The answer depends on whether we can treat the variances (and therefore the standard deviations) of the two groups as equal (**pooled**) or unequal (**unpooled**). This means that before doing a two-sample test we first need to find the standard deviation of each sample. (Recall that this can be done in Minitab with Stat > Basic Statistics > Display Descriptive Statistics, entering the two variable names in the Variables window.)

**RULE OF THUMB:** If the larger sample standard deviation is **no more than twice** the smaller sample standard deviation, we consider the two population variances equal.
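As a rough illustration of the rule of thumb, the sketch below applies it to two small samples and computes the matching standard error for the difference in sample means. The data, the function name `two_sample_standard_error`, and the use of Python in place of Minitab are all assumptions made for this example. The pooled and unpooled standard error formulas are the standard textbook ones and are not spelled out in this section.

```python
import math
import statistics

# Hypothetical exam scores for two independent groups (illustrative only).
group_1 = [72, 85, 90, 66, 78, 88, 95, 70]
group_2 = [65, 80, 74, 91, 69, 77]


def two_sample_standard_error(sample_1, sample_2):
    """Return (method, standard_error) for the difference in sample means."""
    n1, n2 = len(sample_1), len(sample_2)
    s1 = statistics.stdev(sample_1)  # sample standard deviation (n - 1 divisor)
    s2 = statistics.stdev(sample_2)

    # Rule of thumb: treat the variances as equal when the larger SD
    # is no more than twice the smaller SD.
    if max(s1, s2) <= 2 * min(s1, s2):
        # Pooled: combine both sample variances, weighted by degrees of freedom.
        pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
        return "pooled", math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Unpooled: keep each sample's variance separate.
    return "unpooled", math.sqrt(s1**2 / n1 + s2**2 / n2)


method, se = two_sample_standard_error(group_1, group_2)
print(f"{method} standard error: {se:.3f}")
```

Here the two sample standard deviations are close to each other, so the sketch takes the pooled branch. Samples with very different spreads would fall through to the unpooled formula.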

#### Population Parameters, Null Hypotheses, and Sample Statistics

The following summarizes population and sample notation for comparisons, and gives the null hypothesis for each situation (proportions and means).

| Parameter name and description | Symbol for population parameter | Typical null hypothesis | Symbol for the sample statistic |
|---|---|---|---|
| Categorical response variable: difference in two population proportions | \( p_1 - p_2 \) | \( H_0: p_1 - p_2 = 0 \) | \( \hat{p}_1 - \hat{p}_2 \) |
| Quantitative response variable: difference in two population means | \( \mu_1 - \mu_2 \) | \( H_0: \mu_1 - \mu_2 = 0 \) | \( \bar{x}_1 - \bar{x}_2 \) |
| Quantitative response variable: difference between matched pairs | \( \mu_d \) | \( H_0: \mu_d = 0 \) | \( \bar{x}_d \) |

The null hypothesis for each situation is that the difference in population parameters = 0; that is, there is no difference. Remember that hypotheses are statements about populations!
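To make the notation concrete, here is a minimal sketch that computes the three sample statistics (difference in sample proportions, difference in sample means, and mean of paired differences) from small data sets. All values and variable names are hypothetical, invented only for illustration.

```python
import statistics

# Hypothetical data, invented purely for illustration.
# Categorical response: 1 = successfully treated, 0 = not.
treatment_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
treatment_b = [1, 0, 0, 1, 0, 1, 0, 0]

# Quantitative response: GPAs from two independent groups.
gpa_group_1 = [3.1, 3.6, 2.8, 3.9, 3.3]
gpa_group_2 = [3.0, 3.4, 2.9, 3.7, 3.2, 3.5]

# Matched pairs: before/after scores for the same five individuals.
before = [70, 65, 80, 75, 90]
after = [74, 68, 79, 81, 93]

# p-hat_1 - p-hat_2: the mean of a 0/1 variable is the sample proportion.
diff_in_proportions = statistics.mean(treatment_a) - statistics.mean(treatment_b)

# x-bar_1 - x-bar_2: difference between the two independent sample means.
diff_in_means = statistics.mean(gpa_group_1) - statistics.mean(gpa_group_2)

# x-bar_d: mean of the within-pair differences (after minus before).
pair_differences = [a - b for a, b in zip(after, before)]
mean_difference = statistics.mean(pair_differences)

print(diff_in_proportions, diff_in_means, mean_difference)
```

Note that the independent groups may have different sample sizes, while the matched-pairs lists must line up one-to-one because each difference comes from a single individual.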

**NOTE**: The use of "0" for the difference is common practice. Technically, however, this difference could be any value. For example, we could hypothesize that the difference between the percentage of males who smoke and the percentage of females who smoke is 4%. The null hypothesis would then read \( H_0: p_1 - p_2 = 0.04 \). For our class we will restrict ourselves to using a difference of 0.