11.2.1 - Dependent Samples - Introduction

Suppose you were asked shortly after his inauguration in 2021, "Do you think Joe Biden will be effective as President of the United States?" Then, after four years, suppose you are asked, "Do you think Joe Biden has been effective as President of the United States?"

Whatever your responses are over the two time points, they will not be independent because they come from the same person. This is not the same situation as being chosen by chance in two random samples at those time points. If the same people are necessarily asked again four years later, then we really have only one sample with pairs of (dependent) responses.

In dependent samples, each observation in one sample can be paired with an observation in the other sample. Examples may include responses for essentially the same question at two points in time, such as in the example above, or even responses for different individuals if they are related in some way. Consider the examples below, the first of which has responses paired by their sibling relationship.

Example: Siblings and Puzzle Solving Section

Suppose we conduct an experiment to see whether a two-year difference in age among children leads to a difference in the ability to solve a puzzle quickly. Thirty-seven pairs of siblings of ages six (younger) and eight (older) were sampled, each child is given the puzzle, and the time taken to solve the puzzle (<1 minute, >1 minute) is recorded.

older sibling	younger sibling
older sibling	<1 min	>1 min
<1 min	15	7
>1 min	5	10

Do older siblings tend to be more (or less) likely than younger siblings to solve the puzzle in less than one minute.

When studying matched pairs data we might be interested in:

Comparing the margins of the table (i.e., a row proportion versus a column proportion). In the example here, this would be comparing the proportion of younger siblings solving the puzzle in less than one minute against the proportion of older siblings solving the puzzle in less than one minute.
Measuring agreement between two groups. In this case, a high agreement would mean puzzles that tend to be solved in less than one minute by younger siblings also tend to be solved in less than one minute by older siblings. Higher agreement also corresponds to larger values in the diagonal cells of the table.

We will focus on single summary statistics and tests in this lesson. A log-linear, model-based approach to matched data will be taken up in the next lesson.

Example: Movie Critiques Section

The data below are on Siskel's and Ebert's opinions about 160 movies from April 1995 through September 1996. Reference: Agresti and Winner (1997) "Evaluating agreement and disagreement among movie reviewers." Chance, 10, pg. 10-14.

Siskel	Ebert
Siskel	con	mixed	pro	total
con	24	8	13	45
mixed	8	13	11	32
pro	10	9	64	83
total	42	30	88	160

Do Siskel and Ebert really agree on the ratings of the same movies? If so, where is the dependence coming from?