For the sibling puzzle-solving example, note that the data consisted of 37 responses for six-year-olds (younger siblings) and 37 responses for eight-year-olds (older siblings). How would the results differ if, instead of siblings, those same values had arisen from two independent samples of children of those ages?
To see why the approach with dependent data, matched by siblings, is more powerful, consider the table below. The sample size is now \(n = 37 + 37 = 74\) total responses, compared with \(n = 37\) pairs in the previous approach.
| | <1 min | >1 min | Total |
|---|---|---|---|
| Older | 22 | 15 | 37 |
| Younger | 20 | 17 | 37 |
The estimated difference in proportions from this table is identical to the previous one:
\(\hat{d}=\hat{p}_1-\hat{p}_2=\dfrac{22}{37}-\dfrac{20}{37}=0.0541\)
But with independent data, the test for an age effect, which is equivalent to the usual \(\chi^2\) test of independence, gives \(X^2 = 0.2202\), whereas McNemar's test gave \(z^2 = 0.333\). The standard error for \(\hat{d}\) in this new table is
\(\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{37}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{37}}=0.1150\)
compared with \(0.0932\) when using siblings and taking the covariance between them into account. Just as with matched pairs for quantitative variables, the covariance between pair members leads to smaller standard errors, and thus greater power, than the independent-samples approach.
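To make this concrete, here is a small R sketch (separate from Siblings.R) that reproduces both standard errors. The matched-pairs calculation needs the discordant pair counts, which are not shown in this section; the values \(b = 7\) and \(c = 5\) below are reconstructed from the statistics quoted above (\(\hat{d} = 2/37\) and \(z^2 = (b-c)^2/(b+c) = 0.333\)), so treat them as an assumption.

# independent-samples SE for the difference in proportions
p1 = 22/37   # older siblings solving in under a minute
p2 = 20/37   # younger siblings solving in under a minute
sqrt(p1*(1 - p1)/37 + p2*(1 - p2)/37)   # 0.1150

# matched-pairs SE depends only on the discordant pair counts
# ASSUMPTION: b = 7, c = 5 reconstructed from d-hat and z^2 above
n = 37; b = 7; c = 5
sqrt((b + c) - (b - c)^2/n)/n           # 0.0932

The independent-samples formula treats the two proportions as uncorrelated; the matched-pairs formula subtracts the (positive) covariance contribution, which is why its standard error is smaller.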
Let's take a look at the last part of Siblings.sas and its relevant output, where the same data are analyzed as if they were sampled independently and not matched by siblings.
/* same counts as above, treated as two independent samples */
data notsiblings;
input age $ time $ count;
datalines;
older less 22
older more 15
younger less 20
younger more 17
;
proc freq data=notsiblings order=data;
weight count;
tables age*time / chisq riskdiff;
run;
Now we are doing just a regular test of independence, and the Pearson chi-square is \(0.2202\) with a p-value of \(0.6389\). Although our conclusion is the same (we still can't claim a significant age effect), notice that the p-value is larger, i.e., less significant, when the data are treated as independent. In general, the matched-pairs approach is more powerful.
Statistics for Table of age by time
| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 0.2202 | 0.6389 |
| Likelihood Ratio Chi-Square | 1 | 0.2204 | 0.6388 |
| Continuity Adj. Chi-Square | 1 | 0.0551 | 0.8145 |
| Mantel-Haenszel Chi-Square | 1 | 0.2173 | 0.6411 |
| Phi Coefficient | | 0.0546 | |
| Contingency Coefficient | | 0.0545 | |
| Cramer's V | | 0.0546 | |
Column 1 Risk Estimates (Difference is Row 1 - Row 2)

| | Risk | ASE | 95% CL (lower) | 95% CL (upper) | Exact 95% CL (lower) | Exact 95% CL (upper) |
|---|---|---|---|---|---|---|
| Row 1 | 0.5946 | 0.0807 | 0.4364 | 0.7528 | 0.4210 | 0.7525 |
| Row 2 | 0.5405 | 0.0819 | 0.3800 | 0.7011 | 0.3692 | 0.7051 |
| Total | 0.5676 | 0.0576 | 0.4547 | 0.6804 | 0.4472 | 0.6823 |
| Difference | 0.0541 | 0.1150 | -0.1714 | 0.2795 | | |
Let's take a look at the last part of Siblings.R and its relevant output, where the same data are analyzed as if they were sampled independently and not matched by siblings.
# counts entered column by column: first <1 min (22, 20), then >1 min (15, 17)
notsiblings = matrix(c(22,20,15,17), nr=2,
  dimnames=list(c("Older","Younger"), c("<1 min",">1 min")))
notsiblings
chisq.test(notsiblings, correct=F)  # Pearson chi-square test of independence
prop.test(notsiblings, correct=F)   # CI for the difference in proportions
Now we are doing just a regular test of independence, and the Pearson chi-square is \(0.2202\) with a p-value of \(0.6389\). Although our conclusion is the same (we still can't claim a significant age effect), notice that the p-value is larger, i.e., less significant, when the data are treated as independent. In general, the matched-pairs approach is more powerful.
> chisq.test(notsiblings, correct=F)
Pearson's Chi-squared test
data: notsiblings
X-squared = 0.22024, df = 1, p-value = 0.6389
> prop.test(notsiblings, correct=F)
2-sample test for equality of proportions without continuity
correction
data: notsiblings
X-squared = 0.22024, df = 1, p-value = 0.6389
alternative hypothesis: two.sided
95 percent confidence interval:
-0.1713610 0.2794691
McNemar's test applies whenever the hypothesis of marginal homogeneity is of interest, and such hypotheses arise in a wide variety of problems involving dependent (matched) observations.
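As a closing illustration, here is a hedged R sketch of McNemar's test applied to the paired version of these data. The within-pair counts are not listed in this section; the table below (15, 7, 5, 10) is reconstructed from the reported margins (22/37 and 20/37) and \(z^2 = 0.333\), so treat it as an assumption rather than the original data.

# rows: older sibling's time; columns: younger sibling's time
# ASSUMPTION: cell counts reconstructed from the margins and z^2 quoted above
siblings = matrix(c(15, 5, 7, 10), nr=2,
  dimnames=list(Older=c("<1 min",">1 min"),
                Younger=c("<1 min",">1 min")))
mcnemar.test(siblings, correct=F)   # X-squared = 0.3333, df = 1, p-value = 0.5637

The statistic \((7-5)^2/(7+5) = 0.333\) matches the \(z^2\) value quoted earlier, and its p-value of about 0.56 is smaller than the 0.6389 from the independent-samples analysis, consistent with the power comparison above.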