Let us now look at the example involving the siblings in more detail. Both siblings solve the same puzzle, and the response for each is whether it took less than or longer than one minute. It is sensible to assume that these two responses should be related because siblings likely inherit similar problem-solving skills from their common parents, and indeed if we test for independence between siblings, we have \(X^2=4.3612\) with one degree of freedom and p-value 0.03677. If we view solving the puzzle in less than one minute as "success", then this is equivalent to testing for equal row proportions.
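As a check on the numbers quoted above, here is a minimal Python sketch (standard library only) of Pearson's chi-square test of independence for a \(2 \times 2\) table. It uses the sibling counts \(n_{11}=15\), \(n_{12}=7\), \(n_{21}=5\), \(n_{22}=10\) (older sibling in rows, younger sibling in columns, "less than one minute" first), the same counts used in the SAS and R code later in this section. It relies on the shortcut \(X^2 = n(n_{11}n_{22}-n_{12}n_{21})^2/(n_{1+}n_{2+}n_{+1}n_{+2})\) and on the identity \(P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})\). The function name `pearson_chi2_2x2` is hypothetical, and no continuity correction is applied.

```python
import math

def pearson_chi2_2x2(n11, n12, n21, n22):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (X2, p_value) for a chi-square reference distribution with 1 degree of freedom.
    """
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    x2 = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of chi-square with 1 df: P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value

# Sibling data: rows = older sibling, columns = younger sibling, "<1 min" listed first
x2, p = pearson_chi2_2x2(15, 7, 5, 10)
print(f"X^2 = {x2:.4f}, p-value = {p:.5f}")
```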

However, the question of primary interest in this study is whether older siblings tend to have a higher proportion of success than younger siblings. This is a comparison of the overall success proportion for the first row (older siblings) against that for the first column (younger siblings). Such a comparison does not require siblings: independent samples of six-year-olds and eight-year-olds could have been chosen for this purpose. Using siblings, however, yields matched pairs of responses and controls for confounding factors that could be introduced by comparing children from different parents.

The estimate for the difference in success proportions between ages is

\((15+7)/37-(15+5)/37=0.5946-0.5405=0.0541\)

Recall from Lesson 3 that, for two independent samples, the variance of the estimated difference in proportions is

\(\displaystyle V(d)=\left[\frac{ \frac{\pi_{11}}{\pi_{1+}} (1-\frac{\pi_{11}}{\pi_{1+}})} {n_{1+}} + \frac{\frac{\pi_{21}}{\pi_{2+}} (1-\frac{\pi_{21}}{\pi_{2+}})} {n_{2+}} \right] \)

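That variance is simply the sum of the two individual variances, which is valid only when the two proportions come from independent samples. Here the older- and younger-sibling success proportions are computed from the same 37 pairs, so they are correlated, and the variance of their difference must also account for the covariance between them.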
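To see how much the covariance matters here, the sketch below computes the variance of \(\hat{d} = \hat{\pi}_{1+} - \hat{\pi}_{+1}\) two ways: naively, as if the two marginal proportions came from independent samples of 37 children, and correctly, by subtracting twice their covariance. Under multinomial sampling of the \(n\) pairs, \(Cov(\hat{\pi}_{1+},\hat{\pi}_{+1}) = (\pi_{11}\pi_{22}-\pi_{12}\pi_{21})/n\), which is positive when siblings' outcomes are positively associated. The sample proportions are plugged in for the \(\pi\)'s, and the function name is hypothetical.

```python
import math

def marginal_difference_variances(n11, n12, n21, n22):
    """Compare a naive (independent-samples) variance for d_hat = p_{1+} - p_{+1}
    with the paired variance that subtracts twice the covariance of the two margins."""
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = (x / n for x in (n11, n12, n21, n22))
    p_row = p11 + p12   # older-sibling success proportion, estimate of pi_{1+}
    p_col = p11 + p21   # younger-sibling success proportion, estimate of pi_{+1}
    var_row = p_row * (1 - p_row) / n
    var_col = p_col * (1 - p_col) / n
    cov = (p11 * p22 - p12 * p21) / n   # multinomial covariance of the two margins
    naive = var_row + var_col            # pretends the two margins are independent
    paired = var_row + var_col - 2 * cov
    return p_row - p_col, naive, paired

d_hat, naive, paired = marginal_difference_variances(15, 7, 5, 10)
print(f"d_hat = {d_hat:.4f}")
print(f"naive SE  = {math.sqrt(naive):.4f}")
print(f"paired SE = {math.sqrt(paired):.4f}")
```

The paired standard error (about 0.0932) agrees with the one used for the confidence interval later in this section, while the naive one (about 0.115) overstates the uncertainty because the siblings' responses are positively correlated.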

Another way to pose the question of no age effect is to ask whether the row and column margins of the table are the same. This can be assessed with the test of marginal homogeneity, known for \(2 \times 2\) tables as McNemar's test, which we look at next.

## Test of Marginal Homogeneity

The notation for this test is the same as for the test of independence, but instead of comparing the row proportions with each other, we compare the row margin with the column margin.

For older siblings, the probability of solving the puzzle in less than one minute (success) is

\(\pi_{1+} = \pi_{11} + \pi_{12}\)

And for younger siblings, the probability of solving the puzzle in less than one minute is

\(\pi_{+1} = \pi_{11} + \pi_{21}\)

The null hypothesis of no difference (marginal homogeneity) in a \(2 \times 2\) table is

\(H_0 \colon \pi_{1+} = \pi_{+1} \)

and, because \(\pi_{11}\) appears in both margins and cancels, is equivalent to the hypothesis that the off-diagonal probabilities are equal:

\(H_0 \colon \pi_{12} = \pi_{21} \)

The second of these is also known as the **hypothesis of symmetry**, and it generalizes to larger square tables as equality of all off-diagonal cell probabilities, \(\pi_{ij} = \pi_{ji}\). Note that the diagonal cells play no role here. They correspond to pairs in which both siblings performed the same way (both solved the puzzle in less than one minute, or both took longer). All the information about a possible age difference is contained in the off-diagonal cells.
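For a general \(I \times I\) table, symmetry means \(\pi_{ij} = \pi_{ji}\) for every pair \(i \ne j\). Summing over \(j\) (the diagonal term \(\pi_{ii}\) appears on both sides) shows that symmetry always implies marginal homogeneity:

\(\displaystyle \pi_{i+} = \sum_{j=1}^{I} \pi_{ij} = \sum_{j=1}^{I} \pi_{ji} = \pi_{+i}, \qquad i = 1, \dots, I\)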

For general square \(I \times I\) tables, marginal homogeneity and symmetry are different hypotheses, and symmetry is the stronger one: it imposes more structure on the table and implies marginal homogeneity, but not conversely. In a \(2 \times 2\) table, however, the two hypotheses coincide, and so do their tests.
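The converse does not hold once \(I \ge 3\). As a minimal illustration, the Python sketch below uses a made-up \(3 \times 3\) table of cell probabilities (hypothetical, not from the sibling study) in which each row is a cyclic shift of the one above. Every row and column margin equals 1/3, so marginal homogeneity holds, yet \(\pi_{12} = 1/9 \ne 1/18 = \pi_{21}\), so symmetry fails. Exact fractions avoid floating-point comparisons.

```python
from fractions import Fraction as F

# Hypothetical 3x3 cell probabilities pi_ij (rows i, columns j);
# each row is a cyclic shift of the row above it
table = [
    [F(1, 6),  F(1, 9),  F(1, 18)],
    [F(1, 18), F(1, 6),  F(1, 9)],
    [F(1, 9),  F(1, 18), F(1, 6)],
]
size = len(table)

row_margins = [sum(row) for row in table]                                   # pi_{i+}
col_margins = [sum(table[i][j] for i in range(size)) for j in range(size)]  # pi_{+j}

marginal_homogeneity = row_margins == col_margins
symmetry = all(table[i][j] == table[j][i] for i in range(size) for j in range(size))

print("total probability:", sum(row_margins))        # 1
print("marginal homogeneity:", marginal_homogeneity)  # True
print("symmetry:", symmetry)                          # False
```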

## McNemar’s test for \(2 \times 2\) tables

This is the usual test of marginal homogeneity (and symmetry) for a \(2 \times 2\) table.

\(H_0 : \pi_{1+} = \pi_{+1}\) or equivalently \(\pi_{12} = \pi_{21}\)

Suppose that we treat the total number of observations in the off-diagonal as fixed:

\(n^\ast =n_{12}+n_{21}\)

Under the null hypothesis above, and conditional on \(n^\ast\), \(n_{12}\) follows a \(Bin(n^\ast, 0.5)\) distribution (and so does \(n_{21} = n^\ast - n_{12}\)). The rationale is that each of the \(n^\ast\) discordant pairs is a "trial" that falls in cell \((1,2)\) or cell \((2,1)\), each with probability 0.5 under \(H_0\). Provided that \(n^\ast\) is sufficiently large, we can use the usual normal approximation to the binomial:

\(z=\dfrac{n_{12}-0.5n^\ast}{\sqrt{0.5(1-0.5)n^\ast}}=\dfrac{n_{12}-n_{21}}{\sqrt{n_{12}+n_{21}}}\)

where \(0.5n^*\) and \(0.5(1-0.5)n^\ast\) are the expected count and variance of \(n_{12}\) under \(H_0\). Under \(H_0\), \(z\) is approximately standard normal; this approximation works well provided that \(n^* \ge 10\). The p-value depends on the alternative hypothesis. If \(H_a\) is that older siblings have a greater success probability (\(\pi_{12} > \pi_{21}\)), then the p-value is

\(P(Z\ge z)\)

where \(Z\) is standard normal and \(z\) is the observed value of the test statistic. A lower-tailed alternative would use \(P(Z \le z)\), and a two-sided alternative would double the smaller of the two one-sided probabilities, giving \(2P(Z \ge |z|)\). Equivalently, for a two-sided alternative we may compare

\(z^2=\dfrac{(n_{12}-n_{21})^2}{n_{12}+n_{21}}\)

to a chi-square distribution with one degree of freedom. The test remains valid under general multinomial sampling, where \(n^\ast\) is random and only the grand total \(n\) is fixed. When \(n^\ast\) is small, exact p-values can be computed from the \(Bin(n^\ast, 0.5)\) distribution instead.
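For small \(n^\ast\), the exact version conditions on \(n^\ast\) and uses \(n_{12} \sim Bin(n^\ast, 0.5)\) directly. The sketch below computes the one-sided p-value \(P(X \ge n_{12})\) and a two-sided p-value obtained by doubling the smaller tail (capped at 1); because \(Bin(n^\ast, 0.5)\) is symmetric, doubling is a natural two-sided convention here. The function name `exact_mcnemar` is hypothetical.

```python
from math import comb

def exact_mcnemar(n12, n21):
    """Exact McNemar test: conditional on n* = n12 + n21, n12 ~ Bin(n*, 0.5) under H0."""
    n_star = n12 + n21
    pmf = [comb(n_star, k) / 2 ** n_star for k in range(n_star + 1)]
    p_upper = sum(pmf[n12:])        # P(X >= n12): H_a is pi_12 > pi_21
    p_lower = sum(pmf[: n12 + 1])   # P(X <= n12): H_a is pi_12 < pi_21
    # Bin(n*, 0.5) is symmetric, so double the smaller tail (capped at 1)
    p_two_sided = min(1.0, 2 * min(p_upper, p_lower))
    return p_upper, p_lower, p_two_sided

p_up, p_low, p_two = exact_mcnemar(7, 5)
print(f"exact one-sided p = {p_up:.4f}, exact two-sided p = {p_two:.4f}")
```

For the sibling data, the two-sided value is 0.7744, which matches the "Exact Pr >= ChiSq" column in the SAS output later in this section.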

Applying this to our example data gives

\(z=\dfrac{7-5}{\sqrt{7+5}}=0.577\)

The one-sided p-value is \(P(Z\ge 0.577)=0.2820\), so there is no evidence of a difference in success probabilities (solving the puzzle in less than one minute) between the two age groups.
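The large-sample calculation above can be sketched in a few lines of Python using only the standard library. The helper name `mcnemar_z` is hypothetical, and \(\Phi\) is evaluated through the identity \(\Phi(x) = \tfrac12\operatorname{erfc}(-x/\sqrt{2})\). The function returns \(z\) together with the upper-tailed, lower-tailed, and two-sided p-values.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via Phi(x) = erfc(-x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def mcnemar_z(n12, n21):
    """Large-sample McNemar statistic z = (n12 - n21) / sqrt(n12 + n21) and its p-values."""
    n_star = n12 + n21
    if n_star == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    z = (n12 - n21) / math.sqrt(n_star)
    p_upper = 1 - std_normal_cdf(z)          # H_a: pi_12 > pi_21 (older siblings more successful)
    p_lower = std_normal_cdf(z)              # H_a: pi_12 < pi_21
    p_two_sided = 2 * min(p_upper, p_lower)  # H_a: pi_12 != pi_21
    return z, p_upper, p_lower, p_two_sided

z, p_upper, p_lower, p_two = mcnemar_z(7, 5)
print(f"z = {z:.3f}, upper-tail p = {p_upper:.4f}, two-sided p = {p_two:.4f}")
```

With the unrounded \(z = 2/\sqrt{12}\), the one-sided p-value is about 0.2819, essentially the value above, and doubling it gives the two-sided p-value 0.5637 reported by the software.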

##
Point Estimation and Confidence Interval
Section* *

A sensible effect-size measure associated with McNemar’s test is the difference between the marginal proportions,

\(d=\pi_{1+}-\pi_{+1}=\pi_{12}-\pi_{21}\)

In large samples, the estimate of \(d\),

\(\hat{d}=\dfrac{n_{12}}{n}-\dfrac{n_{21}}{n}\)

is unbiased and approximately normal. Using the multinomial moments \(V(n_{ij}) = n\pi_{ij}(1-\pi_{ij})\) and \(Cov(n_{12},n_{21}) = -n\pi_{12}\pi_{21}\), its variance is

\begin{align}
V(\hat{d}) &= n^{-2} V(n_{12}-n_{21})\\
&= n^{-2}[V(n_{12})+ V(n_{21})-2Cov(n_{12},n_{21})]\\
&= n^{-1} [\pi_{12}(1-\pi_{12})+\pi_{21}(1-\pi_{21})+2\pi_{12} \pi_{21}]
\end{align}
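The variance formula can be checked numerically. The sketch below enumerates every \(2 \times 2\) table with grand total \(n\), weights each by its multinomial probability, and computes the exact mean and variance of \(\hat{d} = (n_{12}-n_{21})/n\). Treating the sample proportions 15/37, 7/37, 5/37, and 10/37 as if they were the true \(\pi_{ij}\) is purely an illustrative assumption, and the helper name is hypothetical.

```python
import math

def exact_moments_of_dhat(n, p11, p12, p21, p22):
    """Exact mean and variance of d_hat = (n12 - n21) / n when
    (n11, n12, n21, n22) ~ Multinomial(n; p11, p12, p21, p22)."""
    mean = 0.0
    second_moment = 0.0
    for n11 in range(n + 1):
        for n12 in range(n - n11 + 1):
            for n21 in range(n - n11 - n12 + 1):
                n22 = n - n11 - n12 - n21
                # multinomial coefficient n! / (n11! n12! n21! n22!) times cell probabilities
                prob = (math.comb(n, n11) * math.comb(n - n11, n12) * math.comb(n - n11 - n12, n21)
                        * p11 ** n11 * p12 ** n12 * p21 ** n21 * p22 ** n22)
                d_hat = (n12 - n21) / n
                mean += prob * d_hat
                second_moment += prob * d_hat ** 2
    return mean, second_moment - mean ** 2

n = 37
p11, p12, p21, p22 = 15 / n, 7 / n, 5 / n, 10 / n   # illustrative "true" probabilities
mean, var = exact_moments_of_dhat(n, p11, p12, p21, p22)
formula_var = (p12 * (1 - p12) + p21 * (1 - p21) + 2 * p12 * p21) / n

print(f"E(d_hat) = {mean:.6f}  vs  pi12 - pi21 = {p12 - p21:.6f}")
print(f"V(d_hat) = {var:.8f}  vs  formula = {formula_var:.8f}")
```

Both comparisons agree to floating-point precision: the mean equals \(\pi_{12}-\pi_{21}\) (unbiasedness), and the variance equals the last line of the derivation.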

An estimate of the variance is

\(\hat{V}(\hat{d})=n^{-1}\left[\dfrac{n_{12}}{n}(1-\dfrac{n_{12}}{n})+\dfrac{n_{21}}{n}(1-\dfrac{n_{21}}{n})+2\dfrac{n_{12}n_{21}}{n^2}\right]\)

and an approximate 95% confidence interval is

\(\hat{d}\pm 1.96\sqrt{\hat{V}(\hat{d})}\)

In our example, the estimated effect is \(\hat{d} = 0.0541\) with standard error \(\sqrt{\hat{V}(\hat{d})}=0.0932\), giving the 95% confidence interval

\(0.0541\pm 1.96(0.0932)=(-0.1286, 0.2368)\)

Thus, although the older siblings had a higher proportion of success, the difference was not statistically significant: the confidence interval contains 0. We cannot conclude that the two-year age difference is associated with faster puzzle-solving times.
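The point estimate, standard error, and Wald interval above can be reproduced with a short Python sketch. It uses only the standard library; `mcnemar_difference_ci` is a hypothetical helper, and 1.96 is the usual normal quantile for 95% coverage.

```python
import math

def mcnemar_difference_ci(n11, n12, n21, n22, z_crit=1.96):
    """Estimate d = pi_12 - pi_21 with its large-sample standard error and Wald interval."""
    n = n11 + n12 + n21 + n22
    p12, p21 = n12 / n, n21 / n
    d_hat = p12 - p21
    var_hat = (p12 * (1 - p12) + p21 * (1 - p21) + 2 * p12 * p21) / n
    se = math.sqrt(var_hat)
    return d_hat, se, (d_hat - z_crit * se, d_hat + z_crit * se)

d_hat, se, (lower, upper) = mcnemar_difference_ci(15, 7, 5, 10)
print(f"d_hat = {d_hat:.4f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The concordant counts \(n_{11}\) and \(n_{22}\) enter only through \(n\). Without intermediate rounding, the upper limit comes out as 0.2367 rather than 0.2368; the interval still contains 0.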

Next, we do this analysis in SAS and R.

## Example: Siblings and Puzzle Solving

#### McNemar Test in SAS - Sibling Data

In SAS PROC FREQ, the AGREE option on the TABLES statement gives the large-sample (normal approximation) McNemar test, while the EXACT MCNEM statement gives the exact version based on binomial probabilities. Here is what the SAS code looks like:

```
data siblings;
/* older, younger: 1 = solved in less than one minute, 2 = took longer */
input older younger count;
datalines;
1 1 15
1 2 7
2 1 5
2 2 10
;
run;

/* normal approximation and exact McNemar test */
proc freq data=siblings;
  weight count;   /* each observation is a cell count */
  tables older*younger / agree;
  exact mcnem;
run;
```

Compare the chi-square value in the output below with the square of the \(z\) value computed above: \(z^2 = (7-5)^2/(7+5) = 0.3333\). Aside from slight rounding, the difference in p-values arises because the software uses a two-sided alternative by default; dividing its p-value by 2 gives the one-sided version. The exact two-sided p-value, 0.7744, leads to the same conclusion. In either case, the results provide no significant evidence of an age effect on puzzle-solving times.

McNemar's Test

| Chi-Square | DF | Pr > ChiSq | Exact Pr >= ChiSq |
|---|---|---|---|
| 0.3333 | 1 | 0.5637 | 0.7744 |

#### McNemar Test in R - Sibling Data

In R we can use the **mcnemar.test()** function, as demonstrated in Siblings.R:

```
# rows = older sibling, columns = younger sibling; matrix() fills column by column
siblings = matrix(c(15, 5, 7, 10), nrow = 2,
                  dimnames = list("older" = c("<1 min", ">1 min"),
                                  "younger" = c("<1 min", ">1 min")))
siblings
# usual test of independence between the older and younger siblings' outcomes
chisq.test(siblings, correct = FALSE)
# McNemar test for equal proportions comparing younger vs older
mcnemar.test(siblings, correct = FALSE)
```

As in SAS, McNemar's chi-squared statistic below equals \(z^2 = 0.3333\), and its p-value, 0.5637, is two-sided. Note that the **correct** argument is set to false to turn off the continuity correction, which **mcnemar.test()** applies by default.

```
> mcnemar.test(siblings, correct=F)
McNemar's Chi-squared test
data: siblings
McNemar's chi-squared = 0.33333, df = 1, p-value = 0.5637
```