25.1 - Lesson Notes

G. Two-way Frequency Tables

Page 89. The table at the bottom of page 89 is incorrect. The cells in the Dewey row are missing their last line of numbers, the column percentages. This is what the table output should look like:

The FREQ Procedure

Table of Candid by Gender

Candid	Gender
Frequency
Percent
Row Pct
Col Pct	F	M	Total
Dewey	70	40	110
	38.89	22.22	61.11
	63.64	36.36
	70.00	50.00
Truman	30	40	70
	16.67	22.22	38.89
	42.86	57.14
	30.00	50.00
Total	100	80	180
	55.56	44.44	100.00

Using this corrected table, you can now see the correct frequency counts, percentages, row percentages, and column percentages in each cell. For example, the Dewey-Male cell tells us:

40 of the 180 people sampled were males who preferred Dewey.
22.22% of the 180 people sampled — that's 40 divided by 180 — were males who preferred Dewey.
Of the 110 people in the sample who preferred Dewey, 40 — that is, 36.36% — were male.
Of the 80 males in the sample, 40 — that is, 50.00% — preferred Dewey.

Page 90. The null and alternative hypotheses here are:

Null hypothesis: There is no relationship between gender and preference.
Alternative hypothesis: There is a relationship between gender and preference.

The chi-square statistic's P-value (0.0062) tells us that, if the null hypothesis were true, it is highly unlikely that chance alone would produce a difference between the observed counts and the expected counts as extreme as the one we obtained (as summarized by the chi-square statistic). The P-value is very small, much smaller than 0.05, say. Therefore, we can reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 0.05 level to conclude that there is a relationship between gender and preference.
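
To see where the chi-square statistic and its P-value come from, here is a minimal sketch that recomputes them by hand from the four observed counts in the corrected table. The variable names (a, b, c, d, n, chisq, pvalue) are my own, and the formula is the standard shortcut for a 2-by-2 table without a continuity correction. Treat it as a check on PROC FREQ, not a replacement for it.

```sas
DATA _NULL_;
	/* Observed counts from the Candid by Gender table */
	a = 70; b = 40;   /* Dewey row:  F, M */
	c = 30; d = 40;   /* Truman row: F, M */
	n = a + b + c + d;

	/* Shortcut chi-square formula for a 2-by-2 table (1 degree of freedom) */
	chisq = n * (a*d - b*c)**2 / ((a+b) * (c+d) * (a+c) * (b+d));

	/* Upper-tail probability of a chi-square distribution with 1 df */
	pvalue = 1 - PROBCHI(chisq, 1);

	/* Write both values to the SAS log */
	PUT chisq= 8.4 pvalue= 8.4;
RUN;
```

If the arithmetic is right, chisq should come out near 7.48 and pvalue should round to the 0.0062 reported above.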

I. Computing Chi-Square From Frequency Counts

Page 92. I find I use the WEIGHT statement often. Whenever you don't have the original raw data available, but instead have the data already summarized in tables (as you might see on the evening news!), you have to use a WEIGHT statement to tell SAS that each data line represents a count of observations rather than a single observation. SAS can then build the table and calculate the chi-square statistic for you. Here's the code I used to create the corrected table above in Section G:

DATA elect;
	input Gender $ Candid $ Count;
	DATALINES;
	F Dewey  70
	M Dewey  40
	F Truman 30
	M Truman 40
	;
RUN;

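PROC FREQ data = elect;
	/* Candid forms the rows, Gender the columns; chisq requests the chi-square tests */
	table Candid*Gender / chisq;
	/* each data line stands for Count people, not just one */
	weight count;
RUN;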
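
One way to convince yourself of what the WEIGHT statement is doing is to expand the summarized counts back into one record per person and run PROC FREQ without it. The sketch below assumes the elect data set created above; the data set name elect_raw and the loop index i are hypothetical names of my own.

```sas
/* Write each summarized line Count times, giving one record per person */
DATA elect_raw;
	SET elect;
	DO i = 1 TO Count;
		OUTPUT;
	END;
	DROP i Count;
RUN;

/* No WEIGHT statement is needed now, since every record is one observation */
PROC FREQ DATA = elect_raw;
	TABLE Candid*Gender / CHISQ;
RUN;
```

The resulting table and chi-square statistic should match what the weighted program produces from the four summarized lines.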

L. McNemar's Test for Paired Data

Page 98. Without stating so, the authors compare the obtained P-value of 0.0253 to a small pre-set significance level, 0.05 say. Since 0.0253 is smaller than 0.05, they reject the null hypothesis and conclude that the advertising campaign was effective. Two comments here: (1) If the authors or any statistician draw conclusions based on a P-value without stating a significance level, you can probably assume that they are thinking about a 0.05 level. (2) There is nothing etched in stone that says you have to use a 0.05 level. You may have sound scientific reasons to use a smaller value, 0.01 say, or a larger value, 0.10 say. The important thing is that you report what you use when drawing your conclusions.

N. Odds Ratios

Page 101. The authors calculate the odds ratio to be 3.25. We interpret such an odds ratio this way: the odds of a case being exposed to benzene are 3.25 times the odds of a control being exposed to benzene.
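
As a quick check on the 3.25, here is the arithmetic, assuming the counts are the ones used in the page 102 program in these notes (50 exposed and 100 unexposed cases, 20 exposed and 130 unexposed controls). The odds of exposure among cases are divided by the odds of exposure among controls:

```latex
\[
\text{odds ratio}
  = \frac{50/100}{20/130}
  = \frac{50 \times 130}{100 \times 20}
  = \frac{6500}{2000}
  = 3.25
\]
```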

Page 102. If the authors didn't use the trick of using 1-Yes in place of Yes, and 2-No in place of No, this is what their program would look like:

DATA odds;
	INPUT Outcome $ Exposure $ Count;
	DATALINES;
	Case    Yes  50
	Case    No  100
	Control Yes  20
	Control No  130
	;
RUN;

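PROC FREQ data = odds;
	/* cmh adds the odds ratio and relative risk estimates to the output */
	TABLE Exposure*Outcome / chisq cmh;
	/* each data line stands for Count subjects */
	WEIGHT Count;
RUN;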

Note that the Exposure values are entered as Yes and No rather than, respectively, 1-Yes and 2-No. When you launch and run this program, this is what the odds ratio portion of the output looks like:

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study	Method	Value	95% Confidence	Limits
Case-Control	Mantel-Haenszel	0.3077	0.1722	0.5498
(Odds Ratio)	Logit	0.3077	0.1722	0.5498
Cohort	Mantel-Haenszel	0.6087	0.4939	0.7502
(Col1 Risk)	Logit	0.6087	0.4939	0.7502
Cohort	Mantel-Haenszel	1.9783	1.3429	2.9141
(Col2 Risk)	Logit	1.9783	1.3429	2.9141

Total Sample Size = 300

Now, the odds ratio is reported to be 0.3077. That's because the cells in the two-way table are now flip-flopped:

The FREQ Procedure

Table of Exposure by Outcome

Exposure	Outcome
Frequency
Percent
Row Pct
Col Pct	Case	Control	Total
No	100	130	230
	33.33	43.33	76.67
	43.48	56.52
	66.67	86.67
Yes	50	20	70
	16.67	6.67	23.33
	71.43	28.57
	33.33	13.33
Total	150	150	300
	50.00	50.00	100.00

Note that the No row appears first here, whereas in the text the 1-Yes row does. Here, we'd have to interpret the odds ratio this way: the odds of a case not being exposed are 0.3077 times the odds of a control not being exposed. Do you agree that this interpretation is a little more awkward and a lot less helpful? Incidentally, you should note that 0.3077 is just the reciprocal of 3.25. That is, 1 divided by 0.3077 equals 3.25.
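
If you would rather not recode the values, one alternative is sketched below. It is not what the authors do. It relies on the ORDER=DATA option of PROC FREQ, which arranges rows and columns in the order the values first appear in the data set instead of alphabetically. Because Yes comes before No in the DATALINES of the odds data set above, the Yes row should come first.

```sas
/* ORDER=DATA keeps the order in which values first appear in the data set */
PROC FREQ DATA = odds ORDER = DATA;
	TABLE Exposure*Outcome / CHISQ CMH;
	WEIGHT Count;
RUN;
```

With the rows in that order, the case-control odds ratio should be reported as 3.25 rather than 0.3077.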

Page 103. In the text below the output, the authors didn't quite report the 95% confidence interval for the odds ratio correctly. It should be (1.8189 to 5.807). We can be 95% confident that the true population odds ratio falls between 1.8189 and 5.807.
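
The corrected limits can be checked by hand. The sketch below assumes the interval is the usual logit interval: take the log of the odds ratio, add and subtract z times the standard error sqrt(1/a + 1/b + 1/c + 1/d), and exponentiate. It uses the four counts from the page 102 program, and the variable names are my own.

```sas
DATA _NULL_;
	/* Counts from the page 102 program */
	a = 50; b = 100;   /* cases:    exposed, not exposed */
	c = 20; d = 130;   /* controls: exposed, not exposed */

	oddsratio = (a*d) / (b*c);

	/* Standard error of the log odds ratio */
	se = SQRT(1/a + 1/b + 1/c + 1/d);

	/* Normal quantile for a two-sided 95% interval */
	z = PROBIT(0.975);

	lower = EXP(LOG(oddsratio) - z*se);
	upper = EXP(LOG(oddsratio) + z*se);

	/* Write the estimate and its limits to the SAS log */
	PUT oddsratio= 8.4 lower= 8.4 upper= 8.4;
RUN;
```

The limits it prints should agree with 1.8189 and 5.807 to the precision shown.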

O. Relative Risk

Page 106. The authors didn't quite report the 95% confidence interval for the relative risk correctly either. It should be (1.0761 to 3.7171). We can be 95% confident that the true population relative risk falls between 1.0761 and 3.7171.

P. Chi-square Test for Trend

Page 108. The authors state that "there may be times when your table chi-square is not significant but, since the test for trend is using more information (the order of the columns), it may be significant." You can see it moving in that direction in this example. The P-value for the table chi-square is 0.0283, whereas the P-value for the M-H chi-square is 0.0074. That is, the P-value for the table chi-square test is almost four times as large as the P-value for the M-H chi-square test. Both tests are significant at the 0.05 level here, but the M-H chi-square test provides the stronger evidence.

Q. Mantel-Haenszel Chi-Square for Stratified Tables and Meta-Analysis

Page 111. The authors committed another error when reading the output. The P-value for the Cochran-M-H statistic is 0.0004. The relative risk is 1.9775 and the 95% confidence interval for the relative risk is (1.3474, 2.9021).