In the Coronary Heart Disease example, it is sensible to think of serum cholesterol as an explanatory variable and CHD as a response. Therefore, it would make sense to estimate the conditional probabilities of CHD within the four cholesterol groups. To do this, we simply divide each cell count \(n_{ij}\) by its column total \(n_{+j}\); the resulting proportion \(n_{ij}/n_{+j}\) is an estimate of \(P(Y = i |Z = j)\). To see this, note that
\(P(Y=i|Z=j)=\dfrac{P(Y=i,Z=j)}{P(Z=j)}\)
and is intuitively estimated by
\(\dfrac{n_{ij}/n_{++}}{n_{+j}/n_{++}}=\dfrac{n_{ij}}{n_{+j}}\).
These values correspond to "Col Pct" in the SAS output. In R, we need to calculate them based on the above formula, e.g., see HeartDisease.R. The result is shown below.
| | 0-199 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| CHD | 12/319 = .038 | 8/254 = .031 | 31/470 = .066 | 41/286 = .143 |
| no CHD | 307/319 = .962 | 246/254 = .969 | 439/470 = .934 | 245/286 = .857 |
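These column percentages are simple arithmetic; here is a minimal sketch in Python (independent of the HeartDisease.R file mentioned above) that reproduces them from the cell counts:

```python
# Column percentages for the CHD table: divide each cell count n_ij
# by its column total n_+j to estimate P(CHD | cholesterol group).
chd    = [12, 8, 31, 41]       # CHD counts per cholesterol group
no_chd = [307, 246, 439, 245]  # no-CHD counts per group
groups = ["0-199", "200-219", "220-259", "260+"]

col_totals = [c + n for c, n in zip(chd, no_chd)]   # n_+j = 319, 254, 470, 286
p_chd = [c / t for c, t in zip(chd, col_totals)]    # estimates of P(Y = CHD | Z = j)

for g, p in zip(groups, p_chd):
    print(f"{g}: {p:.3f}")   # .038, .031, .066, .143
```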
The risk of CHD appears to be essentially constant for the groups with cholesterol levels between 0–199 and 200–219. Although the estimated probability drops from .038 to .031, this drop is not statistically significant. We can test this by doing a test for the difference in proportions or by doing a chi-square test of independence for the relevant \(2 \times 2\) sub-table:
| | 0-199 | 200-219 |
|---|---|---|
| CHD | 12/319 = .038 | 8/254 = .031 |
| no CHD | 307/319 = .962 | 246/254 = .969 |
The test yields a \(X^2 = 0.157\) with df=1, \(p\)-value = .69. For the other two groups, however, the risk of CHD is substantially higher. We can do similar tests for other sets of cells. In fact, any two levels of cholesterol may be compared and tested for association between CHD and cholesterol level.
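This chi-square statistic can be reproduced by hand; a minimal Python sketch follows, using the identity that for df = 1 the chi-square CDF is \(P(\chi^2 \le x) = \mathrm{erf}(\sqrt{x/2})\):

```python
import math

# Chi-square test of independence for the 2x2 sub-table
# (columns: 0-199 and 200-219; rows: CHD and no CHD)
obs = [[12, 8], [307, 246]]
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
n = sum(row_tot)

x2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / n   # expected count under independence
        x2 += (obs[i][j] - e) ** 2 / e

# With df = 1, P(chi-square <= x) = erf(sqrt(x/2)), so:
p_value = 1 - math.erf(math.sqrt(x2 / 2))
print(f"X^2 = {x2:.3f}, p = {p_value:.2f}")  # X^2 = 0.157, p = 0.69
```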
Describing associations in \(I \times J\) tables Section
In a \(2 \times 2\) table, the relationship between the two binary variables could be summarized by a single number (e.g., odds ratio). For an \(I \times J\) table, the usual \(X^2\) or \(G^2\) test for independence has \((IJ − 1) − (I − 1) − (J − 1) = (I − 1)(J − 1)\) degrees of freedom. This means that, with \(I > 2\) or \(J > 2\), there are multiple dimensions to the manner in which the data can depart from independence. The direction and magnitude of the departure from the null hypothesis can no longer be summarized by a single number, but must be summarized by \((I −1)(J −1)\) numbers, using (i) differences in proportions, and/or (ii) relative risks, and/or (iii) odds ratios.
In the Coronary Heart Disease study, for example, we could summarize the relationship between CHD and cholesterol level by a set of three relative risks:
- 200–219 versus 0–199,
- 220–259 versus 0–199, and
- 260+ versus 0–199.
That is, we could estimate the risk of CHD at each cholesterol level relative to a common baseline. Or, we could use
- 200–219 versus 0–199,
- 220–259 versus 200–219, and
- 260+ versus 220–259.
This estimates the risk of each category relative to the category immediately below. Other comparisons are also possible, but they may not make sense in interpreting the data.
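These relative risks are just ratios of the estimated conditional probabilities of CHD. A short Python sketch of both comparison schemes, using the cell counts from the table above:

```python
# Relative risks of CHD from the estimated conditional probabilities
groups = ["0-199", "200-219", "220-259", "260+"]
p_chd = [12/319, 8/254, 31/470, 41/286]

# (a) each category versus the common baseline 0-199
for g, p in zip(groups[1:], p_chd[1:]):
    print(f"RR({g} vs 0-199) = {p / p_chd[0]:.2f}")

# (b) each category versus the category immediately below it
for k in range(1, len(groups)):
    print(f"RR({groups[k]} vs {groups[k-1]}) = {p_chd[k] / p_chd[k-1]:.2f}")
```

For instance, the risk of CHD in the 260+ group is estimated to be almost four times the risk in the baseline 0–199 group.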
Example: Smoking Behaviors Section
The table below classifies 5375 high school students according to the smoking behavior of the student \(Z\) and the smoking behavior of the student's parents \(Y\). We are interested in whether there is a relationship between the smoking behavior of the students and that of their parents.
| How many parents smoke? | Student smokes: Yes (Z = 1) | Student smokes: No (Z = 2) |
|---|---|---|
| Both (Y = 1) | 400 | 1380 |
| One (Y = 2) | 416 | 1823 |
| Neither (Y = 3) | 188 | 1168 |
The test for independence yields \(X^2 = 37.6\), and \(G^2 = 38.4\) with df = 2 (\(p\)-values are essentially zero), so we have decided that \(Y\) and \(Z\) are related. It is natural to think of \(Z\) in this example as a response and \(Y\) as a predictor, so we will discuss the conditional distribution of \(Z\) given \(Y\). Let \(\pi_i = P(Z = 1|Y = i)\), for \(i=1,2,3\). The estimates of these probabilities are
\(\hat{\pi}_1=400/1780=0.225\)
\(\hat{\pi}_2=416/2239=0.186\)
\(\hat{\pi}_3=188/1356=0.139\)
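The test statistics and conditional probabilities above can be reproduced directly from the cell counts. A minimal Python sketch (for df = 2 the chi-square survival function reduces to \(e^{-x/2}\)):

```python
import math

# Smoking table: rows Y = both, one, neither; columns Z = smokes, does not
obs = [[400, 1380], [416, 1823], [188, 1168]]
row_tot = [sum(r) for r in obs]        # 1780, 2239, 1356
col_tot = [sum(c) for c in zip(*obs)]  # 1004, 4371
n = sum(row_tot)                       # 5375

x2 = g2 = 0.0
for i in range(3):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / n                # expected under independence
        x2 += (obs[i][j] - e) ** 2 / e                 # Pearson X^2
        g2 += 2 * obs[i][j] * math.log(obs[i][j] / e)  # deviance G^2

p_value = math.exp(-x2 / 2)   # chi-square survival function for df = 2
print(f"X^2 = {x2:.1f}, G^2 = {g2:.1f}, p = {p_value:.1e}")  # X^2 = 37.6, G^2 = 38.4

# Conditional probabilities pi_i = P(Z = 1 | Y = i)
pi_hat = [row[0] / t for row, t in zip(obs, row_tot)]
print([round(p, 3) for p in pi_hat])   # [0.225, 0.186, 0.139]
```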
We can then compare these estimated risks across the levels of parental smoking. The effect of \(Y\) on \(Z\) can be summarized with two differences. For example, we can calculate the increase in the probability of \(Z = 1\) as \(Y\) goes from 3 to 2, and as \(Y\) goes from 2 to 1:
\(\hat{d}_{23}=\hat{\pi}_2-\hat{\pi}_3=0.047\)
\(\hat{d}_{12}=\hat{\pi}_1-\hat{\pi}_2=0.039\)
Alternatively, we may treat \(Y = 3\) as a baseline and calculate the increase in probability as we go from \(Y = 3\) to \(Y = 2\) and from \(Y = 3\) to \(Y = 1\):
\(\hat{d}_{23}=\hat{\pi}_2-\hat{\pi}_3=0.047\)
\(\hat{d}_{13}=\hat{\pi}_1-\hat{\pi}_3=0.086\)
We may also express the effects as the sample odds ratios (e.g., look at any \(2\times 2\) table within this larger \(3 \times 2\) table):
\(\hat{\theta}_{23}=\dfrac{416\times 1168}{188\times 1823}=1.42\)
\(\hat{\theta}_{13}=\dfrac{400\times 1168}{188\times 1380}=1.80\)
The estimated value of 1.42 means that students with one smoking parent are estimated to be 42% more likely (on the odds scale) to smoke than students whose parents do not smoke (the last two rows of the table). The value of 1.80 means that students with two smoking parents are estimated to be 80% more likely (on the odds scale) to smoke than students whose parents do not smoke (the first and the last rows of the table).
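These odds ratios are cross-product ratios of the relevant sub-table counts; a quick Python check:

```python
# Sample odds ratios from 2x2 sub-tables of the 3x2 smoking table
# rows: both parents (Y=1), one parent (Y=2), neither (Y=3)
theta_23 = (416 * 1168) / (188 * 1823)   # one parent vs neither
theta_13 = (400 * 1168) / (188 * 1380)   # both parents vs neither
print(round(theta_23, 2), round(theta_13, 2))   # 1.42 1.8
```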
In a \(3 \times 2\) table, the relationship between the two variables must be summarized with two differences in proportions, two relative risks, or two odds ratios. More generally, describing the relationship between the two variables in an \(I \times J\) table requires \((I − 1)(J − 1)\) numbers. A large number of different odds ratios can be formed from a table of this size, but a minimal set of \((I − 1)(J − 1)\) of them is sufficient to describe the association completely; note that this number equals the degrees of freedom for testing independence. Which odds ratios are most meaningful to the researcher depends on the research question at hand.
Besides the point estimates, we can also test hypotheses about the odds ratios or compute confidence intervals. You could do the same for the relative risks or difference in proportions as we discussed in previous sections. To do this computationally in SAS and/or R, we need to analyze each \(2\times2\) sub-table separately. Basically, we treat each \(2\times2\) table as a "new" data set.
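For example, a large-sample (Wald) confidence interval for an odds ratio can be computed from the log odds ratio and its standard error \(\sqrt{1/n_{11}+1/n_{12}+1/n_{21}+1/n_{22}}\). A minimal Python sketch for the "one parent vs. neither" sub-table:

```python
import math

# Wald (large-sample) 95% CI for the odds ratio in the
# "one smoking parent vs neither" 2x2 sub-table
a, b, c, d = 416, 1823, 188, 1168   # counts: (Z=1,Y=2), (Z=2,Y=2), (Z=1,Y=3), (Z=2,Y=3)
theta = (a * d) / (b * c)
se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of log(theta)
z = 1.96                                    # approximate 97.5th normal percentile
lo = math.exp(math.log(theta) - z * se_log)
hi = math.exp(math.log(theta) + z * se_log)
print(f"theta = {theta:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")  # (1.18, 1.71)
```

Since the interval excludes 1, this particular comparison is statistically significant, consistent with the overall test of independence.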
In SAS, the option ALL should give all possible measures; see smokeindep.sas (output: smokeindep SAS output). Depending on which SAS version you are using, the options may differ, e.g., RELRISK, RRC1, RRC2, etc., and some of them work only for \(2\times2\) tables. For the current list, see the SAS Support Documentation.
In R, see smokeindep.R (output, smokeindep.out). The {vcd} package has a number of useful functions, e.g., oddsratio(), assocstats(); the latter will give \(X^2\), \(G^2\), and some other measures of associations, such as Cramer's V.
Statistical versus Practical Significance Section
In proposing measures of effect size, we need to realize that there is a difference between saying that an effect is statistically significant and saying that it is large.
A test statistic or \(p\)-value is a measure of the evidence against a null hypothesis, and this evidence depends on the sample size. An effect size, however, should not change if \(n\) is arbitrarily increased.
In some situations, there may be an artificial dependency of statistical significance on sample size. If the sample size is small and a large-sample goodness-of-fit statistic is computed, the \(p\)-value may not be reliable because the large-sample theory will not hold. Conversely, if the sample size is very large, you may obtain statistically significant results even when the departure from the null hypothesis is too small to be of practical importance. Also, recall the Type I and Type II errors of hypothesis testing.
The \(X^2\) and \(G^2\) test statistics are not appropriate measures of association between two variables. They are sufficient to test the null hypothesis, but not to describe the direction and magnitude of the association.
Here is a hypothetical example that helps to illustrate this point. First, consider the Vitamin C example again. The following table classifies a sample of French skiers by whether they received a placebo or Vitamin C (ascorbic acid) and whether they developed a cold.
| | Cold | No Cold | Totals |
|---|---|---|---|
| Placebo | 31 | 109 | 140 |
| Ascorbic Acid | 17 | 122 | 139 |
| Totals | 48 | 231 | 279 |
The test for independence yields \(X^2 = 4.811\) with \(p\)-value = 0.0283. The conclusion here would be that there is evidence against the null hypothesis of independence. Now, let's suppose that we artificially inflate the sample size by multiplying each cell count by ten.
| | Cold | No Cold | Totals |
|---|---|---|---|
| Placebo | 310 | 1090 | 1400 |
| Ascorbic Acid | 170 | 1220 | 1390 |
| Totals | 480 | 2310 | 2790 |
The cell proportions for this table remain identical to those of the previous table; the relationship between the two binary variables appears to be exactly the same. Yet now the \(X^2\) statistic is \(10(4.811) = 48.11\). The new \(p\)-value is close to 0, so the evidence against independence now appears VERY strong---not because the relationship between the two variables is any stronger, but merely because the sample size has gone up.
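The exact tenfold scaling of \(X^2\) is easy to verify: multiplying every cell by ten multiplies every observed and expected count by ten, so each term of the statistic, and hence the total, scales by ten. A minimal Python check:

```python
# Pearson X^2 scales linearly with the sample size when cell
# proportions are held fixed.
def pearson_x2(obs):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(r) for r in obs]
    col_tot = [sum(c) for c in zip(*obs)]
    n = sum(row_tot)
    return sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(obs)) for j in range(len(obs[0])))

original = [[31, 109], [17, 122]]
inflated = [[10 * x for x in row] for row in original]
x2_orig = pearson_x2(original)
x2_infl = pearson_x2(inflated)
print(round(x2_orig, 3), round(x2_infl, 2))   # 4.811 48.11
```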
Warning: A large \(p\)-value is NOT strong evidence in favor of \(H_0\). A large \(p\)-value can occur if (1) \(H_0\) is indeed true, or (2) \(H_0\) is false, but the test has low power.
Now, let's suppose that we artificially deflate the sample size by dividing each cell count by ten, and account for rounding.
| | Cold | No Cold | Totals |
|---|---|---|---|
| Placebo | 3 | 11 | 14 |
| Ascorbic Acid | 2 | 12 | 14 |
| Totals | 5 | 23 | 28 |
The cell proportions for this table remain nearly identical to those of the previous tables; the relationship between the two binary variables appears to be the same. Yet now the \(X^2\) statistic is 0.2435. The new \(p\)-value is 0.6217, which gives little or no evidence against independence, but it does not tell us whether the weakness of association is due to (a) a weak association between the two variables, or (b) the sample size being too small. Moreover, \(X^2\) says nothing about the direction of the possible effect, e.g., whether Vitamin C takers are more or less likely to get sick than non-takers.
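A minimal Python check of the deflated table (df = 1, so the \(p\)-value is \(1-\mathrm{erf}(\sqrt{X^2/2})\)):

```python
import math

# Deflated 2x2 table: each original cell divided by ten (rounded)
obs = [[3, 11], [2, 12]]
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
n = sum(row_tot)
x2 = sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
         / (row_tot[i] * col_tot[j] / n)
         for i in range(2) for j in range(2))
p_value = 1 - math.erf(math.sqrt(x2 / 2))   # p-value for df = 1
print(f"X^2 = {x2:.4f}, p = {p_value:.4f}")  # X^2 = 0.2435, p = 0.6217
```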
Code and Notes Section
The above analysis was implemented in VitaminC.sas (output file VitaminC SAS Output). The corresponding R code file is VitaminC.R and is commented so that results similar to those in SAS can be obtained.