The tests discussed so far that use the chi-square approximation, including the Pearson and LRT for nominal data as well as the Mantel-Haenszel test for ordinal data, perform well when the contingency tables have a reasonable number of observations in each cell, as already discussed in Lesson 1. When samples are small, the distributions of \(X^2\), \(G^2\), and \(M^2\) (and other statistics based on large-sample theory) are not well approximated by the chi-squared distribution, so their \(p\)-values cannot be trusted. In such situations, we can perform inference using an exact distribution (or an estimate of the exact distribution), but we should keep in mind that \(p\)-values based on exact tests can be conservative (i.e., they tend to be larger than they should be).
We may use an exact test if:
- the row totals \(n_{i+}\) and the column totals \(n_{+j}\) are both fixed by the design of the study,
- the sample size \(n\) is small, or
- more than 20% of the cells have expected counts less than 5, or any expected cell count is less than 1.
Example: Lady tea tasting
Here we consider the famous tea tasting example! At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R. A. Fisher himself, then and there, to see whether her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and the other four had the tea poured first. The lady tasted each one and rendered her opinion. The results are summarized in the following \(2 \times 2\) table:
Actually poured first | Lady says "tea" poured first | Lady says "milk" poured first |
---|---|---|
tea | 3 | 1 |
milk | 1 | 3 |
The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four of the cups are "tea first" and four are "milk first." Under \(H_0\), the lady has no discerning ability, which is to say the four cups she calls "tea first" are a random sample from the eight. If she selects four at random, the probability that three of these four are actually "tea first" comes from the hypergeometric distribution, \(P(n_{11}=3)\):
\(\dfrac{\dbinom{4}{3}\dbinom{4}{1}}{\dbinom{8}{4}}=\dfrac{\dfrac{4!}{3!1!}\dfrac{4!}{1!3!}}{\dfrac{8!}{4!4!}}=\dfrac{16}{70}=0.229\)
A \(p\)-value is the probability of getting a result as extreme or more extreme than the event actually observed, assuming \(H_0\) is true. In this example, the \(p\)-value is \(P(n_{11}\ge t_0)\), where \(t_0\) is the observed value of \(n_{11}\), which in this case is 3. The only result more extreme would be the lady's (correct) selection of all four of the cups that are truly "tea first," which has probability
\(\dfrac{\dbinom{4}{4}\dbinom{4}{0}}{\dbinom{8}{4}}=\dfrac{1}{70}=0.014\)
As it turns out, the \(p\)-value is \(.229 + .014 = .243\), which is only weak evidence against the null. In other words, there is not enough evidence to reject the null hypothesis that the lady is purely guessing. To be fair, experiments with this little data are generally not very powerful to begin with, given the limited information.
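As a quick check of this hand calculation, the same hypergeometric probabilities can be computed in base R with dhyper() and phyper(); this is only a sketch of the arithmetic above, not a replacement for the full analysis that follows.

# P(n11 = 3): 3 of the 4 cups she calls "tea first" truly had tea poured first
dhyper(3, m = 4, n = 4, k = 4)                      # 16/70 = 0.229
# P(n11 = 4): all four of her "tea first" calls are correct
dhyper(4, m = 4, n = 4, k = 4)                      # 1/70 = 0.014
# one-sided exact p-value, P(n11 >= 3)
phyper(2, m = 4, n = 4, k = 4, lower.tail = FALSE)  # 0.243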
Here is how we can do this computation in SAS and R. Further below we describe in a bit more detail the underlying idea behind these calculations.
Code
/*----------------------------
| Example: Fisher's Tea Lady
-----------------------------*/
data tea;
input poured $ lady $ count;
datalines;
tea tea 3
tea milk 1
milk tea 1
milk milk 3
;
run;
proc freq data=tea order=data;
weight count;
tables poured*lady/ chisq relrisk riskdiff expected;
exact fisher chisq or;
run;
Output
The FREQ Procedure
Statistics for Table of poured by lady
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 2.0000 | 0.1573 |
Likelihood Ratio Chi-Square | 1 | 2.0930 | 0.1480 |
Continuity Adj. Chi-Square | 1 | 0.5000 | 0.4795 |
Mantel-Haenszel Chi-Square | 1 | 1.7500 | 0.1859 |
Phi Coefficient | | 0.5000 | |
Contingency Coefficient | | 0.4472 | |
Cramer's V | | 0.5000 | |

WARNING: 100% of the cells have expected counts less than 5. (Asymptotic) Chi-Square may not be a valid test.
Pearson Chi-Square Test | |
---|---|
Chi-Square | 2.0000 |
DF | 1 |
Asymptotic Pr > ChiSq | 0.1573 |
Exact Pr >= ChiSq | 0.4857 |
Likelihood Ratio Chi-Square Test | |
---|---|
Chi-Square | 2.0930 |
DF | 1 |
Asymptotic Pr > ChiSq | 0.1480 |
Exact Pr >= ChiSq | 0.4857 |
Mantel-Haenszel Chi-Square Test | |
---|---|
Chi-Square | 1.7500 |
DF | 1 |
Asymptotic Pr > ChiSq | 0.1859 |
Exact Pr >= ChiSq | 0.4857 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 3 |
Left-sided Pr <= F | 0.9857 |
Right-sided Pr >= F | 0.2429 |
Table Probability (P) | 0.2286 |
Two-sided Pr <= P | 0.4857 |
Column 1 Risk Estimates (Difference is Row 1 - Row 2)

 | Risk | ASE | Lower 95% Conf Limit | Upper 95% Conf Limit | Lower Exact 95% Conf Limit | Upper Exact 95% Conf Limit |
---|---|---|---|---|---|---|
Row 1 | 0.7500 | 0.2165 | 0.3257 | 1.0000 | 0.1941 | 0.9937 |
Row 2 | 0.2500 | 0.2165 | 0.0000 | 0.6743 | 0.0063 | 0.8059 |
Total | 0.5000 | 0.1768 | 0.1535 | 0.8465 | 0.1570 | 0.8430 |
Difference | 0.5000 | 0.3062 | -0.1001 | 1.0000 | | |
Column 2 Risk Estimates (Difference is Row 1 - Row 2)

 | Risk | ASE | Lower 95% Conf Limit | Upper 95% Conf Limit | Lower Exact 95% Conf Limit | Upper Exact 95% Conf Limit |
---|---|---|---|---|---|---|
Row 1 | 0.2500 | 0.2165 | 0.0000 | 0.6743 | 0.0063 | 0.8059 |
Row 2 | 0.7500 | 0.2165 | 0.3257 | 1.0000 | 0.1941 | 0.9937 |
Total | 0.5000 | 0.1768 | 0.1535 | 0.8465 | 0.1570 | 0.8430 |
Difference | -0.5000 | 0.3062 | -1.0000 | 0.1001 | | |
Odds Ratio and Relative Risks

Statistic | Value | Lower 95% Conf Limit | Upper 95% Conf Limit |
---|---|---|---|
Odds Ratio | 9.0000 | 0.3666 | 220.9270 |
Relative Risk (Column 1) | 3.0000 | 0.5013 | 17.9539 |
Relative Risk (Column 2) | 0.3333 | 0.0557 | 1.9949 |
Odds Ratio | |
---|---|
Odds Ratio | 9.0000 |
Asymptotic Conf Limits | |
95% Lower Conf Limit | 0.3666 |
95% Upper Conf Limit | 220.9270 |
Exact Conf Limits | |
95% Lower Conf Limit | 0.2117 |
95% Upper Conf Limit | 626.2435 |
Sample Size = 8
The EXACT statement in SAS requests exact tests, which consider ALL tables with the same margins as the observed table; this works for any \(I \times J\) table. The FISHER option, more specifically, performs Fisher's exact test, which in SAS is an exact test only for a \(2 \times 2\) table.
For R, see TeaLady.R, where the fisher.test() function is used to perform Fisher's exact test for the \(2 \times 2\) table in question.
#### observed 2 x 2 table from above (rows: actually poured first; columns: lady's call)
tea <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE,
              dimnames = list(poured = c("tea", "milk"), lady = c("tea", "milk")))
#### one-sided Fisher's exact test
fisher.test(tea, alternative = "greater")
#### two-sided Fisher's exact test
fisher.test(tea)
A similar (Monte Carlo) \(p\)-value can be obtained using chisq.test() with the option simulate.p.value = TRUE. By reading the help file on the fisher.test() function, you will see that certain options in this function only work for \(2 \times 2\) tables. For the output, see TeaLady.out
The basic idea behind exact tests is that they enumerate all possible tables that have the same margins, i.e., the same row sums and column sums, as the observed table. Then, to compute the relevant statistics (e.g., \(X^2\), \(G^2\), odds ratios), we look at all tables whose values are more extreme than the one we observed. The key here is that among the \(2 \times 2\) tables with the same margins, once we know the value in one cell, we know the rest of the cells. Therefore, to find the probability of observing a table, we only need the probability for a single cell (rather than the probabilities of all four cells). Typically we use the value of cell (1,1).
Under the null hypothesis of independence, more specifically when the odds ratio \(\theta = 1\), the probability distribution of that one cell \(n_{11}\) is hypergeometric, as discussed in the lady tea tasting example.
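Written out for a general \(2 \times 2\) table with all margins fixed, this hypergeometric distribution is

\(P(n_{11}=t)=\dfrac{\dbinom{n_{1+}}{t}\dbinom{n_{2+}}{n_{+1}-t}}{\dbinom{n}{n_{+1}}},\qquad t=\max(0,\,n_{1+}+n_{+1}-n),\ldots,\min(n_{1+},n_{+1})\)

which reduces to the calculation shown earlier when all margins equal 4.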
In the lady tasting tea example, there are 5 possible \(2 \times 2\) tables that have the same observed margins. Can you figure out which ones they are? Stop and Think!
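If you want to check your answer afterwards, here is a short R sketch (reusing the margins of the tea table above) that lists the possible values of \(n_{11}\) and their null probabilities:

# Each table with margins (4, 4, 4, 4) is determined by its (1,1) cell, n11.
# Under H0 (theta = 1), n11 follows the hypergeometric distribution.
n11   <- 0:4
probs <- dhyper(n11, m = 4, n = 4, k = 4)
data.frame(n11, probs)   # the five possible tables
sum(probs)               # the probabilities sum to 1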
Extension of Fisher's test
For problems where the number of possible tables is too large, Monte Carlo methods are used to approximate "exact" statistics (e.g., the MC option of the EXACT statement in SAS PROC FREQ; in R, chisq.test() with simulate.p.value = TRUE and the argument B set to the number of Monte Carlo replicates; for more, see the help files). This extension of Fisher's exact test for a \(2 \times 2\) table, and thus these options in SAS and R, in effect samples from the large set of possible tables in order to approximate the exact test.
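As a small illustration (reusing the tea matrix defined above, with B chosen arbitrarily), the Monte Carlo version in R looks like this:

# Monte Carlo p-value: tables with the observed margins are simulated and
# their X^2 statistics are compared with the observed value.
set.seed(1)
chisq.test(tea, simulate.p.value = TRUE, B = 10000)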
This test is "exact" because no large-sample approximations are used. The \(p\)-value is valid regardless of the sample size. Asymptotic results may be unreliable when the distribution of the data is sparse or skewed. Exact computations are based on the statistical theory of exact conditional inference for contingency tables.
Fisher's exact test is definitely appropriate when the row totals and column totals are both fixed by design. Some have argued that it may also be used when only one set of margins is truly fixed. This idea arises because the marginal totals \(n_{1+}, n_{+1}\) provide little information about the odds ratio \(\theta\).
Exact non-null inference for \(\theta\)
Conditional on both sets of margins, the distribution of \(n_{11}\) depends only on the odds ratio \(\theta\). When \(\theta = 1\), this distribution is hypergeometric, which is what we used in Fisher's exact test. More generally, Fisher (1935) gave this distribution for any value of \(\theta\). Using this distribution, it is easy to compute Fisher's exact \(p\)-value for testing the null hypothesis \(H_0:\theta=\theta^*\) for any \(\theta^*\). The set of all values \(\theta^*\) that cannot be rejected by an \(\alpha=.05\)-level test forms an exact 95% confidence region for \(\theta\).
Let's look at part of the SAS output a bit more closely; we get the same CIs in the R output. First, notice the sample estimate of the odds ratio equal to 9, which we can compute from the cross-product ratio as we discussed earlier. Note also that fisher.test() in R for \(2 \times 2\) tables gives the so-called "conditional" estimate of the odds ratio, so the value will be different (in this case, approximately 6.408).
Notice the difference between the exact and asymptotic CIs for the odds ratio for the two-sided alternative (i.e., \(\theta\ne1\)): the exact interval is wider. Recalling the interpretation of the odds ratio, what do these CIs tell us about the true unknown odds ratio? This is a simple example of how inference may vary when you have small samples or sparseness.
Odds Ratio | |
---|---|
Odds Ratio | 9.0000 |
Asymptotic Conf Limits | |
95% Lower Conf Limit | 0.3666 |
95% Upper Conf Limit | 220.9270 |
Exact Conf Limits | |
95% Lower Conf Limit | 0.2117 |
95% Upper Conf Limit | 626.2435 |
Sample Size = 8
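In R, the conditional estimate and the exact interval reported by fisher.test() can be pulled out directly (a sketch, assuming the tea matrix defined earlier):

# fisher.test() reports the conditional MLE of the odds ratio (about 6.4 here,
# not the cross-product ratio of 9) and an exact confidence interval obtained
# by inverting the exact test.
ft <- fisher.test(tea)
ft$estimate   # conditional MLE of theta
ft$conf.int   # exact 95% confidence limits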
Bias correction for estimating \(\theta\)
Earlier, we learned that the natural estimate of \(\theta\) is
\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)
and that \(\log\hat{\theta}\) is approximately normally distributed about \(\log \theta\) with estimated variance
\(\hat{V}(\log\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)
Advanced note: this is the estimated variance of the limiting distribution, not an estimate of the variance of \(\log\hat{\theta}\) itself. Because there is a nonzero probability that the numerator or the denominator of \(\hat{\theta}\) may be zero, the moments of \(\hat{\theta}\) and \(\log\hat{\theta}\) do not actually exist. If the estimate is modified by adding \(1/2\) to each \(n_{ij}\), we have
\(\tilde{\theta}=\dfrac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}\)
with estimated variance
\(\hat{V}(\log\tilde{\theta})=\sum\limits_{i,j} \dfrac{1}{(n_{ij}+0.5)}\)
In smaller samples, \(\log\tilde{\theta}\) may be slightly less biased than \(\log\hat{\theta}\).
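Here is a short R sketch comparing the two estimates for the tea-tasting table; the first Wald-type interval reproduces the asymptotic confidence limits shown in the SAS output above.

# cell counts from the tea-tasting table
n11 <- 3; n12 <- 1; n21 <- 1; n22 <- 3

# unadjusted estimate and Wald interval on the log scale
theta.hat <- (n11 * n22) / (n12 * n21)                        # = 9
se.log    <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
exp(log(theta.hat) + c(-1, 1) * qnorm(0.975) * se.log)        # approx. 0.37 to 220.9

# bias-corrected estimate (add 0.5 to each cell) and its Wald interval
theta.tilde <- ((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5))
se.log.adj  <- sqrt(1/(n11 + 0.5) + 1/(n12 + 0.5) + 1/(n21 + 0.5) + 1/(n22 + 0.5))
exp(log(theta.tilde) + c(-1, 1) * qnorm(0.975) * se.log.adj)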