4.5 - Fisher's Exact Test

The tests discussed so far that use the chi-square approximation, including the Pearson and LRT for nominal data as well as the Mantel-Haenszel test for ordinal data, perform well when the contingency table has a reasonable number of observations in each cell, as already discussed in Lesson 1. When samples are small, the distributions of \(X^2\), \(G^2\), and \(M^2\) (and other statistics based on large-sample theory) are not well approximated by the chi-square distribution, so their \(p\)-values are not to be trusted. In such situations, we can perform inference using an exact distribution (or an estimate of the exact distribution), but we should keep in mind that \(p\)-values based on exact tests can be conservative (i.e., they tend to be larger than they need to be, so the actual significance level is smaller than the nominal one).

We may use an exact test if:

  • the row totals \(n_{i+}\) and the column totals \(n_{+j}\) are both fixed by the design of the study,
  • the sample size \(n\) is small, or
  • the usual rule of thumb for the chi-square approximation fails (more than 20% of cells have expected counts less than 5, or some expected cell count is less than 1).

Example: Lady tea tasting

Here we consider the famous tea-tasting example! At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R. A. Fisher himself, then and there, to see if her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and the other four had the tea poured first. The lady tasted each one and rendered her opinion. The results are summarized in the following \(2 \times 2\) table:

Actually poured first    Lady says poured first
                           tea      milk
tea                         3        1
milk                        1        3

The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four of the cups are "tea first" and four are "milk first." Under \(H_0\), the lady has no discerning ability, which is to say the four cups she calls "tea first" are a random sample from the eight. If she selects four at random, the probability that three of these four are actually "tea first" comes from the hypergeometric distribution, \(P(n_{11}=3)\):

\(\dfrac{\dbinom{4}{3}\dbinom{4}{1}}{\dbinom{8}{4}}=\dfrac{\dfrac{4!}{3!1!}\dfrac{4!}{1!3!}}{\dfrac{8!}{4!4!}}=\dfrac{16}{70}=0.229\)

A \(p\)-value is the probability of getting a result as extreme or more extreme than the event actually observed, assuming \(H_0\) is true. In this example, the \(p\)-value would be \(P(n_{11}\ge t_0)\), where \(t_0\) is the observed value of \(n_{11}\), which in this case is 3. The only result more extreme would be the lady's (correct) selection of all four of the cups that are truly "tea first," which has probability

\(\dfrac{\dbinom{4}{4}\dbinom{4}{0}}{\dbinom{8}{4}}=\dfrac{1}{70}=0.014\)

As it turns out, the \(p\)-value is \(0.229 + 0.014 = 0.243\), which is only weak evidence against the null. In other words, there is not enough evidence to reject the null hypothesis that the lady is simply guessing. To be fair, experiments with so little data are generally not very powerful to begin with, given the limited information.
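
Before turning to software, we can verify this hand calculation directly. The following R sketch simply evaluates the two binomial-coefficient ratios above (the object names are ours):

p3 <- choose(4, 3) * choose(4, 1) / choose(8, 4)   # P(n11 = 3) = 16/70, about 0.229
p4 <- choose(4, 4) * choose(4, 0) / choose(8, 4)   # P(n11 = 4) = 1/70, about 0.014
p3 + p4                                            # one-sided p-value, about 0.243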

Here is how we can do this computation in SAS and R. Further below we describe in a bit more detail the underlying idea behind these calculations.

Code

/*----------------------------
| Example: Fisher's Tea Lady
-----------------------------*/
data tea;
input poured $ lady $ count;
datalines;
tea tea 3
tea milk 1
milk tea 1
milk milk 3
;
run;
proc freq data=tea order=data;
weight count;
tables poured*lady/ chisq relrisk riskdiff expected;
exact fisher chisq or;
run;

Output

The SAS System

The FREQ Procedure

Frequency
Expected
Percent
Row Pct
Col Pct

Table of poured by lady

              lady
poured        tea        milk       Total

tea                3          1          4
                   2          2
               37.50      12.50      50.00
               75.00      25.00
               75.00      25.00

milk               1          3          4
                   2          2
               12.50      37.50      50.00
               25.00      75.00
               25.00      75.00

Total              4          4          8
               50.00      50.00     100.00

Statistics for Table of poured by lady

Statistic                       DF      Value      Prob
Chi-Square                       1     2.0000    0.1573
Likelihood Ratio Chi-Square      1     2.0930    0.1480
Continuity Adj. Chi-Square       1     0.5000    0.4795
Mantel-Haenszel Chi-Square       1     1.7500    0.1859
Phi Coefficient                         0.5000
Contingency Coefficient                 0.4472
Cramer's V                              0.5000

WARNING: 100% of the cells have expected counts less than 5.
(Asymptotic) Chi-Square may not be a valid test.
 
Pearson Chi-Square Test
Chi-Square 2.0000
DF 1
Asymptotic Pr > ChiSq 0.1573
Exact Pr >= ChiSq 0.4857
 
Likelihood Ratio Chi-Square Test
Chi-Square 2.0930
DF 1
Asymptotic Pr > ChiSq 0.1480
Exact Pr >= ChiSq 0.4857
 
Mantel-Haenszel Chi-Square Test
Chi-Square 1.7500
DF 1
Asymptotic Pr > ChiSq 0.1859
Exact Pr >= ChiSq 0.4857
 
Fisher's Exact Test
Cell (1,1) Frequency (F) 3
Left-sided Pr <= F 0.9857
Right-sided Pr >= F 0.2429
   
Table Probability (P) 0.2286
Two-sided Pr <= P 0.4857
 
Column 1 Risk Estimates

                 Risk       ASE     95% Confidence      Exact 95% Confidence
                                        Limits                 Limits
Row 1          0.7500    0.2165    0.3257   1.0000       0.1941   0.9937
Row 2          0.2500    0.2165    0.0000   0.6743       0.0063   0.8059
Total          0.5000    0.1768    0.1535   0.8465       0.1570   0.8430
Difference     0.5000    0.3062   -0.1001   1.0000

Difference is (Row 1 - Row 2)

Column 2 Risk Estimates

                 Risk       ASE     95% Confidence      Exact 95% Confidence
                                        Limits                 Limits
Row 1          0.2500    0.2165    0.0000   0.6743       0.0063   0.8059
Row 2          0.7500    0.2165    0.3257   1.0000       0.1941   0.9937
Total          0.5000    0.1768    0.1535   0.8465       0.1570   0.8430
Difference    -0.5000    0.3062   -1.0000   0.1001

Difference is (Row 1 - Row 2)
 
Odds Ratio and Relative Risks

Statistic                     Value     95% Confidence Limits
Odds Ratio                   9.0000     0.3666     220.9270
Relative Risk (Column 1)     3.0000     0.5013      17.9539
Relative Risk (Column 2)     0.3333     0.0557       1.9949
 
Odds Ratio
Odds Ratio                      9.0000

Asymptotic Conf Limits
95% Lower Conf Limit            0.3666
95% Upper Conf Limit          220.9270

Exact Conf Limits
95% Lower Conf Limit            0.2117
95% Upper Conf Limit          626.2435

Sample Size = 8

The EXACT statement in SAS indicates that we are doing exact tests, which consider ALL tables with the exact same margins as the observed table. This statement works for any \(I \times J\) table. The FISHER option, more specifically, performs Fisher's exact test, which in SAS is an exact test only for a \(2 \times 2\) table.

For R, see TeaLady.R, where we use the fisher.test() function to perform Fisher's exact test for the \(2 \times 2\) table in question.

#### tea-tasting data as a 2 x 2 matrix (rows = actually poured first, columns = lady's guess)
tea <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE,
              dimnames = list(poured = c("tea", "milk"), lady = c("tea", "milk")))

#### one-sided Fisher's exact test
fisher.test(tea, alternative = "greater")

#### two-sided Fisher's exact test
fisher.test(tea)

 

A similar analysis can be carried out using chisq.test() with the option simulate.p.value = TRUE. By reading the help file for the fisher.test() function, you will see that certain options in this function only work for \(2 \times 2\) tables. For the output, see TeaLady.out.

The basic idea behind exact tests is to enumerate all possible tables that have the same margins, i.e., the same row and column sums. Then, to compute the relevant statistics (e.g., \(X^2\), \(G^2\), odds ratios), we look for all tables in this set that are as extreme as or more extreme than the one we observed. The key here is that, within the set of \(2 \times 2\) tables with the same margins, once we know the value in one cell, we know the rest of the cells. Therefore, to find the probability of observing a table, we only need the probability distribution of a single cell (rather than the probabilities of all four cells). Typically we use the value of cell (1,1).

Under the null hypothesis of independence, or more specifically when the odds ratio \(\theta = 1\), the probability distribution of that one cell \(n_{11}\) is hypergeometric, as discussed in the tea-tasting example.
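
For instance, in the tea-tasting table the fixed margins force \(n_{11}\) to take one of the values 0, 1, 2, 3, 4, and each value determines the entire table. A small R sketch of this enumeration, using the built-in hypergeometric density dhyper() (variable names are ours):

n11   <- 0:4                                 # possible values of cell (1,1) given the margins
probs <- dhyper(n11, m = 4, n = 4, k = 4)    # P(n11 = t) under H0 (theta = 1)
round(probs, 4)                              # 0.0143 0.2286 0.5143 0.2286 0.0143
sum(probs[n11 >= 3])                         # one-sided p-value, about 0.2429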

 Stop and Think!

In the Lady tasting tea example, there are 5 possible \(2 \times 2\) tables that have the same observed margins. Can you figure out which ones they are?

Extension of Fisher's test

For problems where the number of possible tables is too large, Monte Carlo methods are used to approximate "exact" statistics (e.g., the MC option of the EXACT statement in SAS PROC FREQ; in R, chisq.test() with simulate.p.value = TRUE, where you also indicate how many Monte Carlo replicates you want; see the help files for more). This extension of Fisher's exact test to larger problems, implemented by these options in SAS and R, in effect takes samples from the large set of possible tables in order to simulate the exact test.
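
A minimal sketch of the Monte Carlo version in R, assuming the matrix tea defined above (the number of replicates B is arbitrary here):

# Monte Carlo p-value: tables are generated with the observed margins held fixed
chisq.test(tea, simulate.p.value = TRUE, B = 10000)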

This test is "exact" because no large-sample approximations are used. The \(p\)-value is valid regardless of the sample size. Asymptotic results may be unreliable when the distribution of the data is sparse or skewed. Exact computations are based on the statistical theory of exact conditional inference for contingency tables.

Fisher's exact test is definitely appropriate when the row totals and column totals are both fixed by design. Some have argued that it may also be used when only one set of margins is truly fixed. This idea arises because the marginal totals \(n_{1+}, n_{+1}\) provide little information about the odds ratio \(\theta\).

Exact non-null inference for \(\theta\)

When \(\theta = 1\), this conditional distribution of \(n_{11}\) is hypergeometric, which is what we used in Fisher's exact test. More generally, Fisher (1935) gave this distribution for any value of \(\theta\) (the noncentral hypergeometric distribution). Using this distribution, it is easy to compute Fisher's exact \(p\)-value for testing the null hypothesis \(H_0:\theta=\theta^*\) for any \(\theta^*\). The set of all values \(\theta^*\) that cannot be rejected by an \(\alpha=0.05\) level test forms an exact 95% confidence region for \(\theta\).
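
For a \(2 \times 2\) table, fisher.test() in R carries out this inversion and reports the resulting exact confidence interval. A brief sketch, again assuming the matrix tea from above:

ft <- fisher.test(tea, conf.level = 0.95)
ft$conf.int    # exact 95% confidence limits for the odds ratio theta
ft$estimate    # conditional MLE of theta (not the cross-product ratio)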

Let's look at part of the SAS output a bit more closely; we get the same CIs in the R output. First, notice the sample estimate of the odds ratio equal to 9, which we can compute from the cross-product ratio as we have discussed earlier. Note also that fisher.test() in R for \(2 \times 2\) tables gives the so-called "conditional" estimate of the odds ratio, so its value will be different (in this case, approximately 6.408).

Notice the difference between the exact and asymptotic CIs for the odds ratio for the two-sided alternative (i.e., \(\theta\ne1\)). The exact interval is wider. Recalling the interpretation of the odds ratio, what do these CIs tell us about the true unknown odds ratio? This is a simple example of how inference may vary when you have small samples or sparseness.

 
Odds Ratio
Odds Ratio                      9.0000

Asymptotic Conf Limits
95% Lower Conf Limit            0.3666
95% Upper Conf Limit          220.9270

Exact Conf Limits
95% Lower Conf Limit            0.2117
95% Upper Conf Limit          626.2435

Sample Size = 8

Bias correction for estimating \(\theta\)

Earlier, we learned that the natural estimate of \(\theta\) is

\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)

and that \(\log\hat{\theta}\) is approximately normally distributed about \(\log \theta\) with estimated variance

\(\hat{V}(\log\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)

Advanced note: this is the estimated variance of the limiting distribution, not an estimate of the variance of \(\log\hat{\theta}\) itself. Because there is a nonzero probability that the numerator or the denominator of \(\hat{\theta}\) may be zero, the moments of \(\hat{\theta}\) and \(\log\hat{\theta}\) do not actually exist. If the estimate is modified by adding \(1/2\) to each \(n_{ij}\), we have

\(\tilde{\theta}=\dfrac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}\)

with estimated variance

\(\hat{V}(\log\tilde{\theta})=\sum\limits_{i,j} \dfrac{1}{(n_{ij}+0.5)}\)

In smaller samples, \(\log\tilde{\theta}\) may be slightly less biased than \(\log\hat{\theta}\).
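
As a quick numerical check with the tea-tasting counts, the R sketch below reproduces the asymptotic confidence limits from the output above and then computes the amended estimate (object names are ours):

n <- c(3, 1, 1, 3)                                  # n11, n12, n21, n22

# ordinary cross-product estimate and Wald interval on the log scale
theta.hat <- (n[1] * n[4]) / (n[2] * n[3])          # 9
se.log    <- sqrt(sum(1 / n))                       # sqrt(1/3 + 1 + 1 + 1/3)
exp(log(theta.hat) + c(-1, 1) * 1.96 * se.log)      # about (0.3666, 220.93)

# bias-corrected estimate: add 0.5 to each cell
theta.tilde <- ((n[1] + 0.5) * (n[4] + 0.5)) / ((n[2] + 0.5) * (n[3] + 0.5))   # about 5.44
var.log     <- sum(1 / (n + 0.5))                   # estimated variance of log(theta.tilde), about 1.90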

