3.3 - Two-way Tables - Exact Tests


Fisher's Exact Test

Both the Pearson chi-square and the likelihood-ratio chi-square statistics perform well when the contingency table has a reasonable number of observations in each cell, as already discussed in Lesson 1. When samples are small, the distributions of X2 and G2 (and other large-sample statistics) are not well approximated by the chi-square distribution, so the p-values for the hypothesis tests cannot be trusted. In such situations we can perform inference using exact distributions (or estimates of exact distributions), but keep in mind that p-values based on exact tests can be conservative, i.e., larger than they need to be.

We may use exact tests if:

  • the row totals ni+ and the column totals n+j are both fixed by design of the study; this happens only rarely, 
  • we have a small sample size n,
  • more than 20% of the cells have expected counts less than 5, or any expected cell count is less than 1.

Example - Lady Tasting Tea (Sec. 3.5 of Agresti (2013) or Sec. 2.6.2 of Agresti (2007)):

Consider Fisher's exact test with the "famous" tea tasting example! At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R. A. Fisher himself, then and there, to see whether her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and four had the tea poured first. The lady tasted each one and rendered her opinion. The results are summarized in a 2 × 2 table:

                    Lady says
Poured first   tea first   milk first
tea                3            1
milk               1            3

The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four of the cups are "tea first" and four are "milk first."

Under H0: "the lady has no discerning ability," i.e., the four cups she calls "tea first" are a random sample from the eight.

If she selects four at random, the probability that three of these four are actually "tea first" comes from the hypergeometric distribution, P(n11=3):

\(\dfrac{\dbinom{4}{3}\dbinom{4}{1}}{\dbinom{8}{4}}=\dfrac{\dfrac{4!}{3!1!}\dfrac{4!}{1!3!}}{\dfrac{8!}{4!4!}}=\dfrac{16}{70}=0.229\)

A p-value is the probability of getting a result as extreme as or more extreme than the one actually observed, if H0 is true. In this example, the p-value = P(n11 ≥ t0), where t0 is the observed value of n11, in this case 3.

The only result more extreme is that the lady selects all four of the cups that are truly "tea first," which has probability

\(\dfrac{\dbinom{4}{4}\dbinom{4}{0}}{\dbinom{8}{4}}=\dfrac{1}{70}=0.014\)

The p-value is 0.229 + 0.014 = 0.243, which is only weak evidence against the null. In other words, there is not enough evidence to reject the hypothesis that the lady is purely guessing.
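As a quick check, the same computation can be done in R with the built-in hypergeometric functions; the following minimal sketch (our own, not part of the course files) reproduces the hand calculation above.

  # P(n11 = x) when drawing k = 4 cups from m = 4 "tea first" and n = 4 "milk first"
  p3 <- dhyper(3, m = 4, n = 4, k = 4)   # P(n11 = 3) = 16/70 = 0.229
  p4 <- dhyper(4, m = 4, n = 4, k = 4)   # P(n11 = 4) = 1/70  = 0.014
  p3 + p4                                # one-sided p-value = 0.243
  # equivalently, via the upper tail of the hypergeometric CDF:
  phyper(2, m = 4, n = 4, k = 4, lower.tail = FALSE)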

_____________________________________________________________

Here is how we can do this computation in SAS and R. Further below we describe in a bit more detail the underlying idea behind these calculations.

For SAS, see TeaLady.sas (also check Table A.2, pg. 635 of Agresti (2013)) and TeaLady.lst.

SAS program Lec11exLady.sas (source: https://www.stat.ufl.edu/~aa/cda/sas/sas.html)

The EXACT option in SAS requests exact tests that consider ALL tables with exactly the same margins as the observed table; this option works for any I × J table. The FISHER option more specifically performs Fisher's exact test, which in SAS is an exact test only for a 2 × 2 table.

For R, see TeaLady.R, where you can see we used the fisher.test() function to perform Fisher's exact test for the 2 × 2 table in question.


The same could be done using chisq.test() with the option simulate.p.value = TRUE. By reading the help file for the fisher.test() function, you will see that certain options in this function work only for 2 × 2 tables. For the output, see TeaLady.out.
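For reference, here is a minimal sketch of the kind of call used in TeaLady.R (the matrix layout and labels below are our own choices, not taken from the original file):

  tea <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE,
                dimnames = list(poured = c("tea", "milk"),
                                lady   = c("tea", "milk")))
  fisher.test(tea, alternative = "greater")  # one-sided p-value = 0.243, as above
  fisher.test(tea)                           # two-sided p-value = 0.486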

The basic idea behind exact tests is to enumerate all possible tables that have the same margins (i.e., row sums and column sums) as the observed table. Then, for the relevant statistic, e.g., X2, G2, or the odds ratio, you identify all tables whose values are at least as extreme as the one observed.

The key here is that in the set of tables with the same margins, once you know the value in one cell, you know the rest of the cells. Therefore, to find the probability of observing a table, we need the probability of only one cell (rather than the probabilities of all four cells). Typically we use the value of cell (1,1).

Under the null hypothesis of independence, more specifically odds ratio θ = 1, the probability distribution of that one cell (e.g., n11) is hypergeometric, as discussed in the tea tasting example.
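To see the enumeration idea concretely, here is a sketch (our own) that lists every table with the observed margins: since all margins equal 4, the value of n11 determines the other three cells.

  n11  <- 0:4
  prob <- dhyper(n11, m = 4, n = 4, k = 4)
  cbind(n11, n12 = 4 - n11, n21 = 4 - n11, n22 = n11, prob = round(prob, 4))
  sum(prob)   # these tables exhaust the sample space, so this is 1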

Note, however, that these tests can be conservative. Because of the discreteness of the data and the small number of possible realizations (i.e., tables), the Type I error rate will be smaller than the intended one (e.g., α = 0.05). In this example, P(Type I error) = 0.014. To overcome this, Agresti (2007, pg. 48) suggests the use of the mid-p-value, defined as half the probability of the observed result plus the probability of the more extreme results. For example, P(n11 = 3)/2 + P(n11 = 4) = 0.229/2 + 0.014 = 0.129; in this case our conclusion would not change at the α = 0.05 level.
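A one-line sketch of the mid-p computation in R, using the hypergeometric probabilities from before:

  dhyper(3, 4, 4, 4) / 2 + dhyper(4, 4, 4, 4)   # mid-p-value = 0.129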

Discussion: In the lady tasting tea example there are 5 possible 2 × 2 tables that have the same observed margins. Can you figure out which ones they are?

Extension of Fisher's test

For problems where the number of possible tables is too large, Monte Carlo methods are used to approximate "exact" statistics. In SAS, use the MC option of the EXACT statement in PROC FREQ; in R, call chisq.test() with simulate.p.value = TRUE and specify how many Monte Carlo runs you want (see the help files for more). This extension of Fisher's exact test, and thus these options in SAS and R, in effect samples from the large number of possible tables in order to simulate the exact test.
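As an illustration, here is a hedged sketch of the Monte Carlo option in R (run here on the small tea table purely for demonstration; in practice you would reserve it for tables too large to enumerate):

  tea <- matrix(c(3, 1, 1, 3), nrow = 2)
  set.seed(1)                                    # for a reproducible simulated p-value
  chisq.test(tea, simulate.p.value = TRUE, B = 10000)
  # fisher.test() accepts the same simulate.p.value and B arguments,
  # but only for tables larger than 2 x 2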

This test is "exact" because no large-sample approximations are used, so the p-value is valid regardless of the sample size. Asymptotic results may be unreliable when the distribution of the data is sparse or skewed. Exact computations are based on the statistical theory of exact conditional inference for contingency tables (refs: Agresti (2013), Bishop et al. (1975)).

Extensions of Fisher's exact test to more general I × J tables are more tedious to compute, which is why we do not use them all the time, but they have been implemented in programs such as SAS, StatXact, and S-Plus/R. Nowadays, some researchers also use tools from algebraic statistics, relying on computational algebra, to address such questions (Diaconis and Sturmfels (1998)).

Fisher's exact test is definitely appropriate when the row totals and column totals are both fixed by design. Some have argued that it may also be used when only one set of margins is truly fixed. This idea arises because the marginal totals {n1+, n+1} provide little information about the odds ratio θ.

Exact non-null inference for θ

(Agresti (2013), Section 3.6, and Agresti (2007), Section 2.6.4)

When θ = 1, the conditional distribution of n11 given the margins is hypergeometric, which we used in Fisher's exact test. More generally, Fisher (1935) gave this distribution for any value of θ. Using it, it is easy to compute an exact p-value for testing the null hypothesis H0 : θ = θ* for any θ*. The set of all values θ* that cannot be rejected in a level α = .05, two-tailed test forms an exact 95% confidence region for θ.
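The following sketch (our own) illustrates this test-inversion idea in R, using the or argument of fisher.test() to set the null value θ*; the retained grid values approximate the exact confidence region:

  tea  <- matrix(c(3, 1, 1, 3), nrow = 2)
  grid <- exp(seq(log(0.01), log(1000), length.out = 2000))   # candidate theta* values
  keep <- sapply(grid, function(t) fisher.test(tea, or = t)$p.value > 0.05)
  range(grid[keep])   # compare with fisher.test(tea)$conf.int; the two need not match
                      # exactly, since there is more than one way to invert an exact test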

Let's look at part of the SAS output, TeaLady.lst, a bit more closely; we get the same CIs in the R output, TeaLady.out. First, notice the sample estimate of the odds ratio, equal to 9; you could compute this easily on your own as the cross-product ratio, as we have discussed earlier. Important: for 2 × 2 tables, fisher.test() in R reports the so-called "conditional" estimate of the odds ratio, so the value will be different, in this case approximately 6.408; if you want the unconditional estimate, compute it by hand or with your own code.

Notice the difference between the exact and asymptotic CIs for the odds ratio under the two-sided alternative, i.e., θ ≠ 1: the exact interval is considerably wider. Recall the interpretation of the odds ratio. What do these CIs tell us about the true unknown odds ratio? This is a simple example of how inference may vary when you have a small sample or sparseness.
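A small sketch comparing the two estimates in R:

  tea <- matrix(c(3, 1, 1, 3), nrow = 2)
  (tea[1, 1] * tea[2, 2]) / (tea[1, 2] * tea[2, 1])   # unconditional cross-product ratio = 9
  ft <- fisher.test(tea)
  ft$estimate    # conditional ML estimate, approximately 6.408
  ft$conf.int    # exact CI for the odds ratio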


 

Bias correction for estimating θ

In earlier lectures, we learned that the natural estimate of θ is

\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)

and that \(\text{log}\hat{\theta}\) is approximately normally distributed about log θ with estimated variance

\(\hat{V}(\text{log}\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)

[Advanced concepts: Note that this is the estimated variance of the limiting distribution, not an estimate of the variance of  \(\text{log}\hat{\theta}\) itself. Because there is a nonzero probability that the numerator or the denominator of \(\hat{\theta}\) may be zero, the moments of \(\hat{\theta}\) and \(\text{log}\hat{\theta}\) do not actually exist.]

In Section 3.1.1, Agresti (2013) suggests a modified estimator that comes from adding 1/2 to each nij,

\(\tilde{\theta}=\dfrac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}\)

with estimated variance

\(\hat{V}(\text{log}\tilde{\theta})=\sum\limits_{i,j} \dfrac{1}{(n_{ij}+0.5)}\)

In smaller samples, \(\text{log}\tilde{\theta}\) may be slightly less biased than \(\text{log}\hat{\theta}\).
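Both estimators and their Wald-type intervals on the log scale are easy to compute directly; a minimal sketch for the tea table (our own illustration):

  n <- matrix(c(3, 1, 1, 3), nrow = 2)
  theta_hat <- (n[1, 1] * n[2, 2]) / (n[1, 2] * n[2, 1])      # 9
  se_hat    <- sqrt(sum(1 / n))
  m <- n + 0.5                                                # add 1/2 to each cell
  theta_tilde <- (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])    # 5.44, pulled toward 1
  se_tilde    <- sqrt(sum(1 / m))
  exp(log(theta_hat)   + c(-1, 1) * 1.96 * se_hat)            # asymptotic 95% CI
  exp(log(theta_tilde) + c(-1, 1) * 1.96 * se_tilde)          # smoothed version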

 


More Advanced Concepts

The use of \(\tilde{\theta}\) can also be motivated on Bayesian grounds. Adding constants such as 1/2 to the cell frequencies can be interpreted as Bayesian inference under a particular kind of prior distribution (e.g., Dirichlet). For discussion of Bayesian inference under a Dirichlet prior, see Agresti (2013) Section 3.6, or Bayesian Inference for Categorical Data Analysis by A. Agresti.

The practical effect of adding a constant such as 1/2 is to smooth the estimated cell probabilities toward a uniform table, where all elements of π are equal. In a large, sparse table, adding 1/2 to each cell frequency can lead to over-smoothing, because the total number of hypothetical prior observations being added (1/2 times the number of cells) could be nearly as large as, or even larger than, the actual sample size n. If there are zeros in some cells, you can instead run the analysis after adding an extremely small value to those cells, e.g., 0.0000001 or 0.00000001.

For more discussion of sparse tables and less ad hoc methods, see Agresti (2007), Section 5.3, and Agresti (2013), Section 10.6. We will also see more on sparse tables later.