3.3  Two-way Tables: Exact Tests
Fisher's Exact Test
Both Pearson's chi-square and the likelihood-ratio chi-square statistics perform well when the contingency table has a reasonable number of observations in each cell, as already discussed in Lesson 1. When samples are small, the distributions of X^{2} and G^{2} (and other statistics based on large-sample theory) are not well approximated by the chi-squared distribution; thus the p-values for the hypothesis tests are not to be trusted. In such situations we can perform inference using exact distributions (or estimates of exact distributions), but keep in mind that p-values based on exact tests can be conservative, i.e., larger than they should be, so such tests reject the null less often than the nominal level suggests.
We may use exact tests if:
- the row totals n_{i+} and the column totals n_{+j} are both fixed by the design of the study (this happens only rarely),
- we have a small sample size n, or
- more than 20% of cells have expected cell counts less than 5, or any expected cell count is less than 1.
Example: Lady tasting tea (Sec. 3.5 of Agresti (2013), or Sec. 2.6.2 of Agresti (2007)):
Consider Fisher's exact test with the "famous" tea-tasting example! At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R. A. Fisher himself, then and there, to see whether her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and four had the tea poured first. The lady tasted each one and rendered her opinion. The results are summarized in a 2 × 2 table:
                 Poured first
  Lady says    tea first    milk first
  tea              3             1
  milk             1             3
The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four of the cups are "tea first" and four are "milk first."
Under H_{0}: "the lady has no discerning ability," the four cups she calls "tea first" are a random sample of four from the eight cups.
If she selects four at random, the probability that three of these four are actually "tea first" comes from the hypergeometric distribution, P(n_{11}=3):
\(\dfrac{\dbinom{4}{3}\dbinom{4}{1}}{\dbinom{8}{4}}=\dfrac{\dfrac{4!}{3!1!}\dfrac{4!}{1!3!}}{\dfrac{8!}{4!4!}}=\dfrac{16}{70}=0.229\)
A p-value is the probability of getting a result as extreme as, or more extreme than, the one actually observed, if H_{0} is true. In this example, the p-value = P(n_{11} ≥ t_{0}), where t_{0} is the observed value of n_{11}, in this case 3.
The only result more extreme is that the woman selects all four of the cups that are truly "tea first," which has probability
\(\dfrac{\dbinom{4}{4}\dbinom{4}{0}}{\dbinom{8}{4}}=\dfrac{1}{70}=0.014\)
The p-value is 0.229 + 0.014 = 0.243, which is only weak evidence against the null. In other words, there is not enough evidence to reject the hypothesis that the lady is purely guessing.
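The hypergeometric arithmetic above is easy to check directly. The lesson's code files use SAS and R; the following is just a standalone Python sketch of the same computation, using only the standard library:

```python
# Reproduce the tea-tasting p-value from the hypergeometric distribution:
# 4 "tea first" cups among 8, and the lady picks 4 cups to call "tea first".
from math import comb

def hypergeom_pmf(t, row1=4, row2=4, n_picked=4):
    """P(n11 = t) when both margins are fixed at 4 and theta = 1."""
    return comb(row1, t) * comb(row2, n_picked - t) / comb(row1 + row2, n_picked)

p_observed = hypergeom_pmf(3)      # P(n11 = 3) = 16/70, about 0.229
p_extreme = hypergeom_pmf(4)       # P(n11 = 4) = 1/70, about 0.014
p_value = p_observed + p_extreme   # one-sided p-value, about 0.243
print(round(p_observed, 3), round(p_extreme, 3), round(p_value, 3))
```

This matches the hand calculation: 16/70 + 1/70 = 17/70 ≈ 0.243.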
_____________________________________________________________
Here is how we can do this computation in SAS and R. Further below we describe in a bit more detail the underlying idea behind these calculations.
For SAS, see TeaLady.sas (also check Table A.2. pg. 635 from Agresti (2013)), and TeaLady.lst.
(Source: https://www.stat.ufl.edu/~aa/cda/sas/sas.html )
The EXACT option in SAS indicates that we are doing exact tests, which consider ALL tables with exactly the same margins as the observed table. This option works for any I × J table. The FISHER option performs Fisher's exact test specifically, which in SAS is an exact test only for a 2 × 2 table.
For R, see TeaLady.R where you can see we used the fisher.test() function to perform Fisher's exact test for the 2 × 2 table in question.
A simulated version could be obtained using chisq.test() with the option simulate.p.value = TRUE, although that approximates the chi-square test by Monte Carlo rather than performing Fisher's test. By reading the help file for the fisher.test() function, you will see that certain options in this function only work for 2 × 2 tables. For the output, see TeaLady.out.
The basic idea behind exact tests is that they enumerate all possible tables that have the same margins, i.e., row sums and column sums. Then, for a chosen statistic (e.g., X^{2}, G^{2}, the odds ratio), you total the probabilities of all tables whose statistic is at least as extreme as the one you observed.
The key here is that, in the set of 2 × 2 tables with the same margins, once you know the value in one cell, you know the rest of the cells. Therefore, to find the probability of observing a table, we only need the distribution of a single cell (rather than the probabilities of all four cells). Typically we use the value of cell (1,1).
Under the null hypothesis of independence, more specifically odds ratio θ = 1, the probability distribution of that one cell (e.g., n_{11}) is hypergeometric, as discussed in the tea-tasting example.
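The enumeration idea can be sketched in a few lines of Python (a standalone illustration, not the SAS/R code the lesson uses): fix the margins, let n_{11} range over its possible values, and reconstruct the remaining cells from the margins. The hypergeometric probabilities of these tables must sum to 1.

```python
# Enumerate every 2x2 table with the tea-tasting margins (all rows and
# columns sum to 4). Knowing n11 determines the other three cells.
from math import comb

r1, r2, c1 = 4, 4, 4              # row totals and first column total
tables = []
for n11 in range(0, min(r1, c1) + 1):
    n12 = r1 - n11                # rest of row 1
    n21 = c1 - n11                # rest of column 1
    n22 = r2 - n21                # remainder of row 2
    prob = comb(r1, n11) * comb(r2, n21) / comb(r1 + r2, c1)
    tables.append(((n11, n12, n21, n22), prob))

print(len(tables), "tables with these margins")       # 5 tables
print(sum(p for _, p in tables))                      # probabilities sum to 1
```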
Note, however, that these tests can be conservative. Because of the discreteness of the data and the small number of possible realizations (tables), the attainable Type I error rate will be smaller than the intended one (e.g., α = 0.05). In this example, P(Type I error) = 0.014, since only the most extreme table (n_{11} = 4) could be rejected at that level. To overcome this, Agresti (2007, p. 48) suggests the use of the mid p-value, defined as half the probability of the observed result plus the probability of the more extreme results. For example, P(n_{11} = 3)/2 + P(n_{11} = 4) = 0.229/2 + 0.014 = 0.129; in this case our conclusion would not change at the α = 0.05 level.
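Continuing the standalone Python sketch, the mid p-value is a one-line change from the ordinary exact p-value:

```python
# Mid p-value for the tea example: half the observed-table probability
# plus the probability of the more extreme result.
from math import comb

def hypergeom_pmf(t):
    return comb(4, t) * comb(4, 4 - t) / comb(8, 4)

mid_p = hypergeom_pmf(3) / 2 + hypergeom_pmf(4)
print(round(mid_p, 3))    # 0.129
```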
In the lady tasting tea example there are 5 possible 2 × 2 tables with the same observed margins. Can you figure out which ones they are?
Extension of Fisher's test
For problems where the number of possible tables is too large to enumerate, Monte Carlo methods are used to approximate the "exact" statistics. (In SAS, use the MC option of the EXACT statement in PROC FREQ; in R, specify simulate.p.value = TRUE in chisq.test() and indicate how many Monte Carlo replicates you want; for more, see the help files.) This extension of Fisher's exact test, and thus these options in SAS and R, in effect samples from the large set of possible tables in order to simulate the exact test.
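The sampling idea can be illustrated with a minimal Monte Carlo version of the tea-tasting test in Python (a toy sketch, not what SAS or R actually implements internally): repeatedly shuffle which four cups are truly "tea first," hold the lady's calls fixed, and see how often three or more of her "tea first" calls are correct.

```python
# Monte Carlo approximation of the exact one-sided p-value P(n11 >= 3).
import random

random.seed(42)
truth = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = tea poured first (four such cups)
n_sims, hits = 100_000, 0
for _ in range(n_sims):
    random.shuffle(truth)
    n11 = sum(truth[:4])           # her "tea first" calls occupy slots 0-3
    if n11 >= 3:                   # as or more extreme than observed
        hits += 1
mc_p = hits / n_sims
print(round(mc_p, 3))              # close to the exact 17/70 = 0.243
```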
This test is "exact" because no large-sample approximations are used, and the p-value is valid regardless of the sample size. Asymptotic results may be unreliable when the distribution of the data is sparse or skewed. Exact computations are based on the statistical theory of exact conditional inference for contingency tables (refs: Agresti (2013), Bishop et al. (1975)).
Extensions of Fisher's exact test to more general I × J tables are more tedious to compute, which is why we do not use them all the time, but they have been implemented in programs such as SAS, StatXact, and S-plus/R. Nowadays, some researchers also use tools from algebraic statistics, relying on computational algebra, to address such questions (Diaconis and Sturmfels (1998)).
Fisher's exact test is definitely appropriate when the row totals and column totals are both fixed by design. Some have argued that it may also be used when only one set of margins is truly fixed. This idea arises because the marginal totals {n_{1+}, n_{+1}} provide little information about the odds ratio θ.
Exact non-null inference for θ
(Agresti (2013), Section 3.6, and Agresti (2007), Section 2.6.4)
Conditional on both sets of margins, the distribution of n_{11} depends only on the odds ratio θ. When θ = 1, this distribution is hypergeometric, which we used in Fisher's exact test. More generally, Fisher (1935) gave this distribution (the noncentral hypergeometric) for any value of θ. Using this distribution, it is easy to compute an exact p-value for testing the null hypothesis H_{0}: θ = θ^{*} for any θ^{*}. The set of all values θ^{*} that cannot be rejected in a level α = .05, two-tailed test forms an exact 95% confidence region for θ.
Let's look at part of the SAS output a bit more closely (TeaLady.lst); we get the same CIs in the R output, TeaLady.out. First, notice the sample estimate of the odds ratio equal to 9; you could compute this easily on your own as the cross-product ratio, as we have discussed earlier. Important: for 2 × 2 tables, fisher.test() in R reports the so-called "conditional estimate" of the odds ratio, so its value will be different, in this case approximately 6.408; if you want the unconditional estimate, compute it by hand or with your own code.
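Where does the conditional estimate of roughly 6.408 come from? It is the value of θ that makes the mean of n_{11} under the noncentral hypergeometric distribution (given the margins) equal the observed count of 3. A rough standalone Python sketch, solving that equation by bisection:

```python
# Conditional MLE of the odds ratio for the tea table: find theta such that
# the noncentral hypergeometric mean of n11, given the margins, equals 3.
from math import comb

def conditional_mean(theta):
    # weights proportional to C(4,t) * C(4,4-t) * theta^t, for t = 0..4
    w = [comb(4, t) * comb(4, 4 - t) * theta**t for t in range(5)]
    return sum(t * wt for t, wt in enumerate(w)) / sum(w)

# The conditional mean is increasing in theta, so bisection works.
lo, hi = 1.0, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if conditional_mean(mid) < 3:
        lo = mid
    else:
        hi = mid
print(round(lo, 3))    # about 6.408, matching fisher.test()'s estimate
```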
Notice the difference between the exact and asymptotic CIs for the odds ratio against the two-sided alternative, i.e., θ ≠ 1: the exact interval is considerably wider. Recall the interpretation of the odds ratio. What do these CIs tell us about the true unknown odds ratio? This is a simple example of how inference may vary when you have a small sample or sparseness.
Bias correction for estimating θ
In earlier lectures, we learned that the natural estimate of θ is
\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)
and that \(\text{log}\hat{\theta}\) is approximately normally distributed about log θ with estimated variance
\(\hat{V}(\text{log}\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)
[Advanced concepts: Note that this is the estimated variance of the limiting distribution, not an estimate of the variance of \(\text{log}\hat{\theta}\) itself. Because there is a nonzero probability that the numerator or the denominator of \(\hat{\theta}\) may be zero, the moments of \(\hat{\theta}\) and \(\text{log}\hat{\theta}\) do not actually exist.]
In Section 3.1.1, Agresti (2013) suggests a modified estimator, obtained by adding 1/2 to each n_{ij}:
\(\tilde{\theta}=\dfrac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}\)
with estimated variance
\(\hat{V}(\text{log}\tilde{\theta})=\sum\limits_{i,j} \dfrac{1}{(n_{ij}+0.5)}\)
In smaller samples, \(\text{log}\tilde{\theta}\) may be slightly less biased than \(\text{log}\hat{\theta}\).
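Here are the worked numbers for the tea table under both estimators, as a standalone Python sketch (the 95% Wald interval shown for \(\hat{\theta}\) is the usual asymptotic interval from the formulas above):

```python
# Usual and add-0.5 odds-ratio estimates, with estimated log-scale variances,
# for the tea-tasting table [[3, 1], [1, 3]].
from math import exp, log, sqrt

n11, n12, n21, n22 = 3, 1, 1, 3

theta_hat = (n11 * n22) / (n12 * n21)                         # 9.0
var_hat = 1/n11 + 1/n12 + 1/n21 + 1/n22                       # 8/3, about 2.667

theta_tilde = ((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5))  # about 5.444
var_tilde = sum(1 / (n + 0.5) for n in (n11, n12, n21, n22))  # about 1.905

# Approximate 95% Wald interval on the log scale, back-transformed:
se = sqrt(var_hat)
ci = (exp(log(theta_hat) - 1.96 * se), exp(log(theta_hat) + 1.96 * se))
print(theta_hat, round(theta_tilde, 3), tuple(round(c, 2) for c in ci))
```

Note how wide the interval is and that it contains 1, consistent with the weak evidence from the exact test.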
More Advanced Concepts
The use of \(\tilde{\theta}\) can also be motivated on Bayesian grounds. Adding constants such as 1/2 to the cell frequencies can be interpreted as Bayesian inference under a particular kind of prior distribution (e.g., Dirichlet). For discussion of Bayesian inference under a Dirichlet prior, see Agresti (2013) Section 3.6, or the Bayesian Inference for Categorical Data Analysis by A. Agresti.
The practical effect of adding a constant such as 1/2 is to smooth the estimated cell probabilities toward a uniform table, where all elements of π are equal. In a large, sparse table, adding 1/2 to each cell frequency could lead to oversmoothing, because the total number of hypothetical prior observations being added (1/2 times the number of cells) could be nearly as large as, or even larger than, the actual sample size n. If there are zeros in some cells, you can still run the analysis by adding an extremely small value to those cells, e.g., 0.0000001 or 0.00000001.
For more discussion of sparse tables and less ad hoc methods, see Agresti (2007), Section 5.3, and Agresti (2013), Section 10.6. We will also see more on sparse tables later.