The tests discussed so far that use the chi-square approximation, including the Pearson and likelihood-ratio tests (LRT) for nominal data as well as the Mantel-Haenszel test for ordinal data, perform well when the contingency tables have a reasonable number of observations in each cell, as already discussed in Lesson 1. When samples are small, the distributions of \(X^2\), \(G^2\), and \(M^2\) (and other large-sample based statistics) are not well approximated by the chi-squared distribution; thus their \(p\)-values are not to be trusted. In such situations, we can perform inference using an exact distribution (or estimates of exact distributions), but we should keep in mind that \(p\)-values based on exact tests can be conservative (i.e., they tend to be larger than warranted, so the actual type I error rate falls below the nominal level).
We may use an exact test if:
 the row totals \(n_{i+}\) and the column totals \(n_{+j}\) are both fixed by the design of the study,
 we have a small sample size \(n\),
 more than 20% of cells have expected cell counts less than 5, or any expected cell count is less than 1.
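The expected-count condition can be checked directly. The following is a minimal sketch in R, using the counts from the tea-tasting example that follows. The flag `needs_exact` is a hypothetical name, and the rule it encodes (more than 20% of expected counts below 5, or any below 1) is the usual rule of thumb, taken here as an assumption.

```r
# Tea-tasting counts (rows: actually poured first; columns: lady's guess)
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(poured = c("tea", "milk"),
                              lady   = c("tea", "milk")))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tea), colSums(tea)) / sum(tea)

# Share of cells with expected count below 5, and the smallest expected count
share_below_5 <- mean(expected < 5)
min_expected  <- min(expected)

# Hypothetical flag: TRUE when the large-sample approximation is doubtful
needs_exact <- share_below_5 > 0.20 || min_expected < 1

print(expected)
print(share_below_5)
print(needs_exact)
```

Here every expected count equals 2, so all four cells fall below 5, which agrees with the warning printed in the SAS output further below.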
Example: Lady tasting tea
Here we consider the famous tea tasting example! At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R.A. Fisher himself, then and there, to see if her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and the other four had the tea poured first. The lady tasted each one and rendered her opinion. The results are summarized in the following \(2 \times 2\) table:
Actually poured first  Lady says tea first  Lady says milk first
tea  3  1
milk  1  3
The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four of the cups are "tea first" and four are "milk first." Under \(H_0\), the lady has no discerning ability, which is to say the four cups she calls "tea first" are a random sample from the eight. If she selects four at random, the probability that three of these four are actually "tea first" comes from the hypergeometric distribution, \(P(n_{11}=3)\):
\(\dfrac{\dbinom{4}{3}\dbinom{4}{1}}{\dbinom{8}{4}}=\dfrac{\dfrac{4!}{3!1!}\dfrac{4!}{1!3!}}{\dfrac{8!}{4!4!}}=\dfrac{16}{70}=0.229\)
A \(p\)-value is the probability of getting a result as extreme or more extreme than the event actually observed, assuming \(H_0\) is true. In this example, the \(p\)-value would be \(P(n_{11}\ge t_0)\), where \(t_0\) is the observed value of \(n_{11}\), which in this case is 3. The only result more extreme would be the lady's (correct) selection of all four of the cups that are truly "tea first," which has probability
\(\dfrac{\dbinom{4}{4}\dbinom{4}{0}}{\dbinom{8}{4}}=\dfrac{1}{70}=0.014\)
The \(p\)-value is therefore \(0.229 + 0.014 = 0.243\), which is only weak evidence against the null. In other words, there is not enough evidence to reject the null hypothesis that the lady is purely guessing. To be fair, experiments with small amounts of data are generally not very powerful to begin with, given the limited information.
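The two hypergeometric probabilities and their sum can be reproduced in a few lines of R. This is only a sketch of the arithmetic above; the variable names are illustrative and not taken from the course files.

```r
# dhyper(x, m, n, k): probability of x "tea first" cups among the k cups chosen,
# when m cups are truly "tea first" and n cups are truly "milk first"
m <- 4
n <- 4
k <- 4

p_three <- dhyper(3, m, n, k)  # choose(4,3) * choose(4,1) / choose(8,4) = 16/70
p_four  <- dhyper(4, m, n, k)  # choose(4,4) * choose(4,0) / choose(8,4) = 1/70

# One-sided p-value: P(n11 >= 3)
p_value <- p_three + p_four

# The same quantity from the upper tail of the distribution, P(n11 > 2)
p_value_tail <- phyper(2, m, n, k, lower.tail = FALSE)

print(c(p_three = p_three, p_four = p_four, p_value = p_value))
```

The sum equals 17/70, the right-sided probability reported by Fisher's exact test in the SAS output.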
Here is how we can do this computation in SAS and R. Further below we describe in a bit more detail the underlying idea behind these calculations.
Code
/*
 Example: Fisher's Tea Lady
*/
data tea;
input poured $ lady $ count;
datalines;
tea tea 3
tea milk 1
milk tea 1
milk milk 3
;
run;
proc freq data=tea order=data;
weight count;
tables poured*lady/ chisq relrisk riskdiff expected;
exact fisher chisq or;
run;
Output
The FREQ Procedure


Statistics for Table of poured by lady
Statistic  DF  Value  Prob 

WARNING: 100% of the cells have expected counts less than 5. (Asymptotic) Chi-Square may not be a valid test.

Chi-Square  1  2.0000  0.1573
Likelihood Ratio Chi-Square  1  2.0930  0.1480
Continuity Adj. Chi-Square  1  0.5000  0.4795
Mantel-Haenszel Chi-Square  1  1.7500  0.1859
Phi Coefficient  0.5000  
Contingency Coefficient  0.4472  
Cramer's V  0.5000 
Pearson Chi-Square Test

Chi-Square  2.0000
DF  1
Asymptotic Pr > ChiSq  0.1573
Exact Pr >= ChiSq  0.4857
Likelihood Ratio Chi-Square Test

Chi-Square  2.0930
DF  1
Asymptotic Pr > ChiSq  0.1480
Exact Pr >= ChiSq  0.4857
Mantel-Haenszel Chi-Square Test

Chi-Square  1.7500
DF  1
Asymptotic Pr > ChiSq  0.1859
Exact Pr >= ChiSq  0.4857
Fisher's Exact Test  

Cell (1,1) Frequency (F)  3 
Left-sided Pr <= F  0.9857
Right-sided Pr >= F  0.2429
Table Probability (P)  0.2286
Two-sided Pr <= P  0.4857
Column 1 Risk Estimates  

Risk  ASE  95% Confidence Limits  Exact 95% Confidence Limits

Difference is (Row 1 - Row 2)
Row 1  0.7500  0.2165  0.3257  1.0000  0.1941  0.9937
Row 2  0.2500  0.2165  0.0000  0.6743  0.0063  0.8059
Total  0.5000  0.1768  0.1535  0.8465  0.1570  0.8430
Difference  0.5000  0.3062  -0.1001  1.0000
Column 2 Risk Estimates  

Risk  ASE  95% Confidence Limits  Exact 95% Confidence Limits

Difference is (Row 1 - Row 2)
Row 1  0.2500  0.2165  0.0000  0.6743  0.0063  0.8059
Row 2  0.7500  0.2165  0.3257  1.0000  0.1941  0.9937
Total  0.5000  0.1768  0.1535  0.8465  0.1570  0.8430
Difference  -0.5000  0.3062  -1.0000  0.1001
Odds Ratio and Relative Risks  

Statistic  Value  95% Confidence Limits  
Odds Ratio  9.0000  0.3666  220.9270 
Relative Risk (Column 1)  3.0000  0.5013  17.9539 
Relative Risk (Column 2)  0.3333  0.0557  1.9949 
Odds Ratio  

Odds Ratio  9.0000 
Asymptotic Conf Limits  
95% Lower Conf Limit  0.3666 
95% Upper Conf Limit  220.9270 
Exact Conf Limits  
95% Lower Conf Limit  0.2117 
95% Upper Conf Limit  626.2435 
Sample Size = 8
The EXACT statement in SAS requests exact tests, which consider all tables with exactly the same margins as the observed table. It works for any \(I \times J\) table. The FISHER option, more specifically, performs Fisher's exact test, which in SAS is an exact test only for a \(2 \times 2\) table.
For R, see TeaLady.R where you can see we used the fisher.test() function to perform Fisher's exact test for the \(2 \times 2\) table in question.
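#### tea-tasting counts: rows = actually poured first, columns = lady says poured first
tea <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE,
              dimnames = list(poured = c("tea", "milk"), lady = c("tea", "milk")))
#### one-sided Fisher's exact test
fisher.test(tea, alternative = "greater")
#### two-sided Fisher's exact test
fisher.test(tea)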
A similar test could be done using chisq.test() with the option simulate.p.value = TRUE, which approximates the exact \(p\)-value by simulation. By reading the help file for the fisher.test() function, you will see that certain options in this function only work for \(2 \times 2\) tables. For the output, see TeaLady.out
The basic idea behind exact tests is that they enumerate all possible tables that have the same margins, i.e., the same row sums and column sums. Then, for the relevant statistic (e.g., \(X^2\), \(G^2\), or the odds ratio), we look for all tables where the value is as extreme or more extreme than the one we have observed. The key here is that in the set of \(2 \times 2\) tables with the same margins, once we know the value in one cell, we know the rest of the cells. Therefore, to find the probability of observing a table, we need to find the probability of only one cell in the table (rather than the probabilities of four cells). Typically we use the value of cell (1,1).
Under the null hypothesis of independence, more specifically when the odds ratio \(\theta = 1\), the probability distribution of that one cell \(n_{11}\) is hypergeometric, as discussed in the tea-tasting example.
In the Lady tasting tea example, there are 5 possible \(2 \times 2\) tables that have the same margins as the observed table. Stop and Think! Can you figure out which ones they are?
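One way to check the answer is to enumerate the tables in R. The sketch below fills in each table from \(n_{11}\) and the fixed margins, attaches its hypergeometric probability, and sums the probabilities of tables no more likely than the observed one. The small tolerance factor guards against floating-point ties and is a choice made for this illustration.

```r
row_tot <- c(4, 4)  # cups actually "tea first", "milk first"
col_tot <- c(4, 4)  # cups the lady calls "tea first", "milk first"

# Feasible values of n11 given the margins
n11_min <- max(0, row_tot[1] - col_tot[2])
n11_max <- min(row_tot[1], col_tot[1])
n11 <- n11_min:n11_max

# Once n11 is known, the margins determine the other three cells
tables <- data.frame(
  n11 = n11,
  n12 = row_tot[1] - n11,
  n21 = col_tot[1] - n11,
  n22 = row_tot[2] - (col_tot[1] - n11)
)

# Hypergeometric probability of each table under theta = 1
tables$prob <- dhyper(n11, row_tot[1], row_tot[2], col_tot[1])
print(tables)

# Two-sided p-value: total probability of tables no more likely than n11 = 3
p_obs <- tables$prob[tables$n11 == 3]
p_two_sided <- sum(tables$prob[tables$prob <= p_obs * (1 + 1e-7)])
print(p_two_sided)
```

The two-sided value is 34/70, which agrees with the "Two-sided Pr <= P" entry of 0.4857 in the SAS output.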
Extension of Fisher's test
For problems where the number of possible tables is too large to enumerate, Monte Carlo methods are used to approximate "exact" statistics. In SAS, use the MC option in the EXACT statement of PROC FREQ; in R, call chisq.test() with simulate.p.value = TRUE and set the number of simulated tables with the B argument (for more, see the help files). This extension of Fisher's exact test for a \(2 \times 2\) table, in effect, samples from the very large set of possible tables with the observed margins in order to simulate the exact test.
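As a rough illustration, the Monte Carlo approach can be tried on the tea-tasting table, where the exact answer is known. This is a sketch only: the seed and the number of replicates are arbitrary choices. Note also that chisq.test() ranks the simulated tables by the \(X^2\) statistic rather than by table probability, although for this table the two orderings lead to the same \(p\)-value.

```r
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE)

set.seed(504)  # arbitrary seed so the simulation is reproducible

# Monte Carlo p-value: B random tables with the observed margins
mc <- chisq.test(tea, simulate.p.value = TRUE, B = 10000)
print(mc$p.value)

# Exact two-sided p-value for comparison
exact <- fisher.test(tea)$p.value
print(exact)
```

With only five possible tables, full enumeration is trivial here; simulation pays off when the number of possible tables is too large to list.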
This test is "exact" because no large-sample approximations are used. The \(p\)-value is valid regardless of the sample size. Asymptotic results may be unreliable when the data are sparse or skewed. Exact computations are based on the statistical theory of exact conditional inference for contingency tables.
Fisher's exact test is definitely appropriate when the row totals and column totals are both fixed by design. Some have argued that it may also be used when only one set of margins is truly fixed. This idea arises because the marginal totals \(n_{1+}, n_{+1}\) provide little information about the odds ratio \(\theta\).
Exact non-null inference for \(\theta\)
Given the marginal totals, the conditional distribution of \(n_{11}\) depends only on the odds ratio \(\theta\). When \(\theta = 1\), this distribution is hypergeometric, which we used in Fisher's exact test. More generally, Fisher (1935) gave this distribution for any value of \(\theta\). Using this distribution, it is easy to compute Fisher's exact \(p\)-value for testing the null hypothesis \(H_0:\theta=\theta^*\) for any \(\theta^*\). The set of all values \(\theta^*\) that cannot be rejected by a level \(\alpha=.05\) test forms an exact 95% confidence region for \(\theta\).
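For a \(2 \times 2\) table, the conditional probability that \(n_{11}=t\) is proportional to \(\binom{n_{1+}}{t}\binom{n_{2+}}{n_{+1}-t}\theta^t\). The sketch below evaluates it for the tea-tasting margins and uses it for a one-sided test of \(H_0:\theta=\theta^*\). The function names `dn11` and `p_greater` are hypothetical helpers written for this illustration.

```r
# Conditional distribution of n11 given the margins, for odds ratio theta
dn11 <- function(theta, row_tot = c(4, 4), col1_tot = 4) {
  support <- max(0, col1_tot - row_tot[2]):min(row_tot[1], col1_tot)
  w <- choose(row_tot[1], support) *
    choose(row_tot[2], col1_tot - support) *
    theta^support
  data.frame(n11 = support, prob = w / sum(w))
}

# One-sided exact p-value for H0: theta = theta_star against theta > theta_star
p_greater <- function(theta_star, n11_obs = 3) {
  d <- dn11(theta_star)
  sum(d$prob[d$n11 >= n11_obs])
}

print(dn11(1))       # theta = 1 gives the ordinary hypergeometric distribution
print(p_greater(1))  # the tea-tasting p-value, 17/70
print(p_greater(2))  # the same data tested against theta* = 2

# Cross-check against fisher.test(), which accepts a null odds ratio via `or`
tea <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE)
ref <- fisher.test(tea, or = 2, alternative = "greater")$p.value
print(ref)
```

Scanning `p_greater()` (or its two-sided analogue) over a grid of \(\theta^*\) values and keeping those that are not rejected is one way to see how the exact confidence region arises.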
Let's look at a part of the SAS output a bit closer; the R output gives essentially the same CIs. First, notice the sample estimate of the odds ratio equal to 9, which we can compute from the cross-product ratio as we have discussed earlier. Note also that fisher.test() in R for \(2 \times 2\) tables will give the so-called "conditional estimate" of the odds ratio, so the value will be different (in this case, approximately 6.408).
Notice the difference between the exact and asymptotic CIs for the odds ratio for the two-sided alternative (e.g., \(\theta\ne1\)). The exact version is wider. Recalling the interpretation of the odds ratio, what do these CIs tell us about the true unknown odds ratio? This is a simple example of how inference may vary if you have small samples or sparseness.
Odds Ratio  

Odds Ratio  9.0000 
Asymptotic Conf Limits  
95% Lower Conf Limit  0.3666 
95% Upper Conf Limit  220.9270 
Exact Conf Limits  
95% Lower Conf Limit  0.2117 
95% Upper Conf Limit  626.2435 
Sample Size = 8
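The asymptotic limits in this output can be reproduced by hand and set beside the exact interval from R. This is a sketch under the usual Wald construction on the log scale; the variable names are illustrative.

```r
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE)

# Sample odds ratio (cross-product ratio)
theta_hat <- (tea[1, 1] * tea[2, 2]) / (tea[1, 2] * tea[2, 1])

# Asymptotic (Wald) 95% CI: computed on the log scale, then exponentiated
se_log  <- sqrt(sum(1 / tea))
wald_ci <- exp(log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log)

# Exact conditional 95% CI and conditional estimate from fisher.test()
ft       <- fisher.test(tea)
exact_ci <- ft$conf.int
cond_est <- unname(ft$estimate)

print(theta_hat)
print(wald_ci)
print(exact_ci)
print(cond_est)
```

Both intervals contain 1, in line with the non-significant test, and the exact interval is the wider of the two.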
Bias correction for estimating \(\theta\)
Earlier, we learned that the natural estimate of \(\theta\) is
\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)
and that \(\log\hat{\theta}\) is approximately normally distributed about \(\log \theta\) with estimated variance
\(\hat{V}(\log\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)
Advanced note: this is the estimated variance of the limiting distribution, not an estimate of the variance of \(\log\hat{\theta}\) itself. Because there is a nonzero probability that the numerator or the denominator of \(\hat{\theta}\) may be zero, the moments of \(\hat{\theta}\) and \(\log\hat{\theta}\) do not actually exist. If the estimate is modified by adding \(1/2\) to each \(n_{ij}\), we have
\(\tilde{\theta}=\dfrac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}\)
with estimated variance
\(\hat{V}(\log\tilde{\theta})=\sum\limits_{i,j} \dfrac{1}{(n_{ij}+0.5)}\)
In smaller samples, \(\log\tilde{\theta}\) may be slightly less biased than \(\log\hat{\theta}\).
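The effect of the correction is easy to see numerically. In the sketch below, `odds_ratio` is a hypothetical helper, and the `perfect` table (all eight cups classified correctly) is a made-up illustration of a table with zero cells, not data from the experiment.

```r
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE)

# Odds ratio and estimated variance of its log, after adding `add` to each cell
odds_ratio <- function(tab, add = 0) {
  tab <- tab + add
  est <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  var_log <- sum(1 / tab)
  c(estimate = est, var_log = var_log)
}

print(odds_ratio(tea))             # theta-hat: 9
print(odds_ratio(tea, add = 0.5))  # theta-tilde: 12.25 / 2.25

# Hypothetical table with zero cells: theta-hat is infinite, theta-tilde is finite
perfect <- matrix(c(4, 0,
                    0, 4),
                  nrow = 2, byrow = TRUE)
print(odds_ratio(perfect))
print(odds_ratio(perfect, add = 0.5))
```

For the tea-tasting data, the correction pulls the estimate from 9 toward 1 and reduces the estimated variance of the log odds ratio.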