Generating two-way tables of counts is similar to generating one-way tables of counts but with a higher degree of complexity. The main generating probability mechanisms are Poisson, Binomial, and Multinomial models, but for two-way tables, the margins play a big role. We will discuss the following sampling schemes:
- Unrestricted sampling (Poisson)
- Sampling with fixed total sample size (Multinomial)
- Sampling with fixed certain marginal totals (Product-Multinomial, Hypergeometric)
These sampling models extend to higher-dimensional tables as we will see in the later lessons.
Poisson Sampling
Think of standing on the ski slopes of a French ski resort until sundown and, for each skier encountered, recording whether he or she took placebo or vitamin C and whether he or she has a cold. Or, consider standing at the crossing of two major thoroughfares of a Midwest town and classifying each passing car according to whether it is driven by a male or a female and whether the car is an American make or not. In both examples, the total sample size is completely random, and so are both margins of the table. Hence, each cell is considered an independent Poisson variable, and the cell counts follow a Poisson distribution
\(n_{ij}\sim Poisson(\lambda_{ij})\)
independently for \( i = 1, \ldots , I\) and \(j = 1, \ldots , J\). In this scheme, the overall \( n\) is not fixed. \(\lambda_{ij}\) is the parameter describing the rate of occurrence for the \((i, j)\)th cell. The expected mean and the variance of the cell are
\(E(n_{ij})=\lambda_{ij}\)
\(Var(n_{ij})=\lambda_{ij}\)
We can think of rewriting the two-way table as a one-way table; e.g. we can think of \(\lambda_{ij}\) as the \(\lambda_{i}\) we saw in one-way frequency tables.
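As a quick sanity check on this scheme, the simulation below draws independent Poisson counts for a hypothetical 2×2 table (the \(\lambda_{ij}\) values are illustrative, not taken from any real study) and verifies that both the mean and the variance of each cell are approximately \(\lambda_{ij}\), while the grand total varies from draw to draw.

```python
import numpy as np

rng = np.random.default_rng(504)

# Hypothetical rates for a 2x2 table (treatment x cold status);
# these lambda values are illustrative, not from the study.
lam = np.array([[35.0, 100.0],
                [20.0, 120.0]])

# One realization: each cell is an independent Poisson count,
# so both margins and the grand total are random.
table = rng.poisson(lam)
print(table, table.sum())

# Across many replications, the sample mean and sample variance
# of each cell should both approach lambda_ij.
reps = rng.poisson(lam, size=(20000, 2, 2))
print(reps.mean(axis=0))  # approximately lam
print(reps.var(axis=0))   # approximately lam
```

Note that repeating `rng.poisson(lam)` gives tables with different totals each time, in contrast to the fixed-\(n\) schemes below.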
Question: Are the rates \(\lambda_{ij}\) the same or different across cells?
It depends on the context of the problem and the model assumptions; we could assume the same underlying \(\lambda\) or different ones.
If the cell counts are independent Poisson variables, the likelihood factors into a Poisson likelihood for the total sample size,

\(n\sim Poisson(\lambda_{++})\)

and a multinomial likelihood for \(\{n_{ij}\}\) given \(n\), with parameters
\(\pi_{ij}=\dfrac{\lambda_{ij}}{\lambda_{++}}\)
Here, the total \(n\) provides no information about \(\pi = \pi_{ij}\). From a likelihood standpoint, we get the same inferences about \(\pi\) whether \(n\) is regarded as fixed or random.
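This Poisson-multinomial connection can be checked by simulation: among Poisson tables whose total happens to equal some fixed \(n_0\), the cells behave like a multinomial sample with probabilities \(\pi_{ij} = \lambda_{ij}/\lambda_{++}\). The sketch below uses illustrative \(\lambda\) values for a flattened table of three cells.

```python
import numpy as np

rng = np.random.default_rng(1)

lam = np.array([1.5, 2.5, 6.0])   # illustrative cell rates (flattened table)
pi = lam / lam.sum()              # pi_ij = lambda_ij / lambda_++

# Many independent Poisson tables.
reps = rng.poisson(lam, size=(200000, 3))
totals = reps.sum(axis=1)

# Condition on a particular observed total n0.
n0 = 10
cond = reps[totals == n0]

# Conditional on n = n0, the cells are Multinomial(n0, pi),
# so the conditional cell means should be close to n0 * pi.
print(cond.mean(axis=0))
print(n0 * pi)
```

The two printed vectors agree up to simulation noise, illustrating that inference about \(\pi\) is the same whether \(n\) is treated as fixed or random.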
Multinomial Sampling
Consider now collecting data on a predetermined number of individuals (e.g., 279 in the Vitamin C example) and classifying them according to two binary variables (e.g., treatment and response). If we draw a sample of \(n\) subjects from a population and record \((Y, Z)\) for each subject, then the joint distribution of \({n_{ij}}\) is multinomial with index \(n\) and parameter \(\pi = \pi_{ij}\),
\(\pi_{ij}=P(Y=i,Z=j)\)
where the grand total \(n\) is fixed and known. Parameters are functions of the cell means:
\(\mu_{ij}=E(n_{ij})=n\pi_{ij}\)
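A minimal sketch of this scheme, assuming illustrative cell probabilities (the \(\pi_{ij}\) values below are not the study's actual proportions): draw \(n = 279\) subjects in a single multinomial sample and verify that the grand total is always exactly \(n\) and that the cell means are \(n\pi_{ij}\).

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed cell probabilities for a 2x2 classification
# (treatment x cold); illustrative values only.
pi = np.array([[0.11, 0.39],
               [0.06, 0.44]])
n = 279  # fixed total, as in the Vitamin C example

# One multinomial draw of n subjects into the four cells.
table = rng.multinomial(n, pi.ravel()).reshape(2, 2)
print(table, table.sum())  # grand total is exactly n by design

# Cell means mu_ij = n * pi_ij, verified by simulation.
reps = rng.multinomial(n, pi.ravel(), size=50000)
print(reps.mean(axis=0))   # approximately n * pi.ravel()
```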
Question: Think of rewriting a two-way table as a one-way frequency table. How would you do this for the Vitamin C example?
An extension of this case occurs when, instead of fixing the total \(n\), either the row totals or the column totals are assumed fixed, as we will see next.
Product Multinomial Sampling
Consider now collecting data on 140 "placebo" and 139 "vitamin C" individuals and classifying them according to the response (e.g., whether they got a cold or not). Here, data are collected on a predetermined number of individuals for each category of one variable, and both sets are classified according to the levels of the other variable of interest. Hence, one margin is fixed by design while the other is free to vary. This type of sampling is called Independent Multinomial Sampling. If the response variable has only two levels, it is also called Independent Binomial Sampling, which is a special case of independent multinomial sampling.
If we decide beforehand that we will draw \(n_{i+}\) subjects with characteristic \(Y = i (i = 1, \ldots , I)\) and record the \(Z\)-value for each one, each row of the table \((n_{i1}, n_{i2}, \ldots , n_{iJ})\) is then multinomial with probabilities \(\pi_{j|i} = \dfrac{\pi_{ij}}{\pi_{i+}}\), and the rows are independent. The full likelihood is obtained by taking the product of the individual multinomial PMFs and therefore is known as product-multinomial sampling scheme.
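The following sketch simulates this design for the independent binomial special case. The row totals (140 and 139) are fixed as in the Vitamin C trial, but the conditional probabilities \(P(\text{cold} \mid \text{group})\) used here are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Row totals fixed by design, as in the Vitamin C trial.
n_rows = np.array([140, 139])        # placebo, vitamin C

# Assumed conditional probabilities P(cold | group); illustrative only.
p_cold = np.array([0.22, 0.12])

# Each row is an independent binomial (a 2-category multinomial):
# the row margin is fixed, only the column margin is random.
cold = rng.binomial(n_rows, p_cold)
table = np.column_stack([cold, n_rows - cold])
print(table)
print(table.sum(axis=1))  # always (140, 139): fixed by design
```

Rerunning the draw changes the column totals but never the row totals, which is exactly what distinguishes this scheme from the multinomial one above.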
Viewing the data as product-multinomial is appropriate when the row totals truly are fixed by design, as in
- stratified random sampling (strata defined by \(Y\) )
- an experiment where \(Y =\) treatment group
It’s also appropriate when the row totals are not fixed, but we are interested in inference on \(P(Z | Y)\) and not \(P(Y)\). That is, when \(Z\) is the outcome of interest, and \(Y\) is an explanatory variable that we do not wish to model.
Suppose the data are multinomial. Then we may factor the likelihood into two parts:
- a multinomial likelihood for the row totals \((n_{1+}, n_{2+}, \ldots , n_{I+})\) with index \(n\) and parameter \({\pi_{i+}}\)
- independent multinomial likelihoods for the rows, \((n_{i1} , n_{i2} , \ldots , n_{iJ} )\) with parameters \({\pi_{j|i} = \dfrac{\pi_{ij}}{\pi_{i+}}}\).
Therefore, if the parameters of interest can be expressed as functions only of the \(\pi_{j|i}\)’s and not the \(\pi_{i+}\)’s, then correct likelihood-based inferences may be obtained by treating the data as if they were product-multinomial. Conversely, if the data are product-multinomial, then correct likelihood-based inferences about functions of the \(\pi_{j|i}\)s will be obtained if we analyze the data as if they were multinomial. We may also treat them as Poisson, ignoring any inferences about \(n_{++}\) or \(n_{i+}\).
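This factorization can be verified numerically. The sketch below (using an assumed 2×2 table and assumed joint probabilities, chosen only for illustration) checks that the full multinomial log-likelihood equals the log-likelihood of the row totals plus the sum of the conditional row log-likelihoods.

```python
from math import lgamma, log

def mult_logpmf(counts, probs):
    """Multinomial log-PMF with index n = sum(counts)."""
    n = sum(counts)
    return (lgamma(n + 1)
            - sum(lgamma(c + 1) for c in counts)
            + sum(c * log(p) for c, p in zip(counts, probs) if c > 0))

# Illustrative 2x2 table and joint probabilities (assumed values).
n_table = [[31, 109], [17, 122]]
pi = [[0.10, 0.40], [0.07, 0.43]]

# Full multinomial likelihood for the four cells.
full = mult_logpmf([c for row in n_table for c in row],
                   [p for row in pi for p in row])

# Factored form: row totals ~ multinomial(n, pi_{i+}),
# each row ~ multinomial(n_{i+}, pi_{j|i} = pi_{ij}/pi_{i+}).
pi_row = [sum(row) for row in pi]
row_totals = [sum(row) for row in n_table]
factored = mult_logpmf(row_totals, pi_row)
for row_counts, p_row, p_row_tot in zip(n_table, pi, pi_row):
    factored += mult_logpmf(row_counts, [p / p_row_tot for p in p_row])

print(full, factored)  # identical up to floating-point error
```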
Hypergeometric Sampling
We may encounter data where both the row totals \((n_{1+}, . . . , n_{I+})\) and the column totals \((n_{+1}, . . . , n_{+J} )\) are fixed by design. The best-known example of this is Fisher’s hypothetical example of the Lady Tasting Tea, which will be discussed in the section on Exact Tests. Even when both sets of marginal totals are not fixed by design, some statisticians like to condition on them and perform "exact" inference when the sample size is small and asymptotic approximations are unlikely to work well.
In a \(2\times2\) table, the resulting sampling distribution is hypergeometric, which we introduced in Lesson 1. Recall that the hypergeometric distribution describes the probability of obtaining \(k\) successes in a sample of \(n\) units drawn without replacement from a finite population of size \(N\). Consider drawing \(n\) balls from a box containing \(N\) balls, of which \(D\) are red and the rest blue. What is the probability that we get exactly \(k\) red balls?
| | drawn | not drawn | total |
|---|---|---|---|
| red | k | D-k | D |
| blue | n-k | N+k-n-D | N-D |
| total | n | N-n | N |
\(P(k,N,D,n)=\dfrac{\binom{D}{k} \binom{N-D}{n-k}}{\binom{N}{n}},\qquad k=0,1,2,\ldots,n\)
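This PMF is simple to compute directly. The sketch below evaluates it for the Lady Tasting Tea setup (8 cups, 4 of each kind, 4 chosen), which matches \(N=8\), \(D=4\), \(n=4\).

```python
from math import comb

def hypergeom_pmf(k, N, D, n):
    """P(exactly k red balls when drawing n from N balls, D of them red)."""
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

# Lady Tasting Tea: N = 8 cups, D = 4 "milk first", n = 4 chosen.
for k in range(5):
    print(k, hypergeom_pmf(k, 8, 4, 4))

# The probabilities over all possible k sum to 1.
print(sum(hypergeom_pmf(k, 8, 4, 4) for k in range(5)))
```

For example, the probability of identifying all four "milk first" cups correctly by chance is \(1/70 \approx 0.014\), which is the basis of Fisher's exact test discussed later.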