5.1 - Notation and Structure

A three-way contingency table is a cross-classification of observations by the levels of three categorical variables. Suppose that we have three categorical variables, \(X\), \(Y\), and \(Z\), where

\(X\) takes possible values \(1,2,\ldots,I\)

\(Y\) takes possible values \(1,2,\ldots,J\)

\(Z\) takes possible values \(1,2,\ldots,K\)

If we collect the triplet \((X,Y,Z)\) for each individual in a sample of size \(n\), then the data can be summarized as a three-dimensional table. Let \(n_{ijk}\) be the number of units for which \(X = i\), \(Y = j\), and \(Z = k\). Then the vector of cell counts \((n_{111}, n_{112}, \dots , n_{IJK})\) can be arranged in a table whose dimensions are \(I \times J \times K\). Geometrically, we can think of this as a cube. For example, the Rubik’s cube is a \(3 \times 3 \times 3\) table.

Example: Berkeley Admissions

The table below lists graduate admissions information for the six largest departments at U.C. Berkeley in the fall of 1973.

Dept.	Males admitted	Males rejected	Females admitted	Females rejected
A	512	313	89	19
B	353	207	17	8
C	120	205	202	391
D	138	279	131	244
E	53	138	94	299
F	22	351	24	317

Although it's arranged in a \(6\times 4\) table to easily view on this page, we can imagine this in three dimensions as a \(2\times2\times6\) table, where \(X =\) sex (1 = male, 2 = female), \(Y =\) admission status (1 = admitted, 2 = rejected), and \(Z =\) department (1 = A, 2 = B, . . ., 6 = F), in which case \(n_{112}= 353\) corresponds to the number of males admitted to department B.
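To make the indexing concrete, here is a minimal sketch that stores the Berkeley counts as a three-dimensional NumPy array. The use of NumPy and the variable names are our own choices for illustration. Note that Python indices start at 0, so \(n_{112}\) becomes `n[0, 0, 1]`.

```python
import numpy as np

# Berkeley counts from the table above, stored as n[i, j, k] with
#   i = sex        (0 = male, 1 = female)
#   j = status     (0 = admitted, 1 = rejected)
#   k = department (0 = A, ..., 5 = F)
males_admitted = [512, 353, 120, 138, 53, 22]
males_rejected = [313, 207, 205, 279, 138, 351]
females_admitted = [89, 17, 202, 131, 94, 24]
females_rejected = [19, 8, 391, 244, 299, 317]

n = np.array([
    [males_admitted, males_rejected],
    [females_admitted, females_rejected],
])

print(n.shape)     # dimensions I x J x K
print(n[0, 0, 1])  # n_{112}: males admitted to department B
print(n.sum())     # total sample size n
```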

Possible questions of interest here would be whether admission rates differ by sex or among departments. As we'll see, however, these relationships can be measured in different ways.

Partial Tables

To display three-way tables, we typically use a set of two-way tables. These are referred to as partial tables. There are three ways to do this:

Consider \(I\) partial tables of \(Y\) by \(Z\), one for each level of \(X\)
Consider \(J\) partial tables of \(X\) by \(Z\), one for each level of \(Y\)
Consider \(K\) partial tables of \(X\) by \(Y\), one for each level of \(Z\)

For example, we could look at the two-way (partial) table between sex and admission status for each department in the Berkeley data. This representation corresponds to a conditional distribution because the department is fixed within each table. For the first two departments, we have

Department A

		Admission status
		Admitted	Rejected
Sex	Male	512	313
Sex	Female	89	19

Department B

		Admission status
		Admitted	Rejected
Sex	Male	353	207
Sex	Female	17	8
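In code, a partial table is just a slice of the three-way array with one index held fixed. The sketch below assumes the counts are stored in a NumPy array indexed as (sex, status, department). The helper name `partial_table` is hypothetical and used only for illustration.

```python
import numpy as np

# n[i, j, k]: i = sex (male, female), j = status (admitted, rejected),
# k = department (A, ..., F); indices are zero-based.
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])

def partial_table(counts, k):
    """Return the X-by-Y partial table at level k of Z (zero-based)."""
    return counts[:, :, k]

dept_a = partial_table(n, 0)  # sex by admission status, Department A
dept_b = partial_table(n, 1)  # sex by admission status, Department B
print(dept_a)
print(dept_b)
```

Fixing the first or second index instead (`counts[i, :, :]` or `counts[:, j, :]`) gives the other two sets of partial tables.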

Marginal Tables

As before, we will use "+" to indicate summation over a subscript; for example in the expression below we sum over \(k\),

\(n_{ij+}=\sum\limits_{k=1}^K n_{ijk}\).

Then the vector of counts \((n_{11+}, n_{12+}, \dots, n_{IJ+})\) can be arranged into a table of \(I\times J\) dimensions, referred to as the marginal table of \(X\) and \(Y\). There are three possible two-way marginal tables resulting from one three-way table. Essentially, by summing over one variable, we ignore its association with each of the other variables. This idea can even extend to the marginal table of a single variable by summing over both of the other variables.

In the Berkeley admission example, the table below is the marginal one between sex and admission status, obtained by summing over the six departments. If we had observed only this table, we would not know anything about the departments.

	Admitted	Rejected	Total
Male	1198	1493	2691
Female	557	1278	1835
Total	1755	2771	4526
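Summing over a subscript corresponds to summing the array over the matching axis. The following sketch, again assuming a NumPy array indexed as (sex, status, department), computes the three two-way marginal tables and one single-variable margin. The variable names are illustrative.

```python
import numpy as np

# n[i, j, k]: i = sex, j = admission status, k = department (zero-based).
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])

n_xy = n.sum(axis=2)  # n_{ij+}: sex by status, summed over departments
n_xz = n.sum(axis=1)  # n_{i+k}: sex by department, summed over status
n_yz = n.sum(axis=0)  # n_{+jk}: status by department, summed over sex

n_x = n.sum(axis=(1, 2))  # n_{i++}: one-way margin for sex

print(n_xy)
print(n_x)
```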

Joint Distribution

If the \(n\) individuals in the sample are independent and identically distributed (IID), that is, if they are a random sample, then the vector of cell counts \((n_{111}, n_{112}, \dots, n_{IJK})\) has a multinomial distribution with index \(n = n_{+++}\) and vector of probabilities \(\pi=(\pi_{111},\pi_{112},\ldots,\pi_{IJK})\), where

\(\pi_{ijk} = P(X = i, Y = j, Z = k)\)

That is, \(\pi_{ijk}\) is the joint probability that a randomly selected individual falls in the \((i, j, k)\) cell of the contingency table. Under the unrestricted (saturated) multinomial model, there are no constraints on \(\pi\) other than \(\sum\limits_{i=1}^I \sum\limits_{j=1}^J\sum\limits_{k=1}^K\pi_{ijk}= 1\), and the maximum likelihood (ML) estimates are the sample proportions: \(\hat{\pi}_{ijk}=n_{ijk}/n\). The saturated model always fits the data perfectly, and the expected frequency of the \((i, j, k)\) cell \(\mu_{ijk}=n\pi_{ijk}\) is estimated with \(n\hat{\pi}_{ijk}=n_{ijk}\), the observed frequency of the \((i, j, k)\) cell for all \(i\), \(j\), and \(k\).
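As a quick numerical check of the statements above, the sketch below computes the saturated-model estimates for the Berkeley data and evaluates the Pearson and deviance statistics against the fitted counts. It assumes NumPy and the usual formulas \(X^2=\sum (n_{ijk}-\hat{\mu}_{ijk})^2/\hat{\mu}_{ijk}\) and \(G^2=2\sum n_{ijk}\log(n_{ijk}/\hat{\mu}_{ijk})\) from the earlier lessons. The variable names are ours.

```python
import numpy as np

# n[i, j, k]: i = sex, j = admission status, k = department (zero-based).
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])

N = n.sum()
pi_hat = n / N       # ML estimates: the sample proportions
mu_hat = N * pi_hat  # fitted counts, equal to the observed counts

# Pearson and deviance statistics (every cell count here is positive,
# so the logarithm is well defined).
X2 = ((n - mu_hat) ** 2 / mu_hat).sum()
G2 = 2 * (n * np.log(n / mu_hat)).sum()

# The only constraint is that the probabilities sum to one,
# which leaves IJK - 1 free parameters.
free_params = n.size - 1

print(pi_hat.sum(), X2, G2, free_params)
```

Both statistics come out as zero, up to floating-point rounding, because the fitted counts reproduce the observed counts.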

Question: This model yields \(X^2 = G^2 = 0\) with zero degrees of freedom. Why?

Fitting a saturated model might not reveal any special structure that may exist in the relationships among the variables. To investigate these relationships, we propose simpler models and test whether they fit the data by comparing them to the saturated model (that is, to the observed data), as we did in the previous lessons. These new models will depend on marginal and conditional distributions.

Conditional Distributions

A conditional distribution is the distribution of one subset of the variables given another, disjoint subset of the variables. For example, the conditional distribution of \(X\) and \(Y\), given \(Z\), is \(\pi_{ij|k} = \pi_{ijk} / \pi_{++k}\), such that \(\sum_{ij} \pi_{ij|k} = 1\) for each \(k\). Intuitively, we're asking how the joint distribution of \(X\) and \(Y\) changes as the level of \(Z\) changes.

We can also consider the conditional distribution of one variable given the other two. For example, \(\pi_{j|ik} = \pi_{ijk} / \pi_{i+k}\), such that \(\sum_j \pi_{j|ik} = 1\) for each pair \((i, k)\). This is the conditional distribution of \(Y\), given both \(X\) and \(Z\). Intuitively, we're asking how the distribution of \(Y\) changes as the level of either \(X\) or \(Z\) changes.
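For the Berkeley data, the sample version of \(\pi_{j|ik}\) is \(n_{ijk}/n_{i+k}\), the observed admission and rejection rates for each sex within each department. A minimal sketch, assuming NumPy and an array indexed as (sex, status, department):

```python
import numpy as np

# n[i, j, k]: i = sex (male, female), j = status (admitted, rejected),
# k = department (A, ..., F); indices are zero-based.
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])

# n_{i+k}: total applicants of sex i in department k, kept as shape
# (2, 1, 6) so that it broadcasts against n.
n_ik = n.sum(axis=1, keepdims=True)

# Sample estimate of pi_{j|ik} = n_{ijk} / n_{i+k}.
y_given_xz = n / n_ik

# Observed admission rates: rows are sex, columns are departments.
print(np.round(y_given_xz[:, 0, :], 4))
```

For each sex and department, the admitted and rejected proportions sum to one, which is the constraint \(\sum_j \pi_{j|ik} = 1\).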

Stop and Think!

For the Berkeley admissions example, what is the observed conditional distribution of sex and admission status, given Department B?

From the partial table for Department B above, we have \(n_{++2}=353+207+17+8=585\) total individuals applying to Department B. Dividing each of the observed counts by this total gives their conditional distribution:

Department B
		Admission status
		Admitted	Rejected
Sex	Male	353/585=0.6034	207/585=0.3538
Sex	Female	17/585=0.0291	8/585=0.0137

Notice that these proportions necessarily sum to one.
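The same calculation can be done for every department at once by dividing each partial table by its total \(n_{++k}\). The sketch below assumes NumPy and an array indexed as (sex, status, department). It reproduces the Department B proportions and checks that each conditional table sums to one.

```python
import numpy as np

# n[i, j, k]: i = sex (male, female), j = status (admitted, rejected),
# k = department (A, ..., F); indices are zero-based.
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])

# n_{++k}: total applicants per department, kept as shape (1, 1, 6).
n_k = n.sum(axis=(0, 1), keepdims=True)

# Observed conditional distribution of (X, Y) given Z = k.
xy_given_z = n / n_k

dept_b = np.round(xy_given_z[:, :, 1], 4)  # Department B is index 1
print(dept_b)
print(xy_given_z.sum(axis=(0, 1)))         # one total per department
```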

Marginal associations and conditional associations can be very different! In the next section, we define and study the marginal and conditional odds-ratios to help us understand a potential difference in these associations and their impact on statistical inference.

Sampling Schemes

What are some ways of generating three-way (or higher) tables of counts? We essentially have the same sampling schemes as what we saw for two-way tables:

Unrestricted Poisson sampling: nothing is fixed, and each cell count is a Poisson random variable with rate \(\mu_{ijk}\)
Multinomial sampling with fixed total sample size \(n\)

With higher-dimensional tables, there are more totals that can be fixed, so we have additional sampling schemes, such as:

Stratified sampling, that is, product-multinomial sampling with a fixed sample size for each partial table, e.g., \(n_{++k}\)
Product-multinomial sampling within each partial table, e.g., fix \(n_{i+k}\); that is, fix the row totals within each partial table.
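The schemes differ only in which totals are held fixed, which a small simulation can make visible. The sketch below is illustrative only: it assumes NumPy, an arbitrary seed, and, purely for the sake of having numbers to simulate from, treats the observed Berkeley proportions as if they were the true probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# n[i, j, k]: i = sex, j = admission status, k = department (zero-based).
n = np.array([
    [[512, 353, 120, 138, 53, 22], [313, 207, 205, 279, 138, 351]],
    [[89, 17, 202, 131, 94, 24], [19, 8, 391, 244, 299, 317]],
])
N = n.sum()
pi = n / N  # ASSUMPTION: sample proportions stand in for the true pi_{ijk}

# 1. Poisson sampling: nothing fixed; even the grand total is random.
poisson_table = rng.poisson(lam=N * pi)

# 2. Multinomial sampling: only the grand total n is fixed.
multinomial_table = rng.multinomial(N, pi.ravel()).reshape(n.shape)

# 3. Product-multinomial with n_{++k} fixed: one independent multinomial
#    draw per department (stratum).
n_k = n.sum(axis=(0, 1))
stratified_table = np.stack(
    [
        rng.multinomial(n_k[k], (n[:, :, k] / n_k[k]).ravel()).reshape(2, 2)
        for k in range(n.shape[2])
    ],
    axis=2,
)

# 4. Product-multinomial with n_{i+k} fixed: one draw per (sex, department).
#    With two levels of Y, each multinomial reduces to a binomial.
n_ik = n.sum(axis=1)                      # shape (2, 6)
admitted = rng.binomial(n_ik, n[:, 0, :] / n_ik)
rows_fixed_table = np.stack([admitted, n_ik - admitted], axis=1)

print(poisson_table.sum(), multinomial_table.sum())
print(stratified_table.sum(axis=(0, 1)))
print(rows_fixed_table.sum(axis=1))
```

In repeated runs with different seeds, the Poisson total varies, while the multinomial total, the department totals, and the sex-by-department totals are reproduced exactly under schemes 2, 3, and 4, respectively.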

Next, let's define marginal and conditional odds-ratios.
