3: Two-Way Tables: Independence and Association
Overview
This lesson is about the analysis of two-way tables. We begin with the structure of the simplest two-way table, a \(2\times 2\) table, and its corresponding joint distribution. We then consider how to extend the goodness-of-fit tests that we saw in Lesson 2. More specifically, we will learn the chi-squared test of independence for two discrete binary random variables, and basic measures of association, such as odds ratios and relative risk, that describe the strength of association between two binary random variables. We will then extend these concepts to an \(I \times J\) table, to analyze two discrete random variables with \(I\) and \(J\) categories.
Lesson 3 Code Files
SAS Files
Useful Links
 SAS introduction to Categorical procedures
 R: Cross Tabulation and Table Creation
 R: Pearson's Chi-squared Test for Count Data
 R: Visualizing Categorical Data and Computing Measures of Association: VCD package
 R: Computing Measures of Association: EPITOOLS package
 AccuracyHandout.pdf [For more on sensitivity and specificity]
3.1  Notation & Structure
When collecting data on two categorical variables, we can easily summarize the responses in the form of a table, with the levels of one variable corresponding to the rows, the levels of the other variable corresponding to the columns, and the count of individuals answering accordingly in each cell. Specifically, we'll frequently use the following terms:
 Two-way contingency table
 A two-way contingency table is a cross-classification of observations by the levels of two discrete variables.
 Cell
 The cells of the table contain frequency counts.
 Dimension
 The dimension of the table is determined by the number of variables.
 Size
 The size of the table refers to the number of cells. For example, a 2-dimensional (2-way) table of size \(2\times2\) is a cross-classification table of two discrete variables, each with two levels, having a total of 4 cells.
Example: Therapeutic Value of Vitamin C
This is an example of a double-blind study investigating the therapeutic value of vitamin C (ascorbic acid) for treating common colds. The study was conducted during a 2-week period on a sample of 280 French skiers, but one observation had to be dropped. There are two discrete variables, each having two levels; hence the two-way table.
Table 1: Incidence of Common Colds involving French Skiers (Pauling, 1971) as reported in Fienberg (1980).
                Cold   No Cold   Totals
Placebo          31      109      140
Ascorbic Acid    17      122      139
Totals           48      231      279
Each cell indicates levels of both traits. For example, 31 skiers were given a placebo and contracted a cold, while 109 did not. Here is the same data, presented in a \(2\times2\) table using sample proportions instead.
Table 2: Incidence of Common Colds involving French Skiers (Pauling, 1971) as reported in Fienberg (1980).
                Cold    No Cold   Totals
Placebo         0.111    0.391    0.502
Ascorbic Acid   0.061    0.437    0.498
Totals          0.172    0.828    1
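These two tables can be reproduced in R as a quick check; the matrix name `ski` and the dimension labels below are our own choices for illustration, not part of the original data files.

```r
## Table 1 counts entered as a 2x2 matrix (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("Placebo", "Ascorbic Acid"),
                              response  = c("Cold", "No Cold")))
ski

## Table 2: joint sample proportions p_ij = n_ij / n, n = 279
round(prop.table(ski), 3)
```

Note that `prop.table()` with no margin argument divides every cell by the grand total, giving the joint sample proportions.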
Here are some questions that we may ask regarding this data.
Is the probability that a member of the placebo group contracts a cold the same as the probability that a member of the ascorbic group contracts a cold?
Are the type of treatment and cold status associated or independent? Here, independence means that having a placebo or ascorbic acid has no relationship with having a cold or otherwise.
What are the odds of getting a cold for those taking ascorbic acid (vitamin C)?
Example: Coronary Heart Disease
This is an example of a \(2\times4\) table. The data below are taken from the Framingham longitudinal study of coronary heart disease (Cornfield, 1962). In this study, \(n=1329\) patients were classified by serum cholesterol level (mg/100 cc) and whether they had been diagnosed with coronary heart disease (CHD).
          0–199   200–219   220–259   260+   total
CHD         12        8        31      41      92
no CHD     307      246       439     245    1237
total      319      254       470     286    1329
One variable is binary and has two outcomes. The other variable has four levels; this is most likely a continuous variable, but it has been grouped into four intervals.
For the Coronary Heart Disease example, the variable along the top row of the table is the amount of Total Cholesterol. This was originally a continuous variable but is now an interval variable because it was broken up into intervals, but it is also ordinal because there is a natural order and progression in the levels of this variable.
Is there any evidence of a relationship/association between cholesterol level and heart disease?
Example: Smoking
This is an example of a \(3\times2\) table. The table below classifies 5375 high school students according to the smoking behavior of the student \(Z\) and the smoking behavior of the student’s parents \(Y\).
                          Student smokes?
How many parents smoke?   Yes (Z = 1)   No (Z = 2)
Both (Y = 1)                  400          1380
One (Y = 2)                   416          1823
Neither (Y = 3)               188          1168
By default, the row variable is ordinal because there is a natural progression in this variable: the number of parents smoking increases from none to one to both. But if you are not interested in the ordinality, you can treat it as nominal.
Question: Is there a relationship of smoking behavior between the students and their parents?
Suppose that we collect data on two binary variables, \(Y\) and \(Z\), for example, "treatment" and "contracting cold", for \(n\) sample units. Binary means that these variables take two possible values, say 1 (e.g., "cold") and 2 (e.g., "no cold").
\(Y\), taking possible values \(i = 1, \ldots, I\), where \(I = 2\),
\(Z\), taking possible values \(j = 1, \ldots, J\), where \(J = 2\).
The data then consist of \(n\) pairs,
\((y_1, z_1), (y_2, z_2), \ldots , (y_n, z_n)\)
which can be summarized in a frequency table.
Let \(n_{ij}\) be the number of subjects having the following characteristics \((Y = i, Z = j)\) (that is, the number of subjects falling into a particular cell of the twoway table, more specifically falling into the \(i\)th level of \(Y\) and the \(j\)th level of \(Z\)). The total sample size is \(\sum_{i=1}^I\sum_{j=1}^J n_{ij}=n\) . The levels of the first variable are represented by the index \(i\) and the levels of the second variable by index \(j\).
For the Vitamin C example data, \(n_{11} = 31\) means that in our sample we observed 31 individuals who took a placebo pill and got the cold. The counts may be arranged in a \(2\times2\) table:
        Z = 1         Z = 2
Y = 1   \(n_{11}\)    \(n_{12}\)
Y = 2   \(n_{21}\)    \(n_{22}\)
The total number of cells in the table is \(I\times J\), which is 4 in this case. In some textbooks, the authors will use \(x_{ij}\) instead of \(n_{ij}\).
The observed table \(x = (n_{11}, n_{12}, n_{21}, n_{22})\) is a summary of all \(n\) responses, e.g., the values of the four counts of the Vitamin C example out of 279 total responses/individuals. We could display a contingency table \(X\) as a one-way table with four cells, but it is customary to display \(X\) as a two-dimensional table with the separate row and column variables as above. Let's look at some other important structural elements of such tables.
Marginal Totals
When a subscript in a cell count \(n_{ij}\) is replaced by a plus sign (+) or a dot (.), it will mean that we have taken the sum of the cell counts over that subscript.
The row totals are
\(n_{1+} = n_{11} + n_{12}\)
\(n_{2+} = n_{21} + n_{22}\)
the column totals are
\(n_{+1} = n_{11} + n_{21}\)
\(n_{+2} = n_{12} + n_{22}\)
and the grand total is \(n_{++} = n_{11} + n_{12} + n_{21} + n_{22} = n\).
These quantities are often called marginal totals, because they are conveniently placed in the margins of the table, like this:
        Z = 1         Z = 2         total
Y = 1   \(n_{11}\)    \(n_{12}\)    \(n_{1+}\)
Y = 2   \(n_{21}\)    \(n_{22}\)    \(n_{2+}\)
total   \(n_{+1}\)    \(n_{+2}\)    \(n_{++}\)
For example, the marginal totals for the Vitamin C data are \(n_{1+} = 140\), and \(n_{2+} = 139\).
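In R, the marginal totals can be obtained with `margin.table()` and attached to the table with `addmargins()`; a minimal sketch for the Vitamin C counts (the matrix name `ski` is our own choice):

```r
## Vitamin C counts as a 2x2 matrix (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("Placebo", "Ascorbic Acid"),
                              response  = c("Cold", "No Cold")))

margin.table(ski, 1)   # row totals n_{i+}: 140, 139
margin.table(ski, 2)   # column totals n_{+j}: 48, 231
addmargins(ski)        # the table with all marginal totals attached
```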
Joint Distribution
If the sample units are randomly sampled from a large population, then the observed table \(x = (n_{11}, n_{12}, n_{21}, n_{22})\) will have a multinomial distribution with index \(n = n_{++}\) and a parameter vector
\(\boldsymbol{\pi} =(\pi_{11},\pi_{12},\pi_{21},\pi_{22}) =\{\pi_{ij}\}\)
where \(\pi_{ij} = P (Y = i, Z = j)\) is the probability that a randomly selected individual in the population of interest falls into the \((i, j)\)th cell of the contingency table, that is, into the \(i\)th level of \(Y\) and \(j\)th level of \(Z\).
        Z = 1           Z = 2           total
Y = 1   \(\pi_{11}\)    \(\pi_{12}\)    \(\pi_{1+}\)
Y = 2   \(\pi_{21}\)    \(\pi_{22}\)    \(\pi_{2+}\)
total   \(\pi_{+1}\)    \(\pi_{+2}\)    \(\pi_{++} = 1\)
For observed data, we may also use \(p\) instead of \(\hat{\pi}\) to represent a sample proportion. That is, \(p_{ij}=\frac{n_{ij}}{n}\) is the sample proportion of observations in the \((i, j)\)th cell.
Marginal and Conditional Distributions
If we sum the joint probabilities over one variable, we get the marginal distribution. For example, the probability distribution \({\pi_{i+}}\) is the marginal distribution for \(Y\) where \(P(Y = 1) = \pi_{1+}\) and \(P(Y = 2) = \pi_{2+}\) and \(\pi_{1+} + \pi_{2+} =1\). Then the observed marginal distribution of \(Y\) is \({p_{i+}}\). For the Vitamin C data, the observed marginal distribution of type of treatment is
\(p_{1+}= \dfrac{n_{1+}}{n} = \dfrac{140}{279} = 0.502\) and \(p_{2+}= \dfrac{n_{2+}}{n} = \dfrac{139}{279} = 0.498\)
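These observed marginal distributions can be computed in R by applying `prop.table()` to the marginal totals; a sketch assuming the counts are stored in a matrix `ski` (an object name of our choosing):

```r
## Vitamin C counts (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("Placebo", "Ascorbic Acid"),
                              response  = c("Cold", "No Cold")))

## observed marginal distribution of the row variable (treatment)
round(prop.table(margin.table(ski, 1)), 3)  # 0.502, 0.498

## observed marginal distribution of the column variable (cold status)
round(prop.table(margin.table(ski, 2)), 3)  # 0.172, 0.828
```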
What is the marginal distribution of \(Z\) in the Vitamin C example? What is the observed marginal distribution of \(Z\) for the same example?
The conditional probability distribution is the probability of one variable given the values of other variable(s). For example, the conditional distribution of \(Z\), given the values of \(Y\), is \(\pi_{j|i}=\frac{\pi_{ij}}{\pi_{i+}}\), such that \(\sum_j \pi_{j|i} = 1\). Intuitively, we're asking how the distribution of \(Z\) changes as the categories of \(Y\) change.
Here are the observed conditional probability distributions of \(Z\), given \(Y\).
        Z = 1                                    Z = 2                                    total
Y = 1   \(\dfrac{n_{11}}{n_{1+}} = p_{1|1}\)     \(\dfrac{n_{12}}{n_{1+}} = p_{2|1}\)     1
Y = 2   \(\dfrac{n_{21}}{n_{2+}} = p_{1|2}\)     \(\dfrac{n_{22}}{n_{2+}} = p_{2|2}\)     1
Let's see what this means for the distribution of cold, given treatment. There are two conditional probability distributions, depending on whether the treatment is "vitamin" or "placebo". For the subjects receiving the placebo treatment, we have \(P(\mbox{"yes"} \mid \mbox{"placebo"}) = 31/140\) and \(P(\mbox{"no"} \mid \mbox{"placebo"}) = 109/140\). Notice that these two values necessarily add to 1. Similarly, for the vitamin treatment, we have \(P(\mbox{"yes"} \mid \mbox{"vitamin"}) = 17/139\) and \(P(\mbox{"no"} \mid \mbox{"vitamin"}) = 122/139\).
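These conditional distributions can be computed in one call with `prop.table()` and a `margin` argument; a sketch assuming the counts are in a matrix `ski` (an object name of our choosing):

```r
## Vitamin C counts (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("Placebo", "Ascorbic Acid"),
                              response  = c("Cold", "No Cold")))

## conditional distributions of cold status given treatment:
## margin = 1 divides each row by its row total, so each row sums to 1
round(prop.table(ski, margin = 1), 3)
```

With `margin = 2` we would instead get the conditional distributions of treatment given cold status.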
What is the observed conditional probability distribution of treatment, given cold for the Vitamin C data?
Notation extension to any \(I \times J\) table
For general \(Y\) and \(Z\), the counts are usually arranged in a twoway table:
\(\begin{array}{ccccc} & Z=1 & Z=2 & \cdots & Z=J \\ \hline Y=1 & n_{11} & n_{12} & \cdots & n_{1 J} \\ Y=2 & n_{21} & n_{22} & \cdots & n_{2 J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ Y=I & n_{I 1} & n_{I 2} & \cdots & n_{I J} \\ \hline \end{array} \)
The total number of cells is \(n = I\times J\), and the marginal totals are:
\(n_{i+}=\sum\limits_{j=1}^J n_{ij},\qquad i=1,\ldots,I \)
\(n_{+j}=\sum\limits_{i=1}^I n_{ij},\qquad j=1,\ldots,J \)
\(n_{++}=\sum\limits_{i=1}^I \sum\limits_{j=1}^J n_{ij}=n \)
If the sample units are randomly selected from a large population, we can assume that the cell counts \(\left(n_{11}, \dots , n_{IJ}\right)\) have a multinomial distribution with index \(n_{++} = n\) and parameters
\(\pi = (\pi_{11}, \dots, \pi_{IJ})\)
This is the general multinomial model, and it is often called the saturated model, because it contains the maximum number of unknown parameters. There are \(I\times J\) unknown parameters (elements) in the vector \(\pi\), but because the elements of \(\pi\) must sum to one (this is a probability distribution), there are really \(IJ - 1\) unknown parameters that we need to estimate.
3.2  Sampling Schemes
Generating two-way tables of counts is similar to generating one-way tables of counts but with a higher degree of complexity. The main generating probability mechanisms are Poisson, Binomial, and Multinomial models, but for two-way tables, the margins play a big role. We will discuss the following sampling schemes:
 Unrestricted sampling (Poisson)
 Sampling with fixed total sample size (Multinomial)
 Sampling with certain marginal totals fixed (Product-Multinomial, Hypergeometric)
These sampling models extend to higher-dimensional tables, as we will see in later lessons.
Poisson Sampling
Think of collecting data until sundown on the ski slopes of a French ski resort regarding whether the skiers have taken placebo or Vitamin C and whether they have a cold or not. As each skier is encountered, we ask him/her about these two variables of interest. Or, consider standing at the crossing of two major thoroughfares of a Midwest town and classifying each passing car by whether it is driven by a male or a female and whether the car is an American make or not. In both examples, the total sample size is completely random and so are both margins of the table. Hence, each cell is considered an independent Poisson variable, and the cell counts will follow a Poisson distribution
\(n_{ij}\sim Poisson(\lambda_{ij})\)
independently for \( i = 1, \ldots , I\) and \(j = 1, \ldots , J\). In this scheme, the overall \( n\) is not fixed. \(\lambda_{ij}\) is the parameter describing the rate of occurrence for the \((i, j)\)th cell. The expected mean and the variance of the cell are
\(E(n_{ij})=\lambda_{ij}\)
\(Var(n_{ij})=\lambda_{ij}\)
We can think of rewriting the two-way table as a one-way table; e.g., we can think of \(\lambda_{ij}\) as the \(\lambda_{i}\) we saw in one-way frequency tables.
Question: Are the rates in different cells the same or different for different cells?
It depends on the context of the problem and the model assumptions; we could assume the same underlying \(\lambda\) or different ones.
Conditioning on the observed total, the joint Poisson likelihood factors into a Poisson likelihood for the total sample size,
\(n\sim Poisson(\lambda_{++})\)
and a multinomial likelihood for \({n_{ij}}\) given \(n\) , with parameters
\(\pi_{ij}=\dfrac{\lambda_{ij}}{\lambda_{++}}\)
Here, the total \(n\) provides no information about \(\pi = \pi_{ij}\). From a likelihood standpoint, we get the same inferences about \(\pi\) whether \(n\) is regarded as fixed or random.
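A small simulation sketch of Poisson sampling for a \(2\times2\) table may make this concrete; the \(\lambda_{ij}\) values below are made up for illustration:

```r
## made-up Poisson rates for the four cells of a 2x2 table
lambda <- matrix(c(30, 20, 110, 120), ncol = 2)

set.seed(1)
counts <- matrix(rpois(4, lambda), ncol = 2)  # independent Poisson cells
counts
n <- sum(counts)   # the total sample size is random, not fixed by design
n

## conditional on n, the cell probabilities are lambda_ij / lambda_++
pi.ij <- lambda / sum(lambda)
pi.ij
```

Re-running the simulation without the seed gives a different total \(n\) each time, while the conditional probabilities \(\pi_{ij}\) stay the same.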
Multinomial Sampling
Consider now collecting data on a predetermined number of individuals (e.g., 279 in the Vitamin C example) and classifying them according to two binary variables (e.g., treatment and response). If we draw a sample of \(n\) subjects from a population and record \((Y, Z)\) for each subject, then the joint distribution of \({n_{ij}}\) is multinomial with index \(n\) and parameter \(\pi = \pi_{ij}\),
\(\pi_{ij}=P(Y=i,Z=j)\)
where the grand total \(n\) is fixed and known. Parameters are functions of the cell means:
\(\mu_{ij}=E(n_{ij})=n\pi_{ij}\)
Question: Think of rewriting a two-way table as a one-way frequency table. How would you do this for the Vitamin C example?
An extension of this case occurs when, instead of fixing the total \(n\), either row totals OR the column totals are assumed fixed, as we will see next.
Product Multinomial Sampling
Consider now collecting data on 140 "placebo" and 139 "vitamin C" individuals and classifying them according to the response (e.g., whether they got a cold or not). Here, data are collected on a predetermined number of individuals for each category of one variable, and both sets are classified according to the levels of the other variable of interest. Hence, one margin is fixed by design while the other is free to vary. This type of sampling is called Independent Multinomial Sampling. If the response variable has only two levels, it is also called Independent Binomial Sampling, which is a special case of independent multinomial sampling.
If we decide beforehand that we will draw \(n_{i+}\) subjects with characteristic \(Y = i\ (i = 1, \ldots , I)\) and record the \(Z\)-value for each one, each row of the table \((n_{i1}, n_{i2}, \ldots , n_{iJ})\) is then multinomial with probabilities \(\pi_{j|i} = \dfrac{\pi_{ij}}{\pi_{i+}}\), and the rows are independent. The full likelihood is obtained by taking the product of the individual multinomial PMFs and is therefore known as the product-multinomial sampling scheme.
Viewing the data as productmultinomial is appropriate when the row totals truly are fixed by design, as in
 stratified random sampling (strata defined by \(Y\) )
 an experiment where \(Y =\) treatment group
It’s also appropriate when the row totals are not fixed, but we are interested in inference on \(P(Z \mid Y)\) and not \(P(Y)\). That is, when \(Z\) is the outcome of interest, and \(Y\) is an explanatory variable that we do not wish to model.
Suppose the data are multinomial. Then we may factor the likelihood into two parts:
 a multinomial likelihood for the row totals \((n_{1+}, n_{2+}, \ldots , n_{I+})\) with index \(n\) and parameter \({\pi_{i+}}\)
 independent multinomial likelihoods for the rows, \((n_{i1}, n_{i2}, \ldots , n_{iJ})\), with parameters \(\pi_{j|i} = \dfrac{\pi_{ij}}{\pi_{i+}}\).
Therefore, if the parameters of interest can be expressed as functions only of the \(\pi_{j|i}\)'s and not the \(\pi_{i+}\)'s, then correct likelihood-based inferences may be obtained by treating the data as if they were product-multinomial. Conversely, if the data are product-multinomial, then correct likelihood-based inferences about functions of the \(\pi_{j|i}\)'s will be obtained if we analyze the data as if they were multinomial. We may also treat them as Poisson, ignoring any inferences about \(n_{++}\) or \(n_{i+}\).
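A simulation sketch of the product-multinomial scheme for the Vitamin C design: the row totals (140 placebo, 139 vitamin C) are fixed, and each row is drawn as an independent multinomial in the response. The row probabilities below are simply the observed conditional proportions, used here for illustration:

```r
set.seed(1)
## each row is an independent multinomial draw with its total fixed by design
row1 <- rmultinom(1, size = 140, prob = c(31/140, 109/140))  # placebo row
row2 <- rmultinom(1, size = 139, prob = c(17/139, 122/139))  # vitamin C row
tab <- rbind(placebo = c(row1), ascorbic = c(row2))
tab
rowSums(tab)   # always 140 and 139, whatever the random draw
```

Whatever the seed, the row sums never vary; only the split within each row is random, which is exactly what "one margin fixed by design" means.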
Hypergeometric Sampling
We may encounter data where both the row totals \((n_{1+}, \ldots , n_{I+})\) and the column totals \((n_{+1}, \ldots , n_{+J})\) are fixed by design. The best-known example of this is Fisher’s hypothetical example of the Lady Tasting Tea, which will be discussed in the section on Exact Tests. Even when both sets of marginal totals are not fixed by design, some statisticians like to condition on them and perform "exact" inference when the sample size is small and asymptotic approximations are unlikely to work well.
In a \(2\times2\) table, the resulting sampling distribution is hypergeometric, which we introduced in Lesson 1. Recall, that the hypergeometric distribution describes the probability that in a sample of \(n\) distinctive units drawn from a finite population of size \(N\) without replacement, there are \(k\) successes. Consider that you draw \(n\) balls from a box with red and blue balls, where there is a total of \(N\) balls. There are a total of \(D\) red balls in the box. What's the probability that we will get exactly \(k\) red balls?
        drawn         not drawn        total
red     \(k\)         \(D-k\)          \(D\)
blue    \(n-k\)       \(N-D-n+k\)      \(N-D\)
total   \(n\)         \(N-n\)          \(N\)
\(P(k,N,D,n)=\dfrac{\binom{D}{k} \binom{ND}{nk}}{\binom{N}{n}},\qquad k=0,1,2,\ldots,n\)
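This probability is available in R as `dhyper(x, m, n, k)`, where `m` is the number of "success" balls in the urn, `n` the number of the other color, and `k` the number drawn. The small numbers below are a made-up example:

```r
## made-up example: N = 10 balls, D = 4 red, draw n = 3, want k = 2 red
N <- 10; D <- 4; n.draw <- 3; k <- 2

dhyper(k, m = D, n = N - D, k = n.draw)

## agrees with the formula choose(D,k) * choose(N-D, n-k) / choose(N, n)
choose(D, k) * choose(N - D, n.draw - k) / choose(N, n.draw)
```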
3.3  Test for Independence
Two-way contingency tables show the behavior (distribution) of two discrete variables. Given an \(I\times J\) table, it is therefore natural to ask whether and how \(Y\) and \(Z\) are related.
Suppose for the moment that there is no relationship between \(Y\) and \(Z\), i.e., they are independent.
 Statistical Independence

By statistical independence, we mean that the joint probabilities are equal to the product of the marginal probabilities:
\(\pi_{ij}=P(Y=i,Z=j)=P(Y=i)P(Z=j)\)
\(\pi_{ij}=\pi_{i+}\pi_{+j}\) for all pairs of \(i, j\). If the data are independent and we know the marginal totals, then we can determine the exact probabilities for each of the cells. Independence may be expressed in other forms also, but for now, we will use this definition. Intuitively, independence means that the behavior of one variable, say \(Y\), will not be impacted by the behavior of \(Z\). In the Vitamin C example, independence means that whether or not a skier takes vitamin C has nothing to do with whether or not he/she has a cold.
ChiSquared Test of Independence
The hypothesis of independence can be tested using the general method of goodness-of-fit testing described earlier.
\(H_0\): the independence model is true, i.e., \(\pi_{ij} = \pi_{i+}\pi_{+j}\) for all pairs of \((i, j)\)
versus the alternative
\(H_A\): the saturated model is true, i.e., \(\pi_{ij} \ne \pi_{i+}\pi_{+j}\) for at least one pair of \((i, j)\)
 Step 1
calculate expected counts under the independence model
 Step 2
compare the expected counts \(E_{ij}\) to the observed counts \(O_{ij}\)
 Step 3
calculate \(X^2\) and/or \(G^2\) for testing the hypothesis of independence, and compare the values to the appropriate chi-squared distribution with the correct df, \((I-1)(J-1)\)
Before we see how to do this in R and SAS, let's see more about the saturated model and independence model in \(I\times J\) tables.
Saturated Model
Suppose that the cell counts \(n_{11}, \dots , n_{IJ}\) have a multinomial distribution with index \(n_{++} = n\) and parameters
\(\pi=(\pi_{11},\ldots,\pi_{IJ})\)
(these results also work for Poisson or product-multinomial sampling). If the saturated model is true, the number of unknown parameters we need to estimate is maximal (just like in one-way tables) and it is equal to the number of cells in the table. But because the elements of \(\pi\) must sum to one, the saturated model actually has \(IJ - 1\) free parameters. For example, if we had a \(3\times5\) table, we would have \(15 - 1 = 14\) unique parameters \(\pi_{ij}\) that need to be estimated.
Independence Model
If the two variables Y and Z are independent, then \(\pi\) has a special form. Let
\(\pi_{i+} = P(Y = i), i = 1, 2, \ldots , I \)
\(\pi_{+j} = P(Z = j), j = 1, 2, \ldots, J\)
Note that \(\sum\limits_{i=1}^I \pi_{i+}=\sum\limits_{j=1}^J \pi_{+j}=1\), so the vectors \((\pi_{1+}, \pi_{2+}, \ldots , \pi_{I+})\) and \((\pi_{+1}, \pi_{+2}, \ldots , \pi_{+J})\) representing the marginal distributions (row probabilities and column probabilities) contain \(I - 1\) and \(J - 1\) unknown parameters, respectively.
If \(Y\) and \(Z\) are independent, then any element of \(\pi\) can be expressed as
\(\pi_{ij}=P(Y=i)P(Z=j)=\pi_{i+}\pi_{+j}\)
Thus, under independence, \(\pi\) is a function of \((I − 1) + (J − 1)\) unknown parameters.
The parameters under the independence model can be estimated as follows. Note that the vector of row sums (observed marginal counts for the row variable \(Y\)), \((n_{1+},n_{2+},\ldots,n_{I+})\) has a multinomial distribution with index \(n\) and parameter \(\pi_{i+}\). The vector of column sums (observed marginal counts for the column variable \(Z\)), \((n_{+1},n_{+2},\ldots,n_{+J})\) has a multinomial distribution with index \(n = n_{++}\) and parameter \(\pi_{+j}\). The elements of \(\pi_{i+}\) and \(\pi_{+j}\) can thus be estimated by the sample proportions in each margin:
\(\hat{\pi}_{i+}=n_{i+}/n_{++},\qquad i=1,2,\ldots,I\)
and
\(\hat{\pi}_{+j}=n_{+j}/n_{++},\qquad j=1,2,\ldots,J\)
respectively.
Then, the estimated expected cell frequencies under the independence model are
\(E_{ij}=n\hat{\pi}_{ij}=n\hat{\pi}_{i+}\hat{\pi}_{+j}=\dfrac{n_{i+}n_{+j}}{n_{++}}\)
which can be remembered as
expected frequency = (row total × column total) / grand total.
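This rule can be sketched in R as the outer product of the margins divided by the grand total; we again assume the Vitamin C counts sit in a matrix `ski` (an object name of our choosing):

```r
## Vitamin C counts (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("Placebo", "Ascorbic Acid"),
                              response  = c("Cold", "No Cold")))

## expected counts under independence: (row total * column total) / grand total
E <- outer(rowSums(ski), colSums(ski)) / sum(ski)
round(E, 3)   # e.g. E_11 = 140 * 48 / 279 = 24.086
```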
Pearson and Deviance Statistics
Since for jointly observed \((Y,Z)\) the two-way table counts can be viewed as a single multinomial distribution with \(IJ\) categories, we can apply the chi-square approximation in the same way we applied it for the goodness-of-fit tests; we just need to adapt to the double index and sum over all cells in both dimensions. That is, the quantity
\(\sum\limits_{i=1}^I \sum\limits_{j=1}^J \dfrac{(n_{ij}-n\pi_{ij})^2}{n\pi_{ij}}\)
is approximately chi-squared with degrees of freedom equal to \(\nu=IJ-1\). And if we have a null hypothesis in mind, say for independence, we can use the estimated probabilities under that hypothesis to construct both the Pearson and likelihood ratio test statistics. The degrees of freedom would be reduced by the number of estimated parameters as well. In what follows below, we assume the null hypothesis (reduced model) to be tested is that of independence, but other hypotheses could also be considered.
 Pearson goodness-of-fit statistic
 The Pearson goodness-of-fit statistic is
\(X^2=\sum\limits_{i=1}^I \sum\limits_{j=1}^J \dfrac{(n_{ij}-n\hat{\pi}_{ij})^2}{n\hat{\pi}_{ij}}\)
where \(\hat{\pi}_{ij}=(n_{i+}/n)(n_{+j}/n)\) under the independence model. Note that this expression still corresponds to
\(X^2=\sum\limits_{i=1}^I \sum\limits_{j=1}^J \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}\)
where \(O_{ij} = n_{ij}\) is the observed count and \(E_{ij} = E(n_{ij})\) is the expected count under the null hypothesis.
The deviance statistic can similarly be calculated for two-way tables:
\(G^2=2\sum\limits_{i=1}^I \sum\limits_{j=1}^J n_{ij}\log\left(\dfrac{n_{ij}}{n\hat{\pi}_{ij}}\right) = 2\sum\limits_{i=1}^I \sum\limits_{j=1}^J O_{ij}\log\left(\dfrac{O_{ij}}{E_{ij}}\right)\)
\(G^2\) is also called the likelihood-ratio test statistic or likelihood-ratio chi-squared test statistic. Recall from the discussion on one-way tables that we are comparing the likelihoods of the assumed model under \(H_0\) and some alternative model, \(H_A\), typically the saturated model (i.e., the observed data) by default. More generally, the likelihood-ratio test statistic can be described as follows:
Let \(\max L(H_0)\) be the maximum of the likelihood when the parameters satisfy \(H_0\); \(H_0\) usually has more restrictions on the parameters.
Let \(\max L(H_A)\) be the maximum of the likelihood when the parameters satisfy \(H_A\); \(H_A\) usually has no or fewer restrictions on the parameters.
Then the likelihoodratio statistic would be:
\(\Lambda=\dfrac{\max L(H_0)}{\max L(H_A)}\)
and the deviance \(G^2 = −2\log(\Lambda)\).
The smaller the likelihood under \(H_0\) (less chance of the restricted model to hold given the data), the more evidence you would have against \(H_0\), that is, the smaller \(\Lambda\) and greater \(G^2\).
What are the degrees of freedom for this test?
The general rule for DF
DF are equal to the number of parameters specified (estimated) under the alternative model (hypothesis) minus the number of parameters estimated under the null model (hypothesis).
Computing Degrees of Freedom
Recall, under the saturated model, \(\pi\) contains \(IJ-1\) free (unique) parameters. And under the independence model, \(\pi\) is a function of \((I-1)+(J-1)\) parameters, since each joint probability \(\pi_{ij}\) can be written as the product of the marginals \(\pi_{i+}\pi_{+j}\), each of which has the sum-to-one constraint. The degrees of freedom are therefore
\(\nu=(IJ-1)-[(I-1)+(J-1)]=(I-1)(J-1)\)
For \(2\times2\) tables, this reduces to \((2-1)(2-1) = 1\).
A large value of \(X^2\) or \(G^2\) indicates that the independence model is not plausible and that \(Y\) and \(Z\) are related. Under the null hypothesis, \(X^2\) and \(G^2\) are approximately distributed as a chi-squared distribution with \(\nu= (I-1)(J-1)\) degrees of freedom, provided that (a) the \(n\) sample units are iid (i.e., there is no clustering), and (b) the expected counts \(E_{ij}\) are sufficiently large (recall the discussion in one-way tables). At a minimum, we'd like to have all \(E_{ij}\ge 1\), with at least 80% of them 5 or more.
For \(2\times2\) tables, the 95th percentile of \(\chi^2_1\) is 3.84, so an observed value of \(X^2\) or \(G^2\) greater than 3.84 means that we can reject the null hypothesis of independence at the .05 level.
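The whole calculation can be sketched "by hand" in R for the Vitamin C table; the object names below are our own, and the values reproduce the \(X^2\) and \(G^2\) statistics quoted later from the SAS output:

```r
## Vitamin C counts (hypothetical object name `ski`)
ski <- matrix(c(31, 17, 109, 122), ncol = 2)
E <- outer(rowSums(ski), colSums(ski)) / sum(ski)  # expected counts

X2 <- sum((ski - E)^2 / E)          # Pearson statistic
G2 <- 2 * sum(ski * log(ski / E))   # deviance statistic
c(X2 = X2, G2 = G2)                 # about 4.811 and 4.872

## compare with the chi-squared critical value, df = (2-1)(2-1) = 1
qchisq(0.95, df = 1)                # 3.841
1 - pchisq(c(X2, G2), df = 1)       # both p-values below 0.05
```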
Example: Vitamin C
How can we do the test of independence computationally? Let's illustrate this first using the Vitamin C example, which is the \(2\times2\) case. We're interested in whether the treatment type and contracting cold are associated.
Here is an example in SAS using the program code VitaminC.sas; see the R tab for the R code.
data ski;
input treatment $ response $ count;
datalines;
placebo cold 31
placebo nocold 109
ascorbic cold 17
ascorbic nocold 122
;
run;
proc freq;
weight count;
tables treatment*response/ chisq relrisk riskdiff expected;
exact or;
run;
Let's look at different parts of the output. SAS output for this program.
Heading I:
Table of treatment by response section produces the table with observed, expected values, sample proportions, and conditional probabilities.
Heading II:
Statistics for Table of treatment by response section produces various test statistics, such as \(X^2\) and \(G^2\).
The FREQ Procedure
Statistics for Table of treatment by response
Statistic                      DF   Value    Prob
Chi-Square                      1   4.8114   0.0283
Likelihood Ratio Chi-Square     1   4.8717   0.0273
Continuity Adj. Chi-Square      1   4.1407   0.0419
Mantel-Haenszel Chi-Square      1   4.7942   0.0286
Phi Coefficient                     0.1313
Contingency Coefficient             0.1302
Cramer's V                          0.1313
\(X^2 = 4.8114\) and \(G^2 = 4.8717\), with df = 1, indicate strong evidence for rejecting the independence model. Continuity Adj. Chi-Square = 4.1407 with \(p\)-value = 0.0419 is Pearson's \(X^2\) adjusted slightly for small cell counts. The adjustment is to subtract 0.5 from the absolute difference between the observed and the expected counts in the formula for the \(X^2\) statistic, i.e., to use \(|O_{ij}-E_{ij}|-0.5\). The \(X^2\) statistic with this adjustment gives conservative inference; that is, it gives a bigger \(p\)-value than the usual Pearson \(X^2\) statistic without the correction. But since SAS can produce exact tests, we won't need to consider this statistic.
Here is the test of independence for the Vitamin C example, also found in the section with R files VitaminC.R and its output, VitaminC.out.
### enter the data: a 2x2 matrix of the counts from Table 1
ski <- matrix(c(31, 17, 109, 122), ncol = 2,
              dimnames = list(treatment = c("placebo", "ascorbic"),
                              response = c("cold", "nocold")))
### Pearson's chi-squared test with Yates' continuity correction
result <- chisq.test(ski)
result
### Let's look at the observed, expected values and the residuals
result$observed
result$expected
result$residuals
### Pearson's chi-squared test withOUT Yates' continuity correction
result <- chisq.test(ski, correct = FALSE)
result
result$observed
result$expected
result$residuals
Notice that by default the chisq.test() function in R gives the \(X^2\) statistic with a slight adjustment for small cell counts (Yates' continuity correction). The adjustment is to subtract 0.5 from the absolute difference between the observed and the expected counts in the formula for the \(X^2\) statistic, i.e., to use \(|O_{ij}-E_{ij}|-0.5\). The \(X^2\) statistic with this adjustment gives conservative inference; that is, it gives a bigger \(p\)-value than the usual Pearson \(X^2\) statistic without the correction. To get the usual \(X^2\), we need to invoke the option correct = FALSE.
To compute the deviance statistic for two-way tables, we can use the function LRstats.R or one of the R packages, such as VCD.
Example: Coronary Heart Disease
Next, we return to the Coronary Heart Disease example from the introduction and ask "Is having coronary heart disease independent of the cholesterol level in one's body? Is there any evidence of a relationship/association between cholesterol and heart disease?"
Test of Independence in SAS  Coronary Heart Disease Example
Let's see the same calculation using the SAS code below: HeartDisease.sas
data chd;
input CHD $ serum $ count @@;
datalines;
chd 0-199 12 chd 200-219 8 chd 220-259 31 chd 260+ 41
nochd 0-199 307 nochd 200-219 246 nochd 220-259 439 nochd 260+ 245
;
proc freq; weight count;
tables CHD*serum /chisq expected deviation cmh cellchi2 measures;
/*exact fisher or / alpha=.05;*/
run;
See the complete SAS output: HeartDisease SAS Output.
Here is a portion of the output from SAS with the Pearson chi-square statistic and deviance (likelihood-ratio chi-square) statistic:
Statistics for Table of CHD by serum
Statistic                      DF   Value     Prob
Chi-Square                      3   35.0285   <.0001
Likelihood Ratio Chi-Square     3   31.9212   <.0001
Mantel-Haenszel Chi-Square      1   26.1475   <.0001
Phi Coefficient                     0.1623
Contingency Coefficient             0.1603
Cramer's V                          0.1623
Test of Independence in R  Coronary Heart Disease Example
Two different computations are done in HeartDisease.R file using the function chisq.test(). Here is the first:
heart <- c(12,8,31,41,307,246,439,245)
heart <- matrix(heart,4,2)
heart <- t(heart)
## run the chi-squared test of independence & save it into a new object
result <- chisq.test(heart)
result
## Let's look at the observed, expected values and the residuals
result$observed
result$expected
result$residuals
### Example: Heart Disease Example Lesson 3 ##
### Simple line by line R code
### Nice R code that corresponds to SAS code and output
#######################################################
## enter data
heart <- c(12,8,31,41,307,246,439,245)
heart <- matrix(heart,4,2)
heart <- t(heart)
## run the chi-squared test of independence & save it into a new object
result <- chisq.test(heart)
result
## Let's look at the observed, expected values and the residuals
result$observed
result$expected
result$residuals
### Likelihood Ratio Test
LR=2*sum(heart*log(heart/result$expected))
LR
LRchisq=1-pchisq(LR,df=(4-1)*(2-1))
LRchisq
##make sure you have function LRstats()
LRstats(heart)
## Let's calculate the conditional probabilities
## the following function gives the desired marginal, in this case, the counts for the serum groups
serum<margin.table(heart,2)
serum
## let's look at the counts for the four groups with CHD
heart[1,]
## then counts for the four groups with NOCHD, which is the second column of data in the dataframe we created above
heart[2,]
### conditional probabilities are:
heart[2,]/serum
heart[1,]/serum
########################################
### Nice R code that corresponds to SAS code and output
#######################################################
heart=matrix(c(12,307,8,246,31,439,41,245), ncol=4, dimnames=list(CHD=c("chd", "nochd"), serum=c("0-199", "200-219","220-259","260+")))
heart
count=heart
### Chi-Square Independence Test
result=chisq.test(count)
result$expected
### Let us look at the Percentage, Row Percentage and Column Percentage
### of the total observations contained in each cell.
Contingency_Table=list(Frequency=count,Expected=result$expected,Deviation=count-result$expected,Percentage=prop.table(count),RowPercentage=prop.table(count,1),ColPercentage=prop.table(count,2))
Contingency_Table
###### Computing various measures of association
library(vcd)
assocstats(heart)
### For the Pearson correlation coefficient
### and Mantel-Haenszel,
### for IxJ tables, you can also use
### pears.cor() function.
### Make sure you run this function first!
### c(1,2) and c(1,2,3,4), are the vectors of score values
pears.cor(heart, c(1,2),c(1,2,3,4))
### and this should give you r = 0.14, M2 = 26.1475
##Gamma
Gamma.f(heart)
Output
You will notice in the file that, unlike for \(2\times 2\) tables where we had to worry about the continuity correction in R, there is no such adjustment for \( I \times J\) tables. It doesn't matter, in this example, whether you call chisq.test(heart, correct=TRUE) or chisq.test(heart, correct=FALSE) because the results are the same.
\(X^2= 35.0285\), df = 3, \(p\)-value = 1.202e-07
and the likelihood-ratio test statistic is
\(G^2= 31.9212\), df = 3, \(p\)-value = 5.43736e-07
Notice that the chisq.test() function does not compute \(G^2\), but we included extra code to do that. For the complete output file, see the HeartDisease.out file. Here is also a more general function LRstats.R for computing the \(G^2\) for two-way tables. The results are discussed further below.
Conclusion
We reject the null hypothesis of independence because of the large values of the chi-square statistics. Notice the degrees of freedom are equal to \(3 = (4-1)(2-1)\), and thus the \(p\)-value is very low. Therefore, through the \(X^2\) test for independence, we have demonstrated that a relationship exists between cholesterol and CHD.
A good statistical analysis, however, should not end with the rejection of a null hypothesis. Once we have demonstrated that a relationship exists, we may be interested in which cells were particularly revealing. To do this we consider computing and evaluating the residuals.
3.4 - Difference in Proportions
When the null hypothesis of independence is rejected, the nature of the dependence, its direction and magnitude, needs to be measured. A statistic that measures the direction and magnitude of the relationship between two variables is called an effect size or a measure of association.
One of the most intuitive measures of association is the difference in proportions, which compares the relative frequency of an important characteristic between two groups. For example, in the Vitamin C study, we want to know whether the probability of a member of the placebo group contracting a cold is the same as the probability of a member of the ascorbic acid group contracting a cold.
Regarding \(Z\) as a response and \(Y\) as an explanatory variable, the difference in proportions for a \(2 \times 2\) table is
\(\delta =P(Z=1\mid Y=1)-P(Z=1\mid Y=2)= \dfrac{\pi_{11}}{\pi_{1+}}-\dfrac{\pi_{21}}{\pi_{2+}} = \pi_{1|1}-\pi_{1|2}\)
where \(\pi_{1|1}\) is the probability of "success" (e.g., "cold"), given row 1, and \(\pi_{1|2}\) is the probability of "success", given row 2. Recall that these are the conditional probabilities we already described for the Vitamin C example. Thus, the probability of "failure" (e.g., "no cold"), given row 1 is \(\pi_{2|1}\), and \(\pi_{1|1}+ \pi_{2|1}=1\). Similarly, we can find the conditional probabilities, given row 2.
In social sciences and epidemiology, these are sometimes referred to as "risk" values. That is, we may refer to the probability that a person gets a cold, given that he/she took a placebo pill, as the risk of such an event. Furthermore, for diagnostic tests, the conditional probability that the diagnostic test is positive, given that the subject has a disease, is called sensitivity. The conditional probability that the diagnostic test is negative, given that the subject does NOT have a disease, is called specificity.
Finally, because \(\delta\) is a function only of the parameters of \(P(Z \mid Y )\), likelihood-based inferences will be the same regardless of whether calculations assume Poisson sampling or multinomial sampling.
Point Estimation for \(\delta\)
The natural estimate of \(\delta\) is the difference in the conditional sample proportions:
\(\hat{\delta}=d=\dfrac{n_{11}}{n_{1+}}-\dfrac{n_{21}}{n_{2+}}=p_{1|1}-p_{1|2}\)
Properties
 It takes values between \(-1\) and \(+1\),
 If variables are independent, the difference in the proportions equals 0.
This is the maximum-likelihood estimate (MLE) because, under product-multinomial sampling, the numerators are independent binomials:
\(n_{11} \sim Bin\left( n_{1+}, \dfrac{\pi_{11}}{\pi_{1+}} \right)\)
\(n_{21} \sim Bin\left( n_{2+}, \dfrac{\pi_{21}}{\pi_{2+}} \right)\)
For the Vitamin C example, the estimated (sample) difference in proportions of getting a cold is \(d=17/139 - 31/140 = 0.12 - 0.22 = -0.10\).
                Cold    No Cold   Totals
Placebo         0.22    0.78      1
Ascorbic Acid   0.12    0.88      1
Question: Is this difference "big" or "small"?
Confidence interval for \(\delta\)
If \(n_{1+}\) and \(n_{2+}\) are large, the estimate \(d\) is approximately normal with variance
\(\displaystyle V(d)=\frac{ \frac{\pi_{11}}{\pi_{1+}} \left(1-\frac{\pi_{11}}{\pi_{1+}}\right)} {n_{1+}} + \frac{\frac{\pi_{21}}{\pi_{2+}} \left(1-\frac{\pi_{21}}{\pi_{2+}}\right)} {n_{2+}} \)
This expression follows from the fact that if \(X_1\) and \(X_2\) are independent random variables, then \(Var(X_1 - X_2) = Var(X_1) + Var(X_2)\). Here \(X_1\) and \(X_2\) are the two sample proportions, each a mean of independent Bernoulli random variables.
Plugging in the estimates, we get the estimate of the variance
\(\hat{V}(d)=\dfrac{n_{11}n_{12}}{(n_{1+})^3}+\dfrac{n_{21}n_{22}}{(n_{2+})^3}\)
which is used for computing the standard errors and confidence intervals.
A large-sample \((1-\alpha)100\%\) CI for the Vitamin C example uses
\(SE(d)=\sqrt{\hat{V}(d)}=\sqrt{\dfrac{0.12\times 0.88}{139}+\dfrac{0.22\times 0.78}{140}}=0.045\)
\(\Rightarrow -0.10\pm 1.96\times 0.045=(-0.19,-0.01)\)
Interpret
We are 95% confident that the true difference in the proportions of people getting a cold, vitamin C versus placebo, is somewhere between \(-0.19\) and \(-0.01\); that is, the proportion of colds is between 1 and 19 percentage points lower in the vitamin C group. Note that the value of 0 (corresponding to no difference) does not fall within that interval.
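These calculations are easy to reproduce; the following base R sketch recomputes \(d\), its standard error, and the Wald interval for the Vitamin C counts:

```r
## Vitamin C data: colds and group totals
n11 <- 17; n1p <- 139                      # ascorbic acid group
n21 <- 31; n2p <- 140                      # placebo group
d  <- n11/n1p - n21/n2p                    # about -0.099
se <- sqrt((n11/n1p) * (1 - n11/n1p) / n1p +
           (n21/n2p) * (1 - n21/n2p) / n2p)
ci <- d + c(-1, 1) * qnorm(0.975) * se     # about (-0.19, -0.01)
```

For comparison, `prop.test(c(17, 31), c(139, 140), correct = FALSE)` should report essentially the same interval, since it also uses the unpooled Wald standard error for the confidence interval.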
Hypothesis testing for \(\delta\)
Under the null hypothesis of no difference, \(H_0: \delta = 0\), the rows of the table can be pooled to get an estimate of the common proportion, \( P(Z = 1 \mid Y = 1) = P(Z = 1 \mid Y = 2)\).
The pooled estimate is
\(\hat{\pi}=\dfrac{n_{11}+n_{21}}{n_{1+}+n_{2+}}\)
Under \(H_0: \delta = 0\), a more efficient estimate of the variance \(V (d)\) is
\(\dfrac{\hat{\pi}(1-\hat{\pi})}{n_{1+}}+\dfrac{\hat{\pi}(1-\hat{\pi})}{n_{2+}}\)
and the test statistic
\(z=\dfrac{d}{\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n_{1+}}+\frac{\hat{\pi}(1-\hat{\pi})}{n_{2+}}}}\)
is approximately distributed as \(N(0,1)\). Many elementary textbooks in statistics use this test to compare two proportions. For the Vitamin C example,
\(z=\dfrac{31/140-17/139}{\sqrt{\frac{48}{279}\times \frac{231}{279}\times \left(\frac{1}{140}+\frac{1}{139}\right)}}=2.19\)
This value is significant at the 0.05 level, so if data are taken at face value, we could conclude that the proportion of colds in vitamin C group is smaller than in the placebo group. The result is consistent with confidence interval inference.
For computation in SAS, see VitaminC.sas, Vitamin C SAS Output. The analysis can be done with PROC FREQ, using the options MEASURES or RISKDIFF. Compare the values of the above calculations to the relevant SAS output under the heading "Statistics for Table of Treatment and Response". Notice that \(d=-0.099\); we just rounded the value to \(-0.10\) in our calculations.
Equivalence to statistical independence test
It is useful to note that the null hypothesis \(H_0:\delta = 0\) is equivalent to independence. Using conditional probabilities, this corresponds to
\(\pi_{j|1} = \pi_{j|2},\quad j=1,2\)
Thus we can test \(\delta = 0\) by the usual \(X^2\) or \(G^2\) test for independence in a \(2\times 2\) table, already discussed in the previous sections. In fact, we can show that the squared \(z\)-statistic above is algebraically equal to \(X^2\). So a two-sided test based on comparing the \(z\)-statistic to a \(N(0, 1)\) distribution is identical to comparing \(X^2\) from the test of independence to a chi-squared distribution with 1 degree of freedom, \(\chi^2_1\). In the Vitamin C example, note that \(X^2 = 4.81 \approx (2.19)^2\).
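This equivalence can be checked numerically; a sketch using prop.test(), whose reported "X-squared" statistic is the Pearson \(X^2\) (equivalently \(z^2\)) when correct = FALSE:

```r
## pooled two-proportion z statistic for the Vitamin C data
z <- (31/140 - 17/139) /
     sqrt((48/279) * (231/279) * (1/140 + 1/139))        # about 2.19
out <- prop.test(c(31, 17), c(140, 139), correct = FALSE)
## prop.test's chi-square statistic equals z^2 (about 4.81)
all.equal(unname(out$statistic), z^2)
```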
Even though the difference of two proportions is very easy to interpret, one problem with using \(\delta\) is that when \(Z = 1\) is a rare event, the individual probabilities \(P(Z = 1 \mid Y = 1)\) and \(P(Z = 1 \mid Y = 2)\) are both small, i.e., close to zero. The absolute value of \(\delta\) will then be close to zero even when the effect is strong. In the following sections, we study two other common measures of association that compare the relative values of the proportions, rather than their absolute difference.
3.5 - Relative Risk
Consider, for example, the yearly risk of cancer death to be .005 in one population and .01 in another population; most health officials would consider this difference to be important, even though \(\delta = .005\) is very small. This suggests the use of the relative risk of a "success", which is the ratio of the relevant conditional probabilities,
 Relative Risk
 \(\rho=\dfrac{P(Z=1\mid Y=1)}{P(Z=1\mid Y=2)}=\dfrac{\pi_{11}/\pi_{1+}}{\pi_{21}/\pi_{2+}}\)
Like \(\delta\), this is a function only of the parameters of \(P(Z \mid Y )\), so likelihood-based inferences about \(\rho\) will be the same under Poisson, multinomial, or product-multinomial (\(n_{i+}\)'s fixed) sampling.
Estimation and hypothesis testing
The natural (maximumlikelihood) estimate of \(\rho\) uses observed data:
\(\hat{\rho}=r=\dfrac{n_{11}/n_{1+}}{n_{21}/n_{2+}}\)
Because \(\rho\), and thus \(r\), are nonnegative, a normal approximation for \(\log r\) tends to work better than for \(r\) itself. So in this case, it makes sense to transform the data using the natural log (ln) which will allow us to use a less skewed distribution. The approximate variance of \(\log r\) is
\(V(\log r)\approx \dfrac{1-\pi_{11}/\pi_{1+}}{n_{1+}\,\pi_{11}/\pi_{1+}}+\dfrac{1-\pi_{21}/\pi_{2+}}{n_{2+}\,\pi_{21}/\pi_{2+}}\)
which is estimated by
\begin{align}
\hat{V}(\log r)&= \dfrac{1-n_{11}/n_{1+}}{n_{1+}\,n_{11}/n_{1+}}+\dfrac{1-n_{21}/n_{2+}}{n_{2+}\,n_{21}/n_{2+}}\\
&= \dfrac{1}{n_{11}}-\dfrac{1}{n_{1+}}+\dfrac{1}{n_{21}}-\dfrac{1}{n_{2+}}\\
\end{align}
An approximate 95% confidence interval for \(\log \rho\) is
\(\log r\pm 1.96\sqrt{\hat{V}(\log r)}\)
and the corresponding interval for \(\rho\) is found by exponentiating the endpoints.
Equivalence to statistical independence testing
Testing \(H_0 \colon \rho = 1\) is equivalent to testing for independence in a \(2 \times 2\) table. For our current example, \(r = \dfrac{0.12}{0.22}= 0.5523\), with corresponding 95% CI \((0.3209, 0.9506)\). Thus, at the 0.05 level, we can reject the independence model, and furthermore we can say that the skiers who took vitamin C were roughly half as likely to get a cold as the skiers who did not take vitamin C. Notice that we could instead have computed \(r = \dfrac{0.22}{0.12} = 1.8\), which leads to the same conclusion.
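The sample relative risk and its log-scale interval can be reproduced in base R; a sketch, taking the vitamin C group as row 1 to match \(r = 0.12/0.22\) above:

```r
n11 <- 17; n1p <- 139     # vitamin C: colds, group total
n21 <- 31; n2p <- 140     # placebo: colds, group total
r  <- (n11/n1p) / (n21/n2p)                             # about 0.5523
v  <- 1/n11 - 1/n1p + 1/n21 - 1/n2p                     # estimated variance of log(r)
ci <- exp(log(r) + c(-1, 1) * qnorm(0.975) * sqrt(v))   # about (0.3209, 0.9506)
```

The riskratio() function in the {epitools} package offers similar calculations, though its defaults (reference row, estimator) may differ.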
Keep in mind that relative risk is a better measure of association than the difference in proportions when cell probabilities are close to 0 and 1, i.e., when they are in the tails of the probability distribution. Relative risk has far more application in medical literature and epidemiology than difference of proportions.
Odds Ratio and Relative Risks

Statistic                    Value     95% Confidence Limits
Odds Ratio                   0.4900    0.2569    0.9343
Relative Risk (Column 1)     0.5523    0.3209    0.9506
Relative Risk (Column 2)     1.1273    1.0120    1.2558
3.6 - Odds Ratio
This is perhaps the most commonly used measure of association. Later on, we will see that this is a natural parameter for many of the log-linear and logistic models.
 Odds
 The odds are the ratio of the probabilities of "success" and "failure" for a given row, that is, a ratio of conditional probabilities from the same conditional distribution.
Odds of getting a cold versus not getting a cold given that a person took a placebo:
\(odds_1=\dfrac{P(Z=1\mid Y=1)}{P(Z=2\mid Y=1)}=\dfrac{\pi_{1|1}}{\pi_{2|1}}=\dfrac{\pi_{1|1}}{1-\pi_{1|1}}\)
The second odds (given that ascorbic acid was taken) are
\(odds_2=\dfrac{P(Z=1\mid Y=2)}{P(Z=2\mid Y=2)}=\dfrac{\pi_{1|2}}{\pi_{2|2}}=\dfrac{\pi_{1|2}}{1-\pi_{1|2}}\)
Properties of odds
 If the odds equal 1, "success" and "failure" are equally likely.
 If odds > 1, then "success" is more likely than "failure".
 If odds < 1, then "success" is less likely than "failure".
 Odds Ratio

The odds ratio is the ratio of \(odds_1\) to \(odds_2\) (or vice versa):
\begin{align}
\theta &= \dfrac{P(Z=1\mid Y=1)/P(Z=2\mid Y=1)}{P(Z=1\mid Y=2)/P(Z=2\mid Y=2)}\\
&= \dfrac{\left(\dfrac{\pi_{11}}{\pi_{1+}}\right)/\left(\dfrac{\pi_{12}}{\pi_{1+}}\right)}{\left(\dfrac{\pi_{21}}{\pi_{2+}}\right)/\left(\dfrac{\pi_{22}}{\pi_{2+}}\right)}\\
&= \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}\\
\end{align}
Clearly, \(\theta\) is a function of the parameters of \(P(Z \mid Y )\), so inferences about it should be the same under Poisson, multinomial, or product-multinomial (\(n_{i+}\)'s fixed) sampling. And if we interchange the roles of \(Y\) and \(Z\), we still get
\(\theta=\dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}\)
so \(\theta\) can also be regarded as a function of the parameters of \(P(Y \mid Z )\). Therefore, the likelihood inferences will be the same if we regard the \(n_{+j}\)'s as fixed.
Point estimate, CI and hypothesis test
The natural estimate of \(\theta\) is the sample crossproduct ratio,
\(\hat{\theta}=\dfrac{n_{11}n_{22}}{n_{12}n_{21}}\)
The properties of \(\hat{\theta}\) are easily established under multinomial sampling, but the same properties will hold under Poisson or product-multinomial sampling with either the row totals or column totals (but not both) regarded as fixed.
As with the relative risk, the log-odds ratio \(\log\hat{\theta}\) has a better normal approximation than \(\hat{\theta}\) does. Therefore, we usually obtain a confidence interval on the log scale; please note again that log throughout this course is the natural log, that is, log base \(e\). The estimated variance of \(\log\hat{\theta}\) is easy to remember,
\(\hat{V}(\log\hat{\theta})=\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}\)
and we get a 95% confidence interval for \(\theta\) by exponentiating the endpoints of
\(\log\hat{\theta} \pm 1.96\sqrt{\dfrac{1}{n_{11}}+\dfrac{1}{n_{12}}+\dfrac{1}{n_{21}}+\dfrac{1}{n_{22}}}\)
For the Vitamin C example, the odds of "success" (i.e., getting a cold), given that a skier took vitamin C, are \(0.12/0.88 = 0.14\). The odds of "success" (i.e., getting a cold), given that a skier took a placebo pill, are \(0.22/0.78 = 0.28\).
The odds ratio is \(0.14/0.28 = 0.49\), and the 95% CI for \(\log\theta\) is
\(\log(0.490)\pm 1.96 \sqrt{1/17+1/109+1/122+1/31}\)
\(=(-1.359,-0.068)\)
Finally, exponentiating the limits gives us the 95% CI for \(\theta\): (0.257, 0.934). Notice that we could have also computed \(0.28/0.14=2.04=(31\times 122)/(109\times 17)\), which is the reciprocal of the value we computed: \(1/0.49=2.04\). For our example, \(\hat{\theta}=0.49\) means that
 the odds of getting a cold given vitamin C are 0.49 times the odds of getting a cold given a placebo,
 the odds of getting a cold given a placebo are \(1/0.49 = 2.04\) times the odds of getting a cold given vitamin C,
 getting a cold is less likely given vitamin C than given a placebo.
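A base R sketch reproducing the odds ratio and its asymptotic interval (the oddsratio() function in the {vcd} package provides similar output):

```r
n <- matrix(c(31, 109,
              17, 122), nrow = 2, byrow = TRUE,
            dimnames = list(group = c("placebo", "ascorbic"),
                            response = c("cold", "nocold")))
## odds(cold | vitamin C) / odds(cold | placebo)
theta <- (n[2, 1] * n[1, 2]) / (n[1, 1] * n[2, 2])            # about 0.49
v     <- sum(1 / n)                                           # 1/n11 + 1/n12 + 1/n21 + 1/n22
ci    <- exp(log(theta) + c(-1, 1) * qnorm(0.975) * sqrt(v))  # about (0.257, 0.934)
```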
For computation in SAS for the Vitamin C example, compare the above calculations to the relevant SAS output under the heading "Statistics for Table of treatment and response: Odds Ratio":
Odds Ratio

Odds Ratio                   0.4900
Asymptotic Conf Limits
  95% Lower Conf Limit       0.2569
  95% Upper Conf Limit       0.9343
Exact Conf Limits
  95% Lower Conf Limit       0.2407
  95% Upper Conf Limit       0.9740
These statistics can be requested with either
tables treatment*response/ chisq relrisk riskdiff expected;
or
tables treatment*response/ chisq measures expected;
The computation in R is available with the VitaminC.R file.
For more on the interpretation of odds and odds ratios and their properties, see below.
Properties of Odds Ratios
If \(\theta = 3\), the odds of "success" in row 1 are 3 times greater than the odds of success in row 2; individuals in row 1 are more likely to have a "success" than those in row 2. If \(\theta = 0.3\), the odds of "success" in row 1 are 0.3 times the odds in row 2; the odds of "success" in row 2 are \((1/0.3) = 3.33\) times the odds in row 1.
The relationship between odds and probabilities can be expressed as
\(odds_1=\dfrac{\pi_{1|1}}{1-\pi_{1|1}}\iff\pi_{1|1}=\dfrac{odds_1}{1+odds_1}\)
If the variables are independent, then \(\pi_{1|1} = \pi_{1|2}\), \(odds_1 = odds_2\), and
\(\theta=\dfrac{odds_1}{odds_2}=1\)
If the variables are not independent such that \(\pi_{1|1} > \pi_{1|2}\), then \(odds_1 > odds_2\), and
\( \theta > 1\)
If the variables are not independent such that \(\pi_{1|1} < \pi_{1|2}\), then \(odds_1< odds_2\), and
\( 0 < \theta < 1\)
If both \(\pi_{1|1}\) and \(\pi_{1|2}\) are small in the population, then the odds ratio and relative risk will be close, since \(\frac{1-\pi_{1|1}}{1-\pi_{1|2}}\) will be close to 1. The odds ratio \(\theta\) does NOT depend on the marginal distribution of either variable. If the categories of both variables are interchanged, the value of \(\theta\) does not change. If the categories of one variable are switched, the odds ratio in the new rearranged table will equal \(1/\theta\).
Finally, note that the sample odds ratio will equal zero or \(\infty\) if any \(n_{ij}=0\). Some authors suggest adding \(1/2\) to each cell count and then recalculating the sample odds ratio and its standard error to avoid this issue.
3.7 - Prospective and Retrospective Studies
In epidemiology, three different types of studies are commonly done, depending on whether the disease condition is first fixed and then the possible causes (exposure to a risk factor) are assessed, or whether exposed and unexposed individuals are followed until the disease develops. We will introduce two of those here.
Suppose variable \(Z\) represents a condition (disease) that is relatively rare in a population (e.g. lymphoma), and we want to assess whether another characteristic or behavior \(Y\) (e.g. smoking) could be a risk factor for \(Z\).
The obvious way to study this is to follow a group of smokers \((Y = 1)\) and a group of nonsmokers \((Y = 2)\) over time, and see which ones eventually develop lymphoma \((Z = 1)\) and which do not \((Z = 2)\). This is called a prospective study. The exposed and unexposed groups are determined at the start of the study, and both groups are disease-free. While it makes logical sense for determining a significant relationship, it can be very
 time-consuming (we have to wait a long time for the problem condition to develop)
 inefficient (we may need very large samples to obtain enough subjects with \(Z = 1\))
An alternative is the retrospective study, in which we first locate a group of subjects with lymphoma \((Z = 1)\) and identify which are smokers and which are not. Here a diseased group is determined first and is retrospectively assessed for exposure status. Then we locate another group of subjects who are in some sense "comparable" but who do not have lymphoma \((Z = 2)\) and identify which are smokers and which are not. In the retrospective study, we have "sampled on the outcome," choosing individuals on the basis of \(Z\) and then observing \(Y\).
The interchangeability of \(Y\) and \(Z\) means that the usual roles of "response" and "explanatory" variables can be reversed, which could be extremely useful for research. Because the odds ratio is invariant to exchanging \(Y\) and \(Z\), the odds ratio from a retrospective study should be about the same as the odds ratio from a prospective study in which we sampled individuals according to their \(Y\) values and collected information on \(Z\). A retrospective study provides no information about the overall incidence of \(Z\) in the population because the proportions of cases with \(Z = 1\) and \(Z = 2\) were decided by the investigator. However, it does provide consistent estimates of the odds ratio indicating the effect of \(Y\) on \(Z\).
Example: Lung Cancer
The table below is adapted from Doll and Hill (1950), where 709 lung cancer sufferers were matched with 709 individuals without lung cancer to serve as a control. This is an example of a retrospective, case-control study. A study is called case-control when "cases" or diseased subjects and "controls" or comparable non-diseased subjects are sampled from respective populations and then assessed on their risk-factor exposure status.
              Cancer Yes   Cancer No   Totals
Smoking Yes   688          650         1338
Smoking No    21           59          80
Totals        709          709         1418
Lung cancer is the natural response variable of interest, and we would like to condition on smoking to estimate the conditional probabilities of cancer, given smoking status. But since the column totals (lung cancer frequencies) are fixed by design, each column is a separate binomial distribution, not each row, and the sample conditional probabilities based on row totals do not reflect the corresponding population proportions.
In other words, with retrospective studies, the sample sizes are fixed in a way that's counterintuitive to how we'd like to view the variables as explanatory and response. This also affects the interpretation of the relative risk because it's based on the same conditional probabilities. Fortunately, the odds ratio is numerically invariant, regardless of which totals (row or column) are fixed, which makes it an appropriate measure of association for retrospective as well as prospective studies.
To see this invariance for the data above, we can calculate the sample odds ratio as
\(\displaystyle \hat{\theta}=\dfrac{688/650}{21/59}= \dfrac{688/21}{650/59}=2.97 \)
Thus, for this sample, the odds of lung cancer among smokers is 2.97 times the odds of lung cancer among nonsmokers. Equivalently, the odds of smoking among those with lung cancer is 2.97 times the odds of smoking among those without lung cancer.
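This invariance is easy to check numerically; a quick base R sketch for the Doll and Hill counts:

```r
## cross-product ratio for the Doll & Hill table
or <- (688 * 59) / (650 * 21)            # about 2.97
## same value whether we condition on cancer status or on smoking status
all.equal(or, (688/21) / (650/59))
all.equal(or, (688/650) / (21/59))
```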
Source: R. Doll and A. B. Hill, Br. Med. J., 739-748, Sept. 30, 1950.
3.8 - Measures of Association in \(I \times J\) Tables
In the Coronary Heart Disease example, it is sensible to think of serum cholesterol as an explanatory variable and CHD as a response. Therefore, it would make sense to estimate the conditional probabilities of CHD within the four cholesterol groups. To do this, we simply divide each cell count \(n_{ij}\) by its column total \(n_{+j}\); the resulting proportion \(n_{ij}/n_{+j}\) is an estimate of \(P(Y = i \mid Z = j)\). To see this, note that
\(P(Y=i\mid Z=j)=\dfrac{P(Y=i,Z=j)}{P(Z=j)}\)
and is intuitively estimated by
\(\dfrac{n_{ij}/n_{++}}{n_{+j}/n_{++}}=\dfrac{n_{ij}}{n_{+j}}\).
These values correspond to "Col Pct" in the SAS output. In R, we need to calculate them based on the above formula, e.g., see HeartDisease.R. The result is shown below.
         0-199            200-219          220-259          260+
CHD      12/319 = .038    8/254 = .031     31/470 = .066    41/286 = .143
no CHD   307/319 = .962   246/254 = .969   439/470 = .934   245/286 = .857
The risk of CHD appears to be essentially constant for the groups with cholesterol levels between 0–199 and 200–219. Although the estimated probability drops from .038 to .031, this drop is not statistically significant. We can test this by doing a test for the difference in proportions or by doing a chisquare test of independence for the relevant \(2 \times 2\) subtable:
         0-199            200-219
CHD      12/319 = .038    8/254 = .031
no CHD   307/319 = .962   246/254 = .969
The test yields \(X^2 = 0.157\) with df = 1, \(p\)-value = .69. For the other two groups, however, the risk of CHD is substantially higher. We can do similar tests for other sets of cells. In fact, any two levels of cholesterol may be compared and tested for association between CHD and cholesterol level.
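Such a comparison can be run directly in R by treating the \(2\times2\) subtable as its own data set; a sketch for the 0-199 versus 200-219 comparison:

```r
## 0-199 vs 200-219 subtable of the CHD data
sub <- matrix(c( 12,   8,
                307, 246), nrow = 2, byrow = TRUE)
res <- chisq.test(sub, correct = FALSE)
res          # X-squared is about 0.157, df = 1, p-value about 0.69
```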
Describing associations in \(I \times J\) tables
In a \(2 \times 2\) table, the relationship between the two binary variables could be summarized by a single number (e.g., the odds ratio). For an \(I \times J\) table, the usual \(X^2\) or \(G^2\) test for independence has \((IJ − 1) − (I − 1) − (J − 1) = (I − 1)(J − 1)\) degrees of freedom. This means that, with \(I > 2\) or \(J > 2\), there are multiple dimensions in which the data can depart from independence. The direction and magnitude of the departure from the null hypothesis can no longer be summarized by a single number, but must be summarized by \((I −1)(J −1)\) numbers, using (i) differences in proportions, and/or (ii) relative risks, and/or (iii) odds ratios.
In the Coronary Heart Disease study, for example, we could summarize the relationship between CHD and cholesterol level by a set of three relative risks:
 200–219 versus 0–199,
 220–259 versus 0–199, and
 260+ versus 0–199.
That is, we could estimate the risk of CHD at each cholesterol level relative to a common baseline. Or, we could use
 200–219 versus 0–199,
 220–259 versus 200–219, and
 260+ versus 220–259,
This estimates the risk of each category relative to the category immediately below. Other comparisons are also possible, but they may not make sense in interpreting the data.
Example: Smoking Behaviors
The table below classifies 5375 high school students according to the smoking behavior of the student \(Z\) and the smoking behavior of the student’s parents \(Y\). We are interested in whether there is a relationship between the smoking behavior of the students and that of their parents.
How many parents smoke?   Student smokes?
                          Yes (Z = 1)   No (Z = 2)
Both (Y = 1)              400           1380
One (Y = 2)               416           1823
Neither (Y = 3)           188           1168
The test for independence yields \(X^2 = 37.6\) and \(G^2 = 38.4\) with df = 2 (\(p\)-values are essentially zero), so we have decided that \(Y\) and \(Z\) are related. It is natural to think of \(Z\) in this example as a response and \(Y\) as a predictor, so we will discuss the conditional distribution of \(Z\) given \(Y\). Let \(\pi_i = P(Z = 1 \mid Y = i)\), for \(i=1,2,3\). The estimates of these probabilities are
\(\hat{\pi}_1=400/1780=0.225\)
\(\hat{\pi}_2=416/2239=0.186\)
\(\hat{\pi}_3=188/1356=0.139\)
We can then compare these estimated risks across the levels of \(Y\). The effect of \(Y\) on \(Z\) can be summarized with two differences. For example, we can calculate the increase in the probability of \(Z = 1\) as \(Y\) goes from 3 to 2, and as \(Y\) goes from 2 to 1:
\(\hat{d}_{23}=\hat{\pi}_2-\hat{\pi}_3=0.047\)
\(\hat{d}_{12}=\hat{\pi}_1-\hat{\pi}_2=0.039\)
Alternatively, we may treat \(Y = 3\) as a baseline and calculate the increase in probability as we go from \(Y = 3\) to \(Y = 2\) and from \(Y = 3\) to \(Y = 1\):
\(\hat{d}_{23}=\hat{\pi}_2-\hat{\pi}_3=0.047\)
\(\hat{d}_{13}=\hat{\pi}_1-\hat{\pi}_3=0.086\)
We may also express the effects as the sample odds ratios (e.g., look at any \(2\times 2\) table within this larger \(3 \times 2\) table):
\(\hat{\theta}_{23}=\dfrac{416\times 1168}{188\times 1823}=1.42\)
\(\hat{\theta}_{13}=\dfrac{400\times 1168}{188\times 1380}=1.80\)
The estimated value of 1.42 means that students with one smoking parent are estimated to be 42% more likely (on the odds scale) to smoke than students whose parents do not smoke (the last two rows of the table). The value of 1.80 means that students with two smoking parents are 80% more likely to smoke than students whose parents do not smoke (the first and the last rows of the table).
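The two odds ratios above come from the corresponding \(2\times 2\) subtables; a base R sketch:

```r
smoke <- matrix(c(400, 1380,
                  416, 1823,
                  188, 1168), nrow = 3, byrow = TRUE,
                dimnames = list(parents = c("both", "one", "neither"),
                                student = c("yes", "no")))
## rows 2 and 3: one smoking parent vs neither
theta23 <- (smoke[2, 1] * smoke[3, 2]) / (smoke[3, 1] * smoke[2, 2])  # about 1.42
## rows 1 and 3: both smoking parents vs neither
theta13 <- (smoke[1, 1] * smoke[3, 2]) / (smoke[3, 1] * smoke[1, 2])  # about 1.80
```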
In a \(3 \times 2\) table, the relationship between the two variables must be summarized with two differences in proportions, or two relative risks, or two odds ratios. More generally, describing the relationship between the two variables in an \(I \times J\) table requires \((I − 1)(J − 1)\) such numbers. Many different odds ratios can be formed from a table, but \((I − 1)(J − 1)\) of them suffice to describe the association completely; this matches the degrees of freedom for testing independence. Which odds ratios are most meaningful to the researcher depends on the research question at hand.
Besides the point estimates, we can also test hypotheses about the odds ratios or compute confidence intervals. You could do the same for the relative risks or difference in proportions as we discussed in previous sections. To do this computationally in SAS and/or R, we need to analyze each \(2\times2\) subtable separately. Basically, we treat each \(2\times2\) table as a "new" data set.
In SAS, the option ALL should give all possible measures; see smokeindep.sas (output: smokeindep SAS output). Depending on which SAS version you are using, the options may be different, e.g., RELRISK, RRC1, RRC2, etc., and some of them work only for \(2\times2\) tables. For the current list, see the current SAS Support Documentation.
In R, see smokeindep.R (output: smokeindep.out). The {vcd} package has a number of useful functions, e.g., oddsratio() and assocstats(); the latter will give \(X^2\), \(G^2\), and some other measures of association, such as Cramer's V.
Statistical versus Practical Significance
In proposing measures of effect size, we need to realize that there is a difference between saying that an effect is statistically significant and saying that it is large.
A test statistic or \(p\)-value is a measure of the evidence against a null hypothesis, and this evidence depends on the sample size. An effect size, however, should not change if \(n\) is arbitrarily increased.
In some situations, there may be an artificial dependency of statistical significance on sample size. If the sample size is small and a large-sample goodness-of-fit statistic is computed, the \(p\)-value may not be the best statistic to rely upon because the large-sample theory will not hold. Alternatively, if the sample size is very large, you may obtain statistically significant results even when the effect is practically negligible. Also, recall the Type I and Type II errors of hypothesis testing.
The \(X^2\) and \(G^2\) test statistics are not appropriate measures of association between two variables. They are sufficient to test the null hypothesis, but not to describe the direction and magnitude of the association.
Here is a hypothetical example that will help to illustrate this point. First, consider the vitamin C example again. The following table classifies a sample of French skiers by whether they got a cold and whether they were given a placebo or vitamin C (ascorbic acid).
                Cold   No Cold   Totals
Placebo          31      109       140
Ascorbic Acid    17      122       139
Totals           48      231       279
The test for independence yields \(X^2 = 4.811\) with \(p\)-value = 0.0283. The conclusion here would be that there is significant evidence against the null hypothesis of independence. Now, let's suppose that we artificially inflate the sample size by multiplying each cell count by ten.
                Cold   No Cold   Totals
Placebo         310      1090      1400
Ascorbic Acid   170      1220      1390
Totals          480      2310      2790
The cell proportions for this table remain identical to those of the previous table; the relationship between the two binary variables appears to be exactly the same. Yet now the \(X^2\) statistic is \(10(4.811) = 48.11\). The new \(p\)-value is close to 0, so the evidence against independence is now very strong, not because the relationship between the two variables is any stronger, but merely because the sample size has increased.
Warning: A large \(p\)-value is NOT strong evidence in favor of \(H_0\). A large \(p\)-value can occur if (1) \(H_0\) is indeed true, or (2) \(H_0\) is false, but the test has low power.
Now, let's suppose that we artificially deflate the sample size by dividing each cell count by ten and rounding to whole counts.
                Cold   No Cold   Totals
Placebo           3       11        14
Ascorbic Acid     2       12        14
Totals            5       23        28
The cell proportions for this table remain nearly identical to those of the previous tables; the relationship between the two binary variables appears to be the same. Yet now the \(X^2\) statistic is 0.2435, with \(p\)-value 0.6217, which gives little or no evidence against independence. This does not tell us, however, whether the weakness of association is due to (a) a genuinely weak association between the two variables, or (b) the sample size being too small. Moreover, \(X^2\) says nothing about the direction of the possible effect, e.g., whether vitamin C takers are more likely or less likely to be sick than non-takers of vitamin C.
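The sample-size effect described above is easy to reproduce. The following sketch (Python rather than the lesson's SAS/R, purely for illustration) computes \(X^2\) from scratch for the original and inflated skier tables.

```python
# Pearson X^2 statistic for a two-way table, computed from the
# independence expected counts E_ij = (row total)(column total)/n.

def pearson_x2(table):
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(len(table))
               for j in range(len(table[0])))

skiers = [[31, 109], [17, 122]]        # original table
inflated = [[310, 1090], [170, 1220]]  # every count times 10

print(round(pearson_x2(skiers), 3))    # 4.811
print(round(pearson_x2(inflated), 2))  # 48.11
```

Because every expected count and every observed count scale by the same factor, \(X^2\) for the inflated table is exactly ten times that of the original, even though the cell proportions are unchanged.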
Notes
The above analysis was implemented in VitaminC.sas (output file VitaminC SAS Output). The corresponding R code file is VitaminC.R and is commented so that results similar to those in SAS can be obtained.
3.9  Diagnostic Measures
Residuals
Recall that residuals tell us how far apart the observed and expected values are for each cell under the assumed model; they indicate which cells drive the lack of fit. We can check the Pearson and standardized residuals calculated under the null model, just as we did for one-way tables.
 Pearson Residual

The Pearson residual for a cell in a two-way table is
\(r_{ij}=\dfrac{O_{ij}-E_{ij}}{\sqrt{E_{ij}}}\)
where the chi-squared statistic is then: \(X^2=\sum_j\sum_i r^2_{ij}\)
\(r_{ij}\)’s have an approximate Normal distribution with mean 0, but their variances are not all equal! Typically their asymptotic variances are less than 1 and average variance equals \([(I − 1)(J − 1) / (\mbox{number of cells})]\).
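As a sketch of this definition (in Python for illustration; the lesson's code files use SAS and R), the Pearson residuals can be computed cell by cell and squared to recover \(X^2\). The counts below are the French skiers table from the previous section.

```python
# Pearson residuals r_ij = (O_ij - E_ij) / sqrt(E_ij) under independence;
# the squared residuals sum to the chi-squared statistic X^2.
from math import sqrt

def pearson_residuals(table):
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return [[(table[i][j] - rows[i] * cols[j] / n)
             / sqrt(rows[i] * cols[j] / n)
             for j in range(len(table[0]))]
            for i in range(len(table))]

residuals = pearson_residuals([[31, 109], [17, 122]])  # skiers table
x2 = sum(r ** 2 for row in residuals for r in row)
print(round(x2, 3))  # 4.811, the X^2 for this table
```

The sign of each residual also shows the direction of the departure: a positive residual means the cell had more observations than independence would predict.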
 Standardized (adjusted) Pearson Residual

The standardized (adjusted) Pearson residual for a cell in a two-way table is
\(\dfrac{O_{ij}-E_{ij}}{\sqrt{E_{ij}(1-p_{i+})(1-p_{+j})}}\)
A standardized Pearson residual has an approximate \(N(0,1)\) distribution. A value that exceeds 2 or 3 in absolute value, therefore, suggests a lack of fit. For the heart disease example data, the residual in the \((2,1)\) cell is
\(r_{21}=\dfrac{8-17.583}{\sqrt{17.583\left(1-\frac{92}{1329}\right)\left(1-\frac{254}{1329}\right)}}=-2.63\)
and would suggest some lack of fit of the independence model. It's also important to keep in mind, however, that the more cells involved, the more likely we are to observe an extreme residual by chance, even if the independence model holds.
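The arithmetic in the worked residual above can be checked directly. This small Python snippet (illustrative only) reproduces that one calculation with the counts quoted from the heart disease example (observed 8, expected 17.583, marginal totals 92 and 254 out of 1329).

```python
# Standardized (adjusted) Pearson residual for a single cell:
# (O - E) / sqrt(E * (1 - row proportion) * (1 - column proportion)).
from math import sqrt

O, E, n = 8, 17.583, 1329    # observed count, expected count, table total
row_total, col_total = 92, 254

r = (O - E) / sqrt(E * (1 - row_total / n) * (1 - col_total / n))
print(round(r, 2))  # about -2.63; its absolute value 2.63 exceeds 2
```

Since the absolute value exceeds the rough cutoff of 2, this cell suggests some lack of fit of the independence model.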
In SAS PROC FREQ, the DEVIATION option gives the raw residuals (i.e., just the difference between the observed and expected values), and the CELLCHI2 option gives the squared Pearson residuals. Keep in mind that Pearson residuals are less variable than a standard normal variate; notice, though, that if the product of the marginal sample probabilities in the denominator is approximately equal to 1, the adjusted Pearson residuals and the regular Pearson residuals are approximately equal.
The squared standardized Pearson residual values have an approximate chi-squared distribution with df = 1; thus, at a critical alpha value of 0.05, a squared standardized Pearson residual greater than roughly 4 (i.e., \(\chi^2_{0.05}(1) = 3.84\)) is considered significant (this can also be used as a very crude cutoff for the squared Pearson residuals). For other options in SAS, explore the SAS documentation on PROC FREQ. For our example, see HeartDisease SAS Output (part of the output is below).
Here are the results from the Coronary Heart Disease example:
[The FREQ Procedure output table is not shown here; see the HeartDisease SAS Output file.]
Conclusion
Notice the values in the third row of the first, second, and fourth cells, e.g., 4.60, 5.22, 22.70. These are squared Pearson residuals; each exceeds 3.84, and they appear to be driving the lack of independence. [As an exercise, compute the standardized Pearson residuals and see if your inference would change.]
Let's further investigate the dependence structures in this table.
In R, chisq.test(your data)\$residuals gives the Pearson residuals. In our Heart Disease example, see result\$residuals and the corresponding output in HeartDisease.out.
Partitioned Tests
Besides looking at the residuals or the measures of association, another way to describe the effects is to form a sequence of smaller tables by combining or collapsing rows and/or columns in a meaningful way; in other words, by looking at specific smaller tables within the larger table.
Partitioning the chi-squared statistic uses the fact that a sum of independent chi-squared statistics is itself a chi-squared statistic, with degrees of freedom equal to the sum of the degrees of freedom of the individual statistics. This works because the multinomial distribution may be collapsed into smaller multinomials and partitioned into product-multinomials. We start with a chi-squared statistic with df > 1 and break it down into parts such that each new statistic has df = 1. This partitioning helps to show which differences between subsets of categories drive a significant association in the whole table.
Typically (just as with odds ratios), for an \(I \times J\) table there will be \((I-1)(J-1)\) partitions. In our example, we have \((3-1)(2-1) = 2\) parts. Let's combine the first two rows:
                        Student smokes   Student doesn't
1–2 parents smoke            816              3203
Neither parent smokes        188              1168
This table has \(X^2 = 27.7\), \(G^2 = 29.1\), \(p\)-value \(\approx 0\), and \(\hat{\theta}=1.58\). We estimate that a student is 58% more likely, on the odds scale, to smoke if he or she has at least one smoking parent.
We may now ask, "Among those students with at least one smoking parent, is there any difference between those with one smoking parent and those with two smoking parents?" Given that at least one parent smokes, is there any evidence that the other parent’s smoking affects the chances that the student will smoke?
To answer this, we discard the last row of the original table and look at the upper \(2 \times 2\) subtable.
                     Student smokes   Student doesn't
Both parents smoke        400              1380
One parent smokes         416              1823
This table has \(X^2 = 9.3\), \(G^2 = 9.2\), \(p\)-value \(\approx 0.002\), and \(\hat{\theta}=1.27\). Given that at least one parent smokes, the fact that the other parent smokes does indeed raise the student's probability of smoking; the effect, however, is not as large (\(\hat{\theta}=1.27\)) as it was in going from neither parent smoking to at least one parent smoking (\(\hat{\theta}=1.58\)).
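The two odds-ratio estimates above follow directly from the cross-product formula; here is a quick check (Python, illustrative only, using the counts from the two subtables).

```python
# Sample odds ratio for a 2x2 table: (n11 * n22) / (n12 * n21).

def odds_ratio(table):
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

# At least one smoking parent vs. neither, then both vs. one.
print(round(odds_ratio([[816, 3203], [188, 1168]]), 2))  # 1.58
print(round(odds_ratio([[400, 1380], [416, 1823]]), 2))  # 1.27
```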
Notice what happens if we add up the \(G^2\) values from these two \(2 \times 2\) tables:
\(29.1 + 9.2 = 38.3\)
The result is very close to 38.4, the value of \(G^2\) that we got for the full \(3 \times 2\) table. In fact, these two numbers should have come out exactly the same; the difference in the last decimal place is merely due to rounding. Theoretical results show that partitions of \(G^2\) of this kind add up exactly to the total \(G^2\). That is not true, however, for the Pearson \(X^2\): the individual \(X^2\) values do not add up exactly to the overall \(X^2\), but they come close:
\(27.7 + 9.3 = 37.0\), compared with \(37.6\) for the full table.
When we analyze a \(3 \times 2\) table in this manner, by combining two rows into a single row and then separating them again, we have partitioned the 2-degree-of-freedom test for independence into two single-degree-of-freedom tests. By breaking up the test for independence into a sequence of tests on smaller tables, we can often identify precisely how the categorical variables are, or are not, related. Comparing these component statistics shows where the differences come from, which can be very helpful when, for instance, you are designing a future study and exploring which factors are important to include.
In practice, there are often many different ways to break up an \(I \times J\) table into a sequence of smaller tables. It is a good idea to do it in such a way that the independence test for each smaller table pertains to a question that makes sense in the context of the individual problem, using a partitioning scheme, like the one above, for which the component \(G^2\) statistics add to the total.
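To see the additivity of \(G^2\) numerically, here is a sketch (Python for illustration; the lesson's code files use SAS and R) that computes the deviance statistic for the full \(3 \times 2\) smoking table and its two partition subtables.

```python
# Deviance statistic G^2 = 2 * sum O_ij * ln(O_ij / E_ij), with
# E_ij = (row total)(column total)/n under independence.
from math import log

def g2(table):
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return 2 * sum(table[i][j] * log(table[i][j] / (rows[i] * cols[j] / n))
                   for i in range(len(table))
                   for j in range(len(table[0])))

full  = [[400, 1380], [416, 1823], [188, 1168]]  # both / one / neither
part1 = [[816, 3203], [188, 1168]]               # 1-2 parents vs. neither
part2 = [[400, 1380], [416, 1823]]               # both vs. one parent

# The component G^2 values (about 29.1 and 9.2) sum exactly to the
# full-table G^2 (about 38.4), up to floating-point error.
print(round(g2(part1), 1), round(g2(part2), 1), round(g2(full), 1))
```

Running the same computation with the Pearson \(X^2\) formula instead would show that its components are close to, but not exactly equal to, the full-table value.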
3.10  Lesson 3 Summary
In this lesson, we focused on the analysis of two-way tables. Beginning with the \(2 \times 2\) case, we described the concept of independence for two discrete random variables and showed how to carry out the chi-squared test of independence. We then discussed three different measures of association, namely the difference in conditional proportions, the relative risk, and the odds ratio, and their relation to the test of independence. We also saw how residuals can be used to assess which cells in particular may have led to a significant result, and how to apply an "exact" version of the independence test to tables with very small cell counts.
The concepts of independence, association, and marginal and conditional probabilities are very important for the analysis of categorical data. We will encounter the same concepts in more complex contingency tables and throughout the course and its methodology for analyzing categorical data. In the next lesson, we consider data with a natural ordering and construct measures of association that take advantage of that ordering.