5.4.1  Mutual (Complete) Independence
The simplest model that one might propose is that ALL variables are independent of one another.
Graphically, if we have three random variables, A, B, and C, we can express this model as a graph with one node per variable and no edges between the nodes.
In this graph, the lack of connections between the nodes indicates that no relationships exist among A, B, and C; that is, there is mutual independence. In the notation of loglinear models, which we will learn about later, this model is expressed as (A, B, C). We will use this notation from here on. The notation indicates which aspects of the data we consider in the model (that is, which sufficient statistics we compute). In this case, the independence model means we only care about the marginal totals \(n_{i++}, n_{+j+}, n_{++k}\) for each variable separately, and these pieces of information are sufficient to fit the model and compute the expected counts. Alternatively, as we will see later on, we could keep track of the number of times some joint outcome of two variables occurs.
In terms of odds ratios, the model (A, B, C) implies that if we look at the marginal tables A × B, B × C, and A × C, then all of the odds ratios in these marginal tables are equal to 1. In other words, mutual independence implies marginal independence (i.e., there is independence in each marginal table). All variables are truly independent of one another.
Under this model the following must hold:
\(P(A=i,B=j,C=k)=P(A=i)P(B=j)P(C=k)\)
for all i, j, k. That is, the joint probabilities are the product of the marginal probabilities. This is a simple extension of the model of independence in two-way tables, where it was assumed that
\(P(A=i,B=j)=P(A=i)P(B=j)\)
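The same factorization explains why mutual independence implies independence in each two-way marginal table. As a short worked step, using only the definitions above, summing the joint probability over the levels of C gives
\(P(A=i,B=j)=\sum_{k}P(A=i)P(B=j)P(C=k)=P(A=i)P(B=j)\sum_{k}P(C=k)=P(A=i)P(B=j)\)
since the probabilities P(C = k) sum to one. The same argument, summing over A or over B instead, gives independence in the B × C and A × C marginal tables.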
Define the marginal probabilities,
π_{i++} = P(A = i), i = 1, 2, . . . , I ,
π_{+j+} = P(B = j), j = 1, 2, . . . , J,
π_{++k} = P(C = k), k = 1, 2, . . . , K .
so that π_{ijk} = π_{i++}π_{+j+}π_{++k} for all i, j, k.
Then the unknown parameters of the model of independence are
π_{i++} = (π_{1++}, π_{2++}, . . . , π_{I++}),
π_{+j+} = (π_{+1+}, π_{+2+}, . . . , π_{+J+}),
π_{++k} = (π_{++1}, π_{++2}, . . . , π_{++K}).
Under the assumption that the model of independence is true, once we know the marginal probabilities, we have enough information to estimate all of the unknown cell probabilities. Because each of the marginal probability vectors must sum to one, the number of free parameters in the model is (I − 1) + (J − 1) + (K − 1). This is exactly like the two-way table, except that one more set of parameters is needed for the additional random variable. In the Death Penalty example, the number of free parameters is (2 − 1) + (2 − 1) + (2 − 1) = 3.
Notice that under the independence model, marginal distributions are
(n_{1++}, n_{2++}, . . . , n_{I++}) ∼ Mult(n, π_{i++}),
(n_{+1+}, n_{+2+}, . . . , n_{+J+}) ∼ Mult(n, π_{+j+}),
(n_{++1}, n_{++2}, . . . , n_{++K}) ∼ Mult(n, π_{++k}),
and these three vectors are mutually independent. Thus the three parameter vectors π_{i++}, π_{+j+}, and π_{++k} can be estimated independently of one another. The ML estimates are the sample proportions in the margins of the table,
\(\hat{\pi}_{i++}=p_{i++}=n_{i++}/n,\quad i=1,2,\ldots,I\)
\(\hat{\pi}_{+j+}=p_{+j+}=n_{+j+}/n,\quad j=1,2,\ldots,J\)
\(\hat{\pi}_{++k}=p_{++k}=n_{++k}/n,\quad k=1,2,\ldots,K\)
It then follows that the estimates of the expected cell counts are
\(E_{ijk}=n\hat{\pi}_{i++}\hat{\pi}_{+j+}\hat{\pi}_{++k}=\dfrac{n_{i++}n_{+j+}n_{++k}}{n^2}\)
Again, compare this to a two-way table, where the expected counts were \(E(n_{ij})=n_{i+}n_{+j}/n\).
For the death penalty example, the marginal counts are A = [160, 166], B = [214, 112], and C = [36, 290], with n = 326. Then E(n_{111}) = (160 × 214 × 36)/(326^{2}) = 11.60, and so on for the other cells. We then compare these expected counts with the corresponding observed counts, e.g., 11.60 with n_{111} = 19.
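The arithmetic for the death penalty example can be checked with a short Python sketch. It uses only the marginal totals quoted in the text; the names margin_a, margin_b, margin_c, and expected_count are illustrative choices made here, not part of the course's SAS or R code.

```python
# Marginal totals quoted in the text for the death penalty example.
margin_a = [160, 166]
margin_b = [214, 112]
margin_c = [36, 290]
n = sum(margin_a)  # total sample size, 326


def expected_count(i, j, k):
    """Expected count E_ijk = n_{i++} n_{+j+} n_{++k} / n^2 (0-based indices)."""
    return margin_a[i] * margin_b[j] * margin_c[k] / n**2


# Expected count for the first cell, E(n_111) in the text's 1-based notation.
e_111 = expected_count(0, 0, 0)
print(f"{e_111:.2f}")

# The expected counts over all eight cells add back up to n.
total = sum(expected_count(i, j, k)
            for i in range(2) for j in range(2) for k in range(2))
print(round(total, 6))
```

Note that the expected counts always reproduce the one-way margins, which is why their total equals n.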
Chi-Squared Test of Independence
The hypothesis of independence can be tested using the general method described earlier in Lessons 2 and 3. To test
H_{0}: the independence model is true vs. H_{A}: the saturated model is true,
or, equivalently, H_{0}: π_{ijk} = π_{i++}π_{+j+}π_{++k} for all i, j, k vs. H_{A}: the saturated model, we proceed as follows:
1. Estimate the unknown parameters of the independence model, i.e., the marginal probabilities.
2. Calculate the estimated cell probabilities and the expected cell frequencies E_{ijk} under the model of independence.
3. Calculate X^{2} and/or G^{2} by comparing the expected and observed values, and compare them to the appropriate chi-square distribution:
\(X^2=\sum\limits_i \sum\limits_j \sum\limits_k \dfrac{(E_{ijk}-n_{ijk})^2}{E_{ijk}}\)
\(G^2=2\sum\limits_i \sum\limits_j \sum\limits_k n_{ijk} \text{log }\left(\dfrac{n_{ijk}}{E_{ijk}}\right)\)
The degrees of freedom (DF) for this test are ν = (IJK − 1) − [(I − 1) + (J − 1) + (K − 1)]. As before, this is the difference between the number of free parameters in the saturated model, (IJK − 1), and the number of free parameters in the current model of independence, \((I-1)+(J-1)+(K-1)\).
For the death penalty example, DF = 7 − 3 = 4.
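The degrees-of-freedom count is easy to get wrong by hand, so here is a minimal Python sketch of the formula above. The function names are hypothetical helpers introduced only for illustration.

```python
def free_params_independence(I, J, K):
    """Free parameters under mutual independence: one margin per variable."""
    return (I - 1) + (J - 1) + (K - 1)


def df_mutual_independence(I, J, K):
    """DF = free parameters of the saturated model minus those of the model."""
    saturated = I * J * K - 1
    return saturated - free_params_independence(I, J, K)


# Death penalty example, a 2 x 2 x 2 table: 7 - 3 = 4.
print(df_mutual_independence(2, 2, 2))
```

The same function applies to any I × J × K table; only the three dimensions are needed, not the counts.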
Recall that we also said that mutual independence implies marginal independence. So, if we reject marginal independence for any pair of variables, we can immediately reject mutual independence overall. For example, consider the estimated marginal odds ratios and their confidence intervals for the death penalty example (see death.sas (or death.R) and death.lst (or death.out) in the previous section); the estimates are θ_{AC} = 1.18, θ_{AB} = 27.43, and θ_{BC} = 2.88, each from a 2 × 2 marginal table, with df = 1.
What is your conclusion? Would you reject the model of complete independence? Are these three variables mutually independent? 
Example: Boy Scouts and Juvenile Delinquency
This is a 3 × 2 × 2 table. It classifies n = 800 boys according to socioeconomic status (S), whether they are a boy scout (B), and whether they have been labeled as a juvenile delinquent (D):
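| Socioeconomic status | Boy scout | Delinquent: Yes | Delinquent: No |
|---|---|---|---|
| Low | Yes | 11 | 43 |
| Low | No | 42 | 169 |
| Medium | Yes | 14 | 104 |
| Medium | No | 20 | 132 |
| High | Yes | 8 | 196 |
| High | No | 2 | 59 |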
To fit the full independence model, we need to find the marginal totals for B,
n_{1++} = 11 + 43 + 14 + 104 + 8 + 196 = 376,
n_{2++} = 42 + 169 + 20 + 132 + 2 + 59 = 424,
for D,
n_{+1+} = 11 + 42 + 14 + 20 + 8 + 2 = 97,
n_{+2+} = 43 + 169 + 104 + 132 + 196 + 59 = 703,
and for S,
n_{++1} = 11 + 43 + 42 + 169 = 265,
n_{++2} = 14 + 104 + 20 + 132 = 270,
n_{++3} = 8 + 196 + 2 + 59 = 265.
Calculate the expected counts for each cell, \(E_{ijk}=\dfrac{n_{i++}n_{+j+}n_{++k}}{n^2}\), and then calculate the chi-square statistics.
The degrees of freedom for this test are \((2\times 2\times 3-1)-[(2-1)+(2-1)+(3-1)]=7\), so p-values can be found as \(P(\chi^2_7 \geq X^2)\) and \(P(\chi^2_7 \geq G^2)\).
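The whole calculation for the boy scout data can be sketched in a few lines of Python. This is an illustrative sketch rather than the course's SAS or R code; the variable names (n_obs, n_b, n_d, n_s) and the nested-list layout are assumptions made here. The index order follows the text: i for boy scout, j for delinquent, k for socioeconomic status.

```python
import math

# Observed counts n_obs[i][j][k] from the table above:
# i = boy scout (yes, no), j = delinquent (yes, no), k = SES (low, medium, high).
n_obs = [
    [[11, 14, 8], [43, 104, 196]],   # boy scout = yes
    [[42, 20, 2], [169, 132, 59]],   # boy scout = no
]
I, J, K = 2, 2, 3
cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
n = sum(n_obs[i][j][k] for i, j, k in cells)

# One-way marginal totals n_{i++}, n_{+j+}, n_{++k}.
n_b = [sum(n_obs[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
n_d = [sum(n_obs[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
n_s = [sum(n_obs[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]

# Expected counts under mutual independence.
expected = {(i, j, k): n_b[i] * n_d[j] * n_s[k] / n**2 for i, j, k in cells}

# Pearson X^2 and deviance G^2 (all observed counts are positive here,
# so the logarithm is always defined).
x2 = sum((n_obs[i][j][k] - expected[i, j, k]) ** 2 / expected[i, j, k]
         for i, j, k in cells)
g2 = 2 * sum(n_obs[i][j][k] * math.log(n_obs[i][j][k] / expected[i, j, k])
             for i, j, k in cells)
df = (I * J * K - 1) - ((I - 1) + (J - 1) + (K - 1))

print(n_b, n_d, n_s)
print(round(x2, 2), round(g2, 2), df)
```

For a table with zero cells, the G^{2} sum would need to skip those cells (their contribution is taken to be zero); that case does not arise with these data.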
Recall that a single line of code in SAS or R is enough to get a p-value. In R, for example, use 1 - pchisq(218.6622, 7).
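For readers working outside SAS or R, the upper-tail chi-square probability can also be computed with the Python standard library alone. The sketch below is one possible approach, not the method used in the course: the helper chi2_sf is a hypothetical name, and it relies on the recurrence Q(a + 1, z) = Q(a, z) + z^a e^{−z}/Γ(a + 1) for the regularized upper incomplete gamma function, which is valid here because the degrees of freedom are a positive integer. Where SciPy is available, scipy.stats.chi2.sf would be the more usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df >= x) for integer df >= 1 and x > 0."""
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-z)              # Q(1, z)
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Step a up to df / 2 with Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    while a < df / 2.0:
        tail += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return tail


# Statistic and degrees of freedom quoted in the text.
p_value = chi2_sf(218.6622, 7)
print(p_value)
```

Working on the log scale inside the loop avoids overflow in z^a for large statistics such as this one.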
The p-values are essentially zero, indicating that the mutual independence model does not fit. Remember, for the chi-squared approximation to work well, the E_{ijk} need to be sufficiently large: most of them (at least about 80%) should be at least five, and none should be less than one. We should examine the E_{ijk} to see if they are large enough.
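The rule of thumb just stated can be checked mechanically. The following Python sketch applies it to the boy scout example, starting from the marginal totals computed earlier; the variable names and the exact thresholds coded here (80%, five, and one) simply restate the rule in the text.

```python
# Marginal totals for the boy scout example (n = 800).
n = 800
n_b = [376, 424]        # boy scout: yes, no
n_d = [97, 703]         # delinquent: yes, no
n_s = [265, 270, 265]   # socioeconomic status: low, medium, high

# Expected counts under mutual independence, one per cell (12 cells in all).
expected = [b * d * s / n**2 for b in n_b for d in n_d for s in n_s]

# Rule of thumb: at least 80% of expected counts >= 5, and none < 1.
share_at_least_5 = sum(e >= 5 for e in expected) / len(expected)
smallest = min(expected)
rule_ok = share_at_least_5 >= 0.8 and smallest >= 1

print(round(smallest, 3), share_at_least_5, rule_ok)
```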
As you can verify by running the provided code, both R and SAS give the following observed counts, with the expected counts in parentheses:
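| Socioeconomic status | Boy scout | Delinquent: Yes | Delinquent: No |
|---|---|---|---|
| Low | Yes | 11 (15.102) | 43 (109.448) |
| Low | No | 42 (17.030) | 169 (123.420) |
| Medium | Yes | 14 (15.387) | 104 (111.513) |
| Medium | No | 20 (17.351) | 132 (125.749) |
| High | Yes | 8 (15.102) | 196 (109.448) |
| High | No | 2 (17.030) | 59 (123.420) |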
Here, the expected counts are sufficiently large for the chi-square approximation to work well, and thus we must conclude that the variables B (boy scout), D (delinquent), and S (socioeconomic status) are not mutually independent.
Note: Most software packages should give you a warning if more than 20% of the expected cell counts are less than 5, since this may affect the large-sample approximations.
There is no single function or call in SAS or R that directly tests the mutual independence model; we will see in Lesson 10 how to fit this model via a loglinear model. However, we can test it by relying on our understanding of two-way tables, of marginal and partial tables, and of the related odds ratios. For mutual independence to hold, independence must hold in every two-way marginal table. Thus, we can analyze all of the two-way marginal tables (see the SAS and R code), either with a chi-squared test of independence in each table or by considering the odds ratios in each table. In this case, for example, the estimated odds ratio for the B × D table is 0.541, which is not equal to 1; more to the point, 1 is not covered by the 95% confidence interval for the odds ratio, (0.347, 0.845). Therefore, we have sufficient evidence to reject the null hypothesis that boy scout status and delinquent status are independent of one another, and thus to conclude that B, D, and S are not mutually independent.
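As a rough check on the B × D result, the marginal odds ratio and its interval can be recomputed in Python from the counts in the three-way table. This sketch assumes the usual large-sample (Wald) interval on the log odds ratio scale with z = 1.96; the text does not say which interval method SAS or R used, so that choice is an assumption, although it does reproduce the interval reported above. All variable names are illustrative.

```python
import math

# Marginal B x D table, obtained by collapsing over socioeconomic status.
scout_delinq = 11 + 14 + 8          # boy scout = yes, delinquent = yes
scout_not = 43 + 104 + 196          # boy scout = yes, delinquent = no
nonscout_delinq = 42 + 20 + 2       # boy scout = no,  delinquent = yes
nonscout_not = 169 + 132 + 59       # boy scout = no,  delinquent = no

# Sample odds ratio for the 2 x 2 marginal table.
odds_ratio = (scout_delinq * nonscout_not) / (scout_not * nonscout_delinq)

# Wald 95% interval on the log scale (assumed method, see the lead-in).
se_log_or = math.sqrt(1 / scout_delinq + 1 / scout_not
                      + 1 / nonscout_delinq + 1 / nonscout_not)
z = 1.96
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

Because the interval excludes 1, marginal independence of B and D is rejected, which is already enough to reject mutual independence.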
The following SAS or R code supports the above analysis by testing independence in the two-way marginal tables. Again, we will see later in the course that this is done more efficiently via loglinear models.
If two or more variables in a k-way table are not independent, then where does the lack of independence come from? That is, what other relationships might hold? What other models can capture these data? Maybe S and B are jointly independent of D?