10.2.2 - Complete Independence
This is the most restrictive model in that all variables are assumed to be jointly independent, regardless of any conditioning. Equivalently, this requires that each two-way and three-way distribution factors into the product of the marginal distributions involved.
Main assumptions
- The \(N = IJK\) counts in the cells are assumed to be independent observations of a Poisson random variable, and
- there are no partial interactions: \(\lambda_{ij}^{AB} =\lambda_{ik}^{AC} =\lambda_{jk}^{BC}=0\), for all \(i, j, k\), and
- there is no three-way interaction: \(\lambda_{ijk}^{ABC}=0\) for all \(i, j, k\).
Note the constraints above are in addition to the usual set-to-zero or sum-to-zero constraints (present even in the saturated model) imposed to avoid overparameterization.
Model Structure
\(\log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C\)
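Exponentiating both sides shows why this model corresponds to complete independence: the expected count is a product of one term per variable. In the sketch below, \(n\) denotes the total count and \(\pi_{i++}\), \(\pi_{+j+}\), \(\pi_{++k}\) the one-way marginal probabilities; this notation is assumed here and may differ from that used elsewhere in these notes.
\(\mu_{ijk} = e^{\lambda}\, e^{\lambda_i^A}\, e^{\lambda_j^B}\, e^{\lambda_k^C} = n\,\pi_{i++}\,\pi_{+j+}\,\pi_{++k}\)
The ML fitted values then depend only on the one-way margins:
\(\hat{\mu}_{ijk} = \dfrac{n_{i++}\, n_{+j+}\, n_{++k}}{n^2}\)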
In SAS, the model of complete independence (D, S, A) can be fitted with the following commands:
proc genmod data=berkeley order=data;
class D S A;
model count = D S A / dist=poisson link=log;
run;
What are the estimated odds of male vs. female in this example? From the output, the ML estimate of the parameter S-Male is 0.3829; thus, the estimated odds of an applicant being male rather than female are
\(\exp(0.3829) \approx 1.467 \approx 2691/1835\)
with p-value < .0001, indicating that these odds differ significantly from 1 (male applicants outnumber female applicants). Note these are odds, not odds ratios! (When we are dealing with main effects we do not look at odds ratios.)
What about the odds of being rejected? What can we conclude from the part of the output below?
Analysis Of Maximum Likelihood Parameter Estimates
Parameter | | DF | Estimate | Standard Error | Wald 95% Lower Confidence Limit | Wald 95% Upper Confidence Limit | Wald Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|---|---|---|
Intercept | | 1 | 4.7207 | 0.0455 | 4.6315 | 4.8100 | 10748.0 | <.0001 |
D | DeptA | 1 | 0.2675 | 0.0497 | 0.1701 | 0.3650 | 28.95 | <.0001 |
D | DeptB | 1 | -0.1993 | 0.0558 | -0.3086 | -0.0900 | 12.77 | 0.0004 |
D | DeptC | 1 | 0.2513 | 0.0499 | 0.1535 | 0.3491 | 25.37 | <.0001 |
D | DeptD | 1 | 0.1037 | 0.0516 | 0.0025 | 0.2048 | 4.04 | 0.0445 |
D | DeptE | 1 | -0.2010 | 0.0558 | -0.3103 | -0.0916 | 12.98 | 0.0003 |
D | DeptF | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
S | Male | 1 | 0.3829 | 0.0303 | 0.3235 | 0.4422 | 159.93 | <.0001 |
S | Female | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
A | Reject | 1 | 0.4567 | 0.0305 | 0.3969 | 0.5165 | 224.15 | <.0001 |
A | Accept | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
Scale | | 0 | 1.0000 | 0.0000 | 1.0000 | 1.0000 | | |
The scale parameter was held fixed.
But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.
Model Fit
The goodness-of-fit statistics indicate that the model does not fit.
Criteria For Assessing Goodness Of Fit
Criterion | DF | Value | Value/DF |
---|---|---|---|
Deviance | 16 | 2097.6712 | 131.1045 |
Scaled Deviance | 16 | 2097.6712 | 131.1045 |
Pearson Chi-Square | 16 | 2000.3281 | 125.0205 |
Scaled Pearson X2 | 16 | 2000.3281 | 125.0205 |
Log Likelihood | | 19464.3700 | |
Full Log Likelihood | | -1128.3655 | |
AIC (smaller is better) | | 2272.7309 | |
AICC (smaller is better) | | 2282.3309 | |
BIC (smaller is better) | | 2282.1553 | |
If the model fit well, "Value/DF" would be close to 1; here it exceeds 125 for both the deviance and the Pearson statistic. Recall how we get the degrees of freedom:
df = number of cells - number of fitted parameters in the model = 24 - 8 = 16,
or equivalently,
df = number of fitted parameters in the saturated model - number of fitted parameters in our model.
Recall that these goodness-of-fit statistics compare the fitted model to the saturated model. Thus, the model of complete independence does not fit well in comparison to the saturated model.
In R, the model of complete independence can be fit with the following commands:
berk.ind = glm(Freq~Admit+Gender+Dept, family=poisson(), data=berk.data)
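The data frame `berk.data` is not defined in this section. As a minimal sketch, assuming it holds the same counts as R's built-in `UCBAdmissions` table, the following builds an equivalent data frame (`berk.sketch` and `fit.sketch` are hypothetical names), sets Female and department F as reference levels to mirror the output on this page, and compares the fitted values with the product of one-way margins divided by the squared total, which is what complete independence implies.

```r
# Assumption: berk.data matches the built-in UCBAdmissions table (2 x 2 x 6).
tab <- UCBAdmissions
berk.sketch <- as.data.frame(tab)   # columns: Admit, Gender, Dept, Freq

# Reference levels chosen to mirror the output on this page (Female, Dept F).
berk.sketch$Gender <- relevel(berk.sketch$Gender, ref = "Female")
berk.sketch$Dept   <- relevel(berk.sketch$Dept, ref = "F")

fit.sketch <- glm(Freq ~ Admit + Gender + Dept, family = poisson(),
                  data = berk.sketch)

# Closed-form fitted values: product of one-way margins over n^2.
n  <- sum(tab)
nA <- as.numeric(margin.table(tab, 1))   # Admit totals
nG <- as.numeric(margin.table(tab, 2))   # Gender totals
nD <- as.numeric(margin.table(tab, 3))   # Dept totals
mu.closed <- as.numeric(outer(outer(nA, nG), nD)) / n^2

# Rows of as.data.frame(tab) follow the array's storage order, so the two
# vectors can be compared element by element.
max(abs(mu.closed - fitted(fit.sketch)))   # largest discrepancy
round(coef(fit.sketch), 4)                 # compare with the summary output
```

Changing the reference levels alters only how the coefficients are labeled and signed; the fitted values and the deviance are unaffected.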
What are the estimated odds of male vs. female in this example? From the output, the ML estimate of the parameter GenderMale is 0.38287; thus, the estimated odds of an applicant being male rather than female are
\(\exp(0.38287) \approx 1.467 \approx 2691/1835\)
with p-value < 2e-16, indicating that these odds differ significantly from 1 (male applicants outnumber female applicants). Note these are odds, not odds ratios! (When we are dealing with main effects we do not look at odds ratios.)
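Because the model contains only main effects, this estimate can be recovered directly from the one-way margin for sex. A minimal sketch, assuming the counts match R's built-in `UCBAdmissions` table (the variable names are illustrative):

```r
# Assumption: same counts as the built-in UCBAdmissions table.
n.gender <- margin.table(UCBAdmissions, 2)   # totals for Male and Female

odds.male <- n.gender[["Male"]] / n.gender[["Female"]]
odds.male        # observed odds of male vs. female applicants
log(odds.male)   # compare with the GenderMale estimate
```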
> summary(berk.ind)
Call:
glm(formula = Freq ~ Admit + Gender + Dept, family = poisson(),
data = berk.data)
Deviance Residuals:
Min 1Q Median 3Q Max
-18.170 -7.719 -1.008 4.734 17.153
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.72072 0.04553 103.673 < 2e-16 ***
AdmitRejected 0.45674 0.03051 14.972 < 2e-16 ***
GenderMale 0.38287 0.03027 12.647 < 2e-16 ***
DeptA 0.26752 0.04972 5.380 7.44e-08 ***
DeptB -0.19927 0.05577 -3.573 0.000352 ***
DeptC 0.25131 0.04990 5.036 4.74e-07 ***
DeptD 0.10368 0.05161 2.009 0.044533 *
DeptE -0.20098 0.05579 -3.602 0.000315 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2650.1 on 23 degrees of freedom
Residual deviance: 2097.7 on 16 degrees of freedom
AIC: 2272.7
What about the odds of being rejected? What can we conclude from the part of the output above?
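The AdmitRejected estimate can be read the same way: \(\exp(0.45674) \approx 1.579\), so under this model the estimated odds of rejection versus admission are about 1.58 to 1. A minimal sketch, again assuming the counts match R's built-in `UCBAdmissions` table, checks this against the one-way margin for admission status:

```r
# Assumption: same counts as the built-in UCBAdmissions table.
n.admit <- margin.table(UCBAdmissions, 1)   # totals for Admitted and Rejected

odds.reject <- n.admit[["Rejected"]] / n.admit[["Admitted"]]
odds.reject        # observed odds of rejection vs. admission
log(odds.reject)   # compare with the AdmitRejected estimate
```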
But, we should really check the overall fit of the model first, to determine if these estimates are meaningful.
Model Fit
The reported "Residual deviance" of 2097.7 on 16 degrees of freedom indicates that the model does not fit. If the model fit well, the ratio of the residual deviance to its degrees of freedom would be close to 1; here it is 2097.7/16, or about 131.1. Recall how we get the degrees of freedom:
df = number of cells - number of fitted parameters in the model = 24 - 8 = 16,
or equivalently,
df = number of fitted parameters in the saturated model - number of fitted parameters in our model.
Recall that this goodness-of-fit statistic compares the fitted model to the saturated model. Thus, the model of complete independence does not fit well in comparison to the saturated model.
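The degrees of freedom and both goodness-of-fit statistics can be checked in a few lines. This is a minimal sketch, assuming the counts match R's built-in `UCBAdmissions` table; `berk.sketch` and `fit.sketch` are hypothetical names, not objects defined in these notes.

```r
# Assumption: same counts as the built-in UCBAdmissions table.
berk.sketch <- as.data.frame(UCBAdmissions)
fit.sketch  <- glm(Freq ~ Admit + Gender + Dept, family = poisson(),
                   data = berk.sketch)

n.cells  <- nrow(berk.sketch)           # 2 * 2 * 6 cells
n.params <- length(coef(fit.sketch))    # intercept + 1 + 1 + 5 main effects
n.cells - n.params                      # residual degrees of freedom
df.residual(fit.sketch)                 # same quantity, as reported by glm

G2 <- deviance(fit.sketch)                             # deviance
X2 <- sum(residuals(fit.sketch, type = "pearson")^2)   # Pearson chi-square
c(G2 = G2, X2 = X2) / df.residual(fit.sketch)          # ratios to the df

# Upper-tail chi-square probability for the deviance
pchisq(G2, df = df.residual(fit.sketch), lower.tail = FALSE)
```

A tail probability this close to zero agrees with the conclusion above that the model of complete independence fits poorly relative to the saturated model.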
Next, let us see an example of the joint independence model.