6.3 - Correspondence to k-way tables

6.3 - Correspondence to k-way tables

Just as three-way tables allowed us to fix or condition on one variable while focusing on the relationship between two others, the logistic regression model can easily accommodate more than a single predictor. Each slope coefficient is a measure of the conditional relationship between its predictor and the response, given the other predictors are held constant. In other words, all relationships are conditional ones.

We start with the familiar example of the boy scouts and rates of delinquency and see how the results we calculated from the three-way table earlier (and more) can be produced from the logistic regression model.


6.3.1 - Estimating Odds Ratios

6.3.1 - Estimating Odds Ratios

Recall the \(3\times2\times2\) table that we examined in Lesson 5 that classifies 800 boys according to S = socioeconomic status, B = whether a boy is a scout, and D = juvenile delinquency status.

Socioeconomic status Boy scout Delinquent
Yes No
Low Yes 11 43
No 42 169
Medium Yes 14 104
No 20 132
High Yes 8 196
No 2 59

Because the outcome variable D is binary, we can express many models of interest using binary logistic regression. Before handling the full three-way table, let us consider the \(2\times2\) marginal table for B and D as we did in Lesson 5. We concluded that the boy scout status (B) and the delinquent status (D) are associated and that (from the table below)

Boy scout Delinquent
Yes No
Yes 33 343
No 64 360

the estimated log-odds ratio is

\(\log\left(\dfrac{33( 360)}{64 (343)}\right)=-0.6140\)

with a standard error of \(\sqrt{\dfrac{1}{33}+\dfrac{1}{343}+\dfrac{1}{64}+\dfrac{1}{360}}=0.2272\). That is, we estimate that being a boy scout is associated a lower log-odds of delinquency by 0.614; the odds-ratio is 0.541. Now let’s fit the logistic regression model

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x\)

where \(x\) is a dummy variable

\(x\) = 0 if non-scout,
\(x\) = 1 if scout.

See the SAS code in the program scout.sas below:

options nocenter nodate nonumber linesize=72;

data scout1;
   input S $ y n;
   cards;
scout     33 376
nonscout  64 424
;

proc logist data=scout1;
  class S / param=ref ref=first;
  model y/n = S / scale=none;
  run;

Note that SAS will set the baseline to "non-scout" because it comes before "scout" in alphabetical order. Some output from this part of the program:

 
Analysis of Maximum Likelihood Estimates
Parameter   DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq
Intercept   1 -1.7272 0.1357 162.1110 <.0001
S scout 1 -0.6140 0.2272 7.3032 0.0069

The estimated coefficient of \(x\)

\(\hat{\beta}_1=-0.6140\)

is identical to the log-odds ratio from the analysis of the \(2\times2\) table. The standard error for \(\hat{\beta}_1\), 0.2272, is identical to the standard error that came from the \(2\times2\) table analysis. Also, in this model, \(\beta_0\) is the log-odds of delinquency for non-scouts (\(x=0\)). Looking at the \(2\times2\) table, the estimated log-odds for non-scouts is

\(\log\left(\dfrac{64}{360}\right)=-1.7272\)

which agrees with \(\hat{\beta}_0\) from the logistic model.

The goodness-of-fit statistics \(X^2\) and \(G^2\) from this model are both zeroes because the model is saturated. Let's fit the intercept-only model by removing the predictor from the model statement:

model y/n = / scale=none;

The goodness-of-fit statistics are shown below.

 
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 7.6126 1 7.6126 0.0058
Pearson 7.4652 1 7.4652 0.0063

Number of events/trials observations: 2

The Pearson statistic \(X^2=7.4652\) is identical to the ordinary chi-square statistic for testing independence in the \(2\times2\) table. And the deviance \(G^2= 7.6126\) is identical to the \(G^2\) for testing independence in the \(2\times2\) table.

The R code for this model and other models in this section for the boy scout example is in the R program scout.R. Here is part of the code that works with the \(2\times2\) (marginal) table between scouting and delinquency.

#### corresponds to the data scout1 in scout.SAS
#### Fits a Logistic regression with S="scout" or "nonscout"

S=factor(c("scout","nonscout"))
Sscout=(S=="scout")
Snonscout=(S=="nonscout")
y=c(33,64)
n=c(376,424)
count=cbind(y,n-y)
result=glm(count~Sscout+Snonscout,family=binomial("logit"))
summary(result)

Part of the R output includes:

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.7272     0.1357 -12.732  < 2e-16 ***
SscoutTRUE     -0.6140     0.2272  -2.702  0.00688 ** 
SnonscoutTRUE       NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note that R sets the baseline to "non-scout" because it comes before "scout" in alphabetical order. The estimated coefficient for scout is

\(\hat{\beta}_1=−0.6140\)

and is identical to the log-odds ratio from the analysis of the \(2\times2\) table. The standard error for \(\hat{\beta}_1\), 0.2272, is identical to the standard error that came from the \(2\times2\) table analysis.

Also, in this model, \(\beta_0\) is the log-odds of delinquency for nonscouts (\(x_i=0\)). Looking at the \(2\times2\) table, the estimated log-odds for non-scouts is

\(\log(64/360) = −1.7272\)

which agrees with \(\hat{\beta}_0\) from the logistic model.

The goodness-of-fit statistics are shown below:

   Null deviance: 7.6126e+00  on 1  degrees of freedom
Residual deviance: 4.0190e-14  on 0  degrees of freedom
AIC: 15.083

As we have seen before, the residual deviance is essentially 0 because the model is saturated. The null deviance corresponds to the deviance goodness-of-fit statistics in the SAS output. Here, the deviance \(G^2 = 7.6126\) is identical to the \(G^2\) for testing independence in the \(2\times2\) table.

NOTE! We have shown again that analyzing a \(2\times2\) table for association is equivalent to logistic regression with an indicator variable. Next, let us look at the rest of the data and generalize these analyses to \(I\times 2\) tables and \(I\times J\times 2\) tables.

Now let us do a similar analysis for the \(3\times2\) table that classifies subjects by S and D:

Socioeconomic status Delinquent
Yes No
Low 53 212
Medium 34 236
High 10 255

Two odds ratios of interest are

\((53(236)) / (34( 212)) = 1.735\)
\((53( 255)) / (10 (212)) = 6.375\)

We estimate that the odds of delinquency for the S = low group are 1.735 times as high as for the S = medium group, and 6.375 times as high as for the S = high group. The estimated log odds ratios are

\(\log 1.735 = .5512\) and \(\log 6.375 = 1.852\)

and the standard errors are

\(\sqrt{\dfrac{1}{53}+\dfrac{1}{212}+\dfrac{1}{34}+\dfrac{1}{236}}=0.2392\)

\(\sqrt{\dfrac{1}{53}+\dfrac{1}{212}+\dfrac{1}{10}+\dfrac{1}{255}}=0.3571\)

Now let us replicate this analysis using logistic regression. First, we re-express the data in terms of \y_i\) = number of delinquents and \(n_i\) = number of boys for the \(i\)th row (SES group):

  \(y_i\) \(n_i\)
Low 53 265
Medium 34 270
High 10 265

Then we define a pair of dummy indicators,

\(x_1=1\) if S=medium,
\(x_1=0\) otherwise,

\(x_2=1\) if S=high,
\(x_2=0\) otherwise.

Let \(\pi\) = probability of delinquency. Then, the model

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2\)

says that the log-odds of delinquency are \(\beta_0\) for S = low, \(\beta_0+\beta_1\) for S = medium, and \(\beta_0+\beta_2\) for S = high.

The SAS code for fitting this model is shown below (see scout.sas).

options nocenter nodate nonumber linesize="72";
data new;
	input S $ y n;
    cardsl
low		53 265
medium		34 270
hight		10 265
;

proc logist data=new;
class S / order data param=ref ref=first;
model y/n = S scale=none;
run;

Some relevant portions of the output are shown below.

 
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 0.0000 0 . .
Pearson 0.0000 0 . .

Number of events/trials observations: 3

Think about the following question, then click on the button to show the answer.

 Stop and Think!
Why are \(G^2\) and \(X^2= 0\)? What happened with information on boy scout status?

Since we are fitting the saturated model here, we are using all three lines of data. So, our current model IS the saturated model. 

So, in comparing the current model with the saturated model, the deviance has to be zero.

As far the information about boy scout status, we have collapsed over this status. This is similar to when we collapsed down from a 3-way table down into a 2-way table and their is no information about boy scout status.

 
Model Fit Statistics
Criterion Intercept Only Intercept and Covariates
Log Likelihood Full Log Likelihood
AIC 593.053 560.801 20.942
SC 597.738 574.855 34.995
-2 Log L 591.053 554.801 14.942
 
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 36.2523 2 <.0001
Score 32.8263 2 <.0001
Wald 27.7335 2 <.0001
 
Type 3 Analysis of Effects
Effect DF Wald
Chi-Square
Pr > ChiSq
S 2 27.7335 <.0001
 
Analysis of Maximum Likelihood Estimates
Parameter   DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq
Intercept   1 -1.3863 0.1536 81.4848 <.0001
S medium 1 -0.5512 0.2392 5.3080 0.0212
S high 1 -1.8524 0.3571 26.9110 <.0001
 
Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
S medium vs low 0.576 0.361 0.921
S high vs low 0.157 0.078 0.316

In this case, the "intercept only" model says that delinquency is unrelated to socioeconomic status, so the test of the global null hypothesis \(\beta_1=\beta_2=0\) is equivalent to the usual test for independence in the \(3\times2\) table. The estimated coefficients and SEs are as we predicted, and the estimated odds ratios are

\(\exp(−.5512) = 0.576 = 1/1.735\),
\(\exp(−1.852) = 0.157 = 1/6.375\)

Here is how we can fit this model in R (from scout.R, corresponds to the scout2 data in the scout.sas program).

S=factor(c("low","medium","high"))
y=c(53,34,10)
n=c(265,270,265)
count=cbind(y,n-y)
Smedium=(S=="medium")
Shigh=(S=="high")
result=glm(count~Smedium+Shigh,family=binomial("logit"))
summary(result)

Notice that we only use “Smedium” and “Shigh” in the model statement in glm(). So we set the baseline as "low" for this model.

R output:

Call:
glm(formula = count ~ Smedium + Shigh, family = binomial("logit"))

Deviance Residuals: 
[1]  0  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.3863     0.1536  -9.027  < 2e-16 ***
SmediumTRUE  -0.5512     0.2392  -2.304   0.0212 *  
ShighTRUE    -1.8524     0.3571  -5.188 2.13e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3.6252e+01  on 2  degrees of freedom
Residual deviance: 6.8390e-14  on 0  degrees of freedom
AIC: 20.942

Number of Fisher Scoring iterations: 3
 Stop and Think!
As happened in SAS, we get the residual deviance of almost zero in this case. Why? What happened with information on boy scout status?

Since we are fitting the saturated model here, we are using all three lines of data. So, our current model IS the saturated model.

So, in comparing the current model with the saturated model, the deviance has to be zero.

As far the information about boy scout status, we have collapsed over this status. This is similar to when we collapsed down from a 3-way table down into a 2-way table and their is no information about boy scout status.

The null deviance is the \(G^2\) that corresponds to the deviance goodness-of-fit statistics found in the SAS output. Here, the deviance \(G^2 = 36.252\), so we can conclude that the delinquency is related to socioeconomic status. The test of the global null hypothesis \(\beta_1=\beta_2=0\) is equivalent to the usual test for independence in the \(3\times2\) table. The estimated coefficients and SEs are as we predicted, and the estimated odds ratios are

\(\exp(−.5512) = 0.576 = 1/1.735\),
\(\exp(−1.852) = 0.157 = 1/6.375\).

Notice that we did not say anything here about the scout status. We have "ignored" that information because we collapsed over that variable. Next, we will see how this plays out with logistic regression.


6.3.2 - Collapsing Tables

6.3.2 - Collapsing Tables

In this example, we could have also arranged the input data like this:

S B \(y_i\) \(n_i\) 
low scout 11 54
low non-scout 42 211
medium scout 14 118
medium non-scout 20 152
high scout 8 204
high non-scout 2 61

A SAS program for fitting the same model is shown below.

data new;
   input S $ B $ y n;
   cards;
low      scout      11   54
low      nonscout   42  211
medium   scout      14  118
medium   nonscout   20  152
high     scout       8  204
high     nonscout    2   61
;

proc logist data=new;
  class S / order=data param=ref ref=first;
  model y/n = S / scale=none;
  run;

The parameter estimates from this new program are exactly the same as before:

 
Analysis of Maximum Likelihood Estimates
Parameter   DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq
Intercept   1 -1.3863 0.1536 81.4848 <.0001
S medium 1 -0.5512 0.2392 5.3080 0.0212
S high 1 -1.8524 0.3571 26.9110 <.0001

But the overall fit statistics are different! Before, we had \(X^2=0\) and \(G^2=0\) because the model was saturated (there were three parameters and \(N = 3\) lines of data). But now, the fit statistics are:

 
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 0.1623 3 0.0541 0.9834
Pearson 0.1602 3 0.0534 0.9837

Number of events/trials observations: 6

The model appears to fit very well, but it is no longer saturated. What happened? Recall that \(X^2\) and \(G^2\) are testing the null hypothesis that the current model is correct, versus the alternative of a saturated model. When we disaggregated the data by levels of B, using six input lines rather than three, the current model did not change but the saturated model did; the saturated model was enlarged to six parameters.

Note!

It is very important for you to understand how you entered the data and what model you are fitting. If you understand the basic concepts, then you can apply model comparisons with any statistical software application.

Another way to interpret the overall \(X^2\) and \(G^2\) goodness-of-fit tests is that they are testing the significance of all omitted covariates. If we collapse the data over B and use only three lines of data, then SAS is unaware of the existence of B. But if we disaggregate the data by levels of B and do not include it in the model, then SAS has the opportunity to test the fit of the current model—in which the probability of delinquency varies by S alone—against the saturated alternative in which the probability of delinquency varies by each combination of the levels of S and B. When the data are disaggregated, the goodness-of-fit tests are actually testing the hypothesis that D is unrelated to B once S has been taken into account—i.e., that D and B are conditionally independent given S.

Here’s another way to think about it. The current model has three parameters:

  • an intercept, and
  • two indicators for S.

But the alternative has six parameters. We can think of these six parameters as an intercept and five dummies to distinguish among the six rows of data, but then it's not easy to see how the current model becomes a special case of it. So, we think of the six parameters as

  • an intercept,
  • two dummies for S,
  • one dummy for B, and
  • two interaction terms for SB.

Now it has become clear that the current model is a special case of this model in which the coefficients for B and the SB interactions are zero. The overall \(X^2\) and \(G^2\) statistics for the disaggregated data are testing the joint significance of the B dummy and the SB interactions.

So, should we aggregate, or should we not? If the current model is true, then it doesn’t matter; we get exactly the same estimated coefficients and standard errors either way. But dis-aggregating gives us the opportunity to test the significance of the omitted terms for B and SB.

Therefore, it often makes sense to dis-aggregate your dataset by variables that are not included in the model, because it gives you the opportunity to test the overall fit of your model. But that strategy has limits. If you dis-aggregate the data too much, the \(n_i\)s may become too small to reliably test the fit by \(X^2\) and \(G^2\).


6.3.3 - Different Logistic Regression Models for Three-way Tables

6.3.3 - Different Logistic Regression Models for Three-way Tables

In this part of the lesson we will consider different binary logistic regression models for three-way tables and their link to log-linear models. Let us return to the \(3\times2\times2\) table:

Socioeconomic status Boy scout Delinquent
Yes No
Low Yes 11 43
No 42 169
Medium Yes 14 104
No 20 132
High Yes 8 196
No 2 59

As we discussed in Lesson 5, there are many different models that we could fit to this table. But if we think of D as a response and B and S as potential predictors, we can focus on a subset of models that make sense.

Let \(\pi\) be the probability of delinquency. The simplest model to consider is the null or intercept-only model,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0\) (1)

in which D is unrelated to B or S. If we were to fit this model in PROC LOGISTIC using the disaggregated data (all six lines), we would find that the\(X^2\) and \(G^2\) statistics are identical to those we obtained in Lesson 5 from testing the null hypothesis "S and B are independent of D". That is, testing the overall fit of model (1), i.e., intercept only model, is equivalent to testing the fit of the log-linear model (D, SB), because (1) says that D is unrelated to S and B but makes no assumptions about whether S and B are related.

After (1), we may want to fit the logit model that has a main effect for B,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1\) (2)

where

\(x_1=1\) if B=scout,
\(x_1=0\) otherwise.

If the data are disaggregated into six lines, the goodness-of-fit tests for model (2) will be equivalent to the test for (DB, SB) log-linear model, which says that D and S are conditionally independent given B. This makes sense, because (2) says that S has no effect on D once B has been taken into account.

The model that has main effects for S,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_2 x_2+\beta_3 x_3\) (3)

where

\(x_2=1\) if S = medium,
\(x_2=0\) otherwise,

\(x_3=1\) if S = high,
\(x_2=0\) otherwise,

says that B has no effect on D once S has been taken into account. The goodness-of-fit tests for (3) are equivalent to testing the null hypothesis that (DS, BS) fits, i.e. that D and B are conditionally independent given S.

The logit model

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3\) (4)

has main effects for both B and S, corresponds to the model of homogeneous association which we discussed in Lesson 5. We could not fit the model at that time, because the ML estimates have no closed-form solution, but we calculated CMH statistic and Breslow-Day statistic. But with logistic regression software, fitting this model is no more difficult than for any other model.

This model says that the effect of B on D, when expressed in terms of odds ratios, is identical across the levels of S. Equivalently, it says that the odds ratios describing the relationship between S and D are identical across the levels of B. If this model does not fit, we have evidence that the effect of B on D varies across the levels of S, or that the effect of S on D varies across the levels of B.

Finally, the saturated model can be written as

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3x_3+\beta_4x_1x_2+\beta_5 x_1x_3\)

which has the main effects for B and S and their interactions. This model has \(X^2=G^2=0\) with zero degrees of freedom (see scout.sas).


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility