Example: Housing Data (using SAS) Section
In this section, we will fit the baselinecategory logit model to the data below via PROC LOGISTIC with LINK=GLOGIT in SAS. The model can be also fitted by using PROC GENMOD (see the reference link provided at the top page of this lesson).
The data in "housing.dat" gives information on housing conditions from a survey in Copenhagen. Variables are satisfaction (low, medium, high), perceived influence on management (low, medium, high), type of housing (tower, atrium, apartment, terrace), contact with other residents (low, high), and the frequency count of individuals in that row.
When the data are grouped, as they are in this example, SAS expects the response categories \(1, 2, \ldots , r\) to appear in a single column of the dataset, with another column containing the frequency or count as they are in this case. The lines that have a frequency of zero are not actually used in the modeling, because they contribute nothing to the loglikelihood; as such, we can omit them if we wish.
Let's describe these data by a baselinecategory model, with Satisfaction as the outcome (baseline chosen as "medium") and other variables are predictors. Note that with this model, we treat the satisfaction levels as nominal; we'll revisit this data set with satisfaction treated as ordinal in the next section.
We let \(\pi_1\) = probability of low satisfaction, \(\pi_2\) = probability of high satisfaction, and \(\pi_3\) = probability of medium satisfaction so that the equations are
\(\log\left(\dfrac{\pi_j}{\pi_3}\right)=\beta_{0j}+\beta_{1j}x_1+\beta_{2j}x_2+\cdots\)
for \(j = 1,2\). The \(x\)'s represent the predictor terms, including any interactions we wish to fit as well. Note that these predictor terms do not depend on \(j\). Also, treating the predictors as nominal requires
 two indicators for perceived influence on management
 three indicators for type of housing
 one indicator for contact with other residents
With firstorder terms and all twoway interactions, this gives us a total of \((31)(1+6+11)=36\) parameters (1 intercept, 6 firstorder terms, and 11 interaction terms for each of the two nonbaseline satisfaction response categories)!
Specifying the Model
To get fit statistics, we include the options aggregate and scale=none. To adjust for overdispersion, we would use scale=pearson.
data housing;
input Sat $ Infl $ Type $ Cont $ Freq;
cards;
Low Low Tower Low 21
Medium Low Tower Low 21
High Low Tower Low 28
Low Medium Tower Low 34
Medium Medium Tower Low 22
High Medium Tower Low 36
Low High Tower Low 10
Medium High Tower Low 11
High High Tower Low 36
Low Low Apartment Low 61
Medium Low Apartment Low 23
High Low Apartment Low 17
Low Medium Apartment Low 43
Medium Medium Apartment Low 35
High Medium Apartment Low 40
Low High Apartment Low 26
Medium High Apartment Low 18
High High Apartment Low 54
Low Low Atrium Low 13
Medium Low Atrium Low 9
High Low Atrium Low 10
Low Medium Atrium Low 8
Medium Medium Atrium Low 8
High Medium Atrium Low 12
Low High Atrium Low 6
Medium High Atrium Low 7
High High Atrium Low 9
Low Low Terrace Low 18
Medium Low Terrace Low 6
High Low Terrace Low 7
Low Medium Terrace Low 15
Medium Medium Terrace Low 13
High Medium Terrace Low 13
Low High Terrace Low 7
Medium High Terrace Low 5
High High Terrace Low 11
Low Low Tower High 14
Medium Low Tower High 19
High Low Tower High 37
Low Medium Tower High 17
Medium Medium Tower High 23
High Medium Tower High 40
Low High Tower High 3
Medium High Tower High 5
High High Tower High 23
Low Low Apartment High 78
Medium Low Apartment High 46
High Low Apartment High 43
Low Medium Apartment High 48
Medium Medium Apartment High 45
High Medium Apartment High 86
Low High Apartment High 15
Medium High Apartment High 25
High High Apartment High 62
Low Low Atrium High 20
Medium Low Atrium High 23
High Low Atrium High 20
Low Medium Atrium High 10
Medium Medium Atrium High 22
High Medium Atrium High 24
Low High Atrium High 7
Medium High Atrium High 10
High High Atrium High 21
Low Low Terrace High 57
Medium Low Terrace High 23
High Low Terrace High 13
Low Medium Terrace High 31
Medium Medium Terrace High 21
High Medium Terrace High 13
Low High Terrace High 5
Medium High Terrace High 6
High High Terrace High 13
; run;
* twoway interactions;
proc logistic data=housing;
freq freq;
class infl (ref="Low") type cont / param=ref;
model sat (ref="Medium") = infl type cont
infl*type infl*cont type*cont /link=glogit
aggregate=(infl type cont) scale=none;
run;
* additive model;
proc logistic data=housing;
freq freq;
class infl (ref="Low") type cont / param=ref;
model sat (ref="Medium") = infl type cont /link=glogit
aggregate=(infl type cont) scale=none;
run;
* type and cont only;
proc logistic data=housing;
freq freq;
class type cont / param=ref;
model sat (ref="Medium") = type cont /link=glogit
aggregate=(infl type cont) scale=none;
run;
* infl and cont only;
proc logistic data=housing;
freq freq;
class infl (ref="Low") cont / param=ref;
model sat (ref="Medium") = infl cont /link=glogit
aggregate=(infl type cont) scale=none;
run;
* infl and type only;
proc logistic data=housing;
freq freq;
class infl (ref="Low") type / param=ref;
model sat (ref="Medium") = infl type /link=glogit
aggregate=(infl type cont) scale=none;
run;
* intercept only;
proc logistic data=housing;
freq freq;
class / param=ref;
model sat (ref="Medium") = /link=glogit
aggregate=(infl type cont) scale=none;
run;
Output
Here is the output pertaining to the "response profile" that gives us three levels of the satisfaction response and clarifying the baseline as "Medium". For example, there are 668 individuals who responded with "High" satisfaction.
Response Profile  

Ordered Value 
Sat  Total Frequency 
1  High  668 
2  Low  567 
3  Medium  446 
Here is the output pertaining to the goodness of fit.
Deviance and Pearson GoodnessofFit Statistics  

Criterion  Value  DF  Value/DF  Pr > ChiSq 
Deviance  5.9443  12  0.4954  0.9189 
Pearson  5.9670  12  0.4973  0.9177 
Number of unique profiles: 24
There are \(N = 24\) profiles (unique combinations of the predictors) in this dataset. Recall that this part of the output tests the fit of the current model versus the saturated model. The saturated model, which fits a separate multinomial distribution to each profile, has \(24\cdot2 = 48\) parameters. Compared with the current model, which has 36 parameters to include up to twoway interactions only, this gives 12 degrees of freedom for the test.
The model fits well because both the residual deviance and Pearson test statistics are small, relative to their chisquare distributions; both pvalues are large and indicate very little evidence of lack of fit.
Output pertaining to the significance of the predictors:
Joint Tests  

Effect  DF  Wald ChiSquare 
Pr > ChiSq 
Infl  4  13.2983  0.0099 
Type  6  25.3474  0.0003 
Cont  2  5.7270  0.0571 
Infl*Type  12  20.8975  0.0519 
Infl*Cont  4  0.8772  0.9278 
Type*Cont  6  9.4081  0.1519 
Under fullrank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.
The first section (global null hypothesis) tests the fit of the current model against the null or interceptonly model. The null model has two parameters (oneintercept for each nonbaseline equation). Therefore, the comparison has \(36  2 = 34\) degrees of freedom. This test is highly significant, indicating that at least one of the predictors has an effect on satisfaction.
The next section (Type III analysis of effects) shows the change in fit resulting from discarding any one of the predictors—influence, type, or contact—while keeping the others in the model. For example, the test for influence is equivalent to setting its two indicator coefficients equal to zero in each of the two logit equations; so the test for significance of influence has \(2\cdot2=4\) degrees of freedom. Judging from these tests, we see that
 Perceived influence on management depends on the type of housing to a degree (interaction), but it's borderline insignificant, given that other terms are already in the model. Other interaction terms are relatively weak.
 Both influence and housing type are strongly related to satisfaction, while contact is borderline insignificant.
This suggests that we may get a more efficient model by removing one or more interaction terms. Below are the goodness of fit statistics for the additive model (no interaction terms). This seems to fit reasonably well and has considerably fewer parameters: \((31)(1+6)=14\). We'll retain this additive model and interpret the parameter estimates next.
Interpreting Parameter Estimates
Let's look at the estimated coefficients of the current model that contains only the main effects:
Analysis of Maximum Likelihood Estimates  

Parameter  Sat  DF  Estimate  Standard Error 
Wald ChiSquare 
Pr > ChiSq  
Intercept  High  1  0.2805  0.1662  2.8473  0.0915  
Intercept  Low  1  0.4192  0.1729  5.8768  0.0153  
Infl  High  High  1  0.9477  0.1681  31.8016  <.0001 
Infl  High  Low  1  0.6649  0.1863  12.7338  0.0004 
Infl  Medium  High  1  0.2885  0.1448  3.9704  0.0463 
Infl  Medium  Low  1  0.4464  0.1416  9.9443  0.0016 
Type  Apartmen  High  1  0.2999  0.1563  3.6834  0.0550 
Type  Apartmen  Low  1  0.4357  0.1725  6.3769  0.0116 
Type  Atrium  High  1  0.5393  0.1996  7.3033  0.0069 
Type  Atrium  Low  1  0.1314  0.2231  0.3467  0.5560 
Type  Terrace  High  1  0.7458  0.2105  12.5494  0.0004 
Type  Terrace  Low  1  0.6666  0.2063  10.4445  0.0012 
Cont  High  High  1  0.1210  0.1293  0.8752  0.3495 
Cont  High  Low  1  0.3609  0.1324  7.4284  0.0064 
The LOGISTIC Procedure
Model Information  

Data Set  WORK.HOUSING 
Response Variable  Sat 
Number of Response Levels  3 
Frequency Variable  Freq 
Model  generalized logit 
Optimization Technique  NewtonRaphson 
Odds Ratio Estimates  

Effect  Sat  Point Estimate  95% Wald Confidence Limits 

Infl High vs Low  High  2.580  1.856  3.586 
Infl High vs Low  Low  0.514  0.357  0.741 
Infl Medium vs Low  High  1.334  1.005  1.772 
Infl Medium vs Low  Low  0.640  0.485  0.845 
Type Apartmen vs Tower  High  0.741  0.545  1.006 
Type Apartmen vs Tower  Low  1.546  1.102  2.168 
Type Atrium vs Tower  High  0.583  0.394  0.862 
Type Atrium vs Tower  Low  0.877  0.566  1.358 
Type Terrace vs Tower  High  0.474  0.314  0.717 
Type Terrace vs Tower  Low  1.948  1.300  2.918 
Cont High vs Low  High  1.129  0.876  1.454 
Cont High vs Low  Low  0.697  0.538  0.904 
How do we interpret them? Recall that there are two logit equations involved: 1) the logodds of low satisfaction versus medium and 2) the logodds of high satisfaction versus medium. This baseline reference is in addition to any baselines involved in the predictors. For the two indicators used for perceived influence, the baseline corresponds to "low". So, from the output, the estimated coefficient for high influence is interpreted as follows: for those with high perceived influence on management, the odds of high satisfaction (versus medium) is estimated to be \(e^{.9477}=2.58\) times as high as that for those with low influence. As with other glms we've dealt with to this point, we may change the baselines arbitrarily, which changes the interpretations of the numerical values reported but not the significance of the predictor (all levels together) or the fit of the model as a whole.
The intercepts give the estimated logodds for the baseline groups. For example, the estimated logodds of high satisfaction (versus medium) is \(e^{.2805}=1.32\) for those with low perceived influence, housing type tower, and low contact with other residents.
Although the above output shows all three predictor main effects are significantly related to satisfaction, we may consider the same model selection techniques here as we used earlier for binary logistic regression.
Model Selection
In the main effects model, the one we fitted above, the Wald statistics reported indicate that each of the three predictors is highly significant, given the others. We can also address this by comparing deviances, full model versus reduced model; we just have to be careful of the number of parameters since each effect we test applies to two (in general, \(r1\)) logit models. The current (additive) model has deviance \(G^2=38.6622\) with 34 degrees of freedom versus the saturated model as we noted above. If we remove the influence indicators, the deviance increases to \(G^2=147.7797\) with 38 degrees of freedom, and the LRT statistic for comparing these two models would be
\(147.7797  38.6622=109.1175\)
which can be compared against a chisquare distribution with \(3834=4\) degrees of freedom (pvalue approximately 0). So, the effect of influence is considered (highly) significant. Similarly, we can compare any two models in this way, provided one is a special case of the other. Repeating the modelfitting for various sets of predictors, we obtain the following analysisofdeviance table:
Model  \(G^2\)  df 

Saturated  0.00  0 
Infl + Type + Cont  38.6622  34 
Type + Cont  147.7797  38 
Infl + Cont  100.8891  40 
Infl + Type  54.7219  36 
...  ...  ... 
Null  217.4560  46 
We can refine the model by removing terms if doing so does not significantly reduce the fit.
To compare models that are not of the form full versus reduced, we can use an information criterion, such as AIC. For example, to compare the models with only Type and Cont against the model with only Infl and Cont, the AIC values would be 3599.201 and 3548.311, respectively, in favor of the latter (lower AIC).
Finally, an adjustment for overdispersion could also be implemented as it was with binary logistic regression. The assumption is that the multinomial variance assumed in the model is off by a scale parameter, which we can estimate from the Pearson goodness of fit test. For the additive model, this would be
\(\dfrac{X^2}{df}=\dfrac{38.9104}{34}=1.144\)
Since this is so close to 1, we would not opt to include such an adjustment in this case, but if we did, the effect would be to multiply all variance and standard error estimates by \(1.144\) and \(\sqrt{1.144}\), respectively.