8.1  Polytomous (Multinomial) Logistic Regression
We have already learned about binary logistic regression, where the response is a binary variable with 'success' and 'failure' as its only two categories. But logistic regression can be extended to handle responses, Y, that are polytomous, i.e., taking r > 2 categories. (Note: the word "polychotomous" is sometimes used, but it is not a proper word.)
When r = 2, Y is dichotomous, and we can model the log odds that an event occurs versus does not occur. For binary logistic regression there is only one logit that we can form:
\(\text{logit}(\pi)=\log\left(\dfrac{\pi}{1-\pi}\right)\)
When r > 2, we have a multicategory or polytomous response variable. There are r (r − 1)/2 logits (odds) that we can form, but only (r − 1) are nonredundant. There are different ways to form a set of (r − 1) nonredundant logits, and these will lead to different polytomous (multinomial) logistic regression models.
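To see why only r − 1 of the r(r − 1)/2 pairwise logits are nonredundant, note that any pairwise logit is a difference of two logits taken against a common baseline category. A minimal sketch, using a made-up probability vector for an r = 4 category response:

```python
import math

# Hypothetical category probabilities for an r = 4 category response.
pi = [0.1, 0.2, 0.3, 0.4]
r = len(pi)

# Baseline-category logits: the r - 1 nonredundant logits, each taken
# against the last category r.
baseline = [math.log(pi[j] / pi[r - 1]) for j in range(r - 1)]

# Any of the r(r-1)/2 pairwise logits is recovered as a difference of
# two baseline logits, so it carries no additional information.
logit_1_vs_2 = math.log(pi[0] / pi[1])
assert abs(logit_1_vs_2 - (baseline[0] - baseline[1])) < 1e-12
```

Different choices of the r − 1 nonredundant logits (baseline, adjacent, cumulative) lead to the different polytomous models discussed below.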
Multinomial logistic regression models how a multinomial response variable Y depends on a set of k explanatory variables, X = (X_{1}, X_{2}, ..., X_{k}). This is also a GLM where the random component assumes that the distribution of Y is Multinomial(n, $\mathbf{π}$), where $\mathbf{π}$ is a vector of probabilities of "success" for each category. The systematic component consists of the explanatory variables (which can be continuous, discrete, or both) and is linear in the parameters, e.g., β_{0} + β_{1}x_{1} + ... + β_{k}x_{k}. Again, transformations of the X's themselves are allowed, as in linear regression. The link function is the generalized logit: a logit link for each of the nonredundant logits, as discussed above.
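The generalized (baseline-category) logit link can be sketched numerically: each of the r − 1 logits is linear in x, and inverting the link gives category probabilities via a softmax with the baseline category's linear predictor fixed at zero. The coefficients below are made up purely for illustration.

```python
import math

# Hypothetical (intercept, slope) pairs for categories 1 and 2 of an
# r = 3 response; category 3 is the baseline, with linear predictor 0.
beta = [(0.5, -1.0), (0.2, 0.3)]

def category_probs(x):
    # Linear predictors for each category; baseline gets eta = 0.
    eta = [b0 + b1 * x for b0, b1 in beta] + [0.0]
    # Inverting the generalized logit link = softmax over the etas.
    denom = sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]

probs = category_probs(1.5)
assert abs(sum(probs) - 1.0) < 1e-12  # probabilities sum to one
```

Note that log(probs[j] / probs[r−1]) recovers the linear predictor for category j, which is exactly the baseline-category logit being modeled.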
When analyzing a polytomous response, it's important to note whether the response is ordinal (consisting of ordered categories) or nominal (consisting of unordered categories). For the binary logistic model, this question does not arise.
Some types of models are appropriate only for ordinal responses; e.g., cumulative logits model, adjacent categories model, continuation ratios model. Other models may be used whether the response is ordinal or nominal; e.g., baseline logit model, and conditional logit model.
If the response is ordinal, we do not necessarily have to take the ordering into account, but this information should only rarely be ignored. Ordinality in the response is vital information; neglecting it will almost always lead to suboptimal models. Using the natural ordering can
 lead to a simpler, more parsimonious model and
 increase the power to detect relationships with other variables.
If the response variable is polytomous and all the potential predictors are discrete as well, we could describe the multiway contingency table by a loglinear model. However, if you are analyzing a set of categorical variables, and one of them is clearly a "response" while the others are predictors, I recommend that you use logistic rather than loglinear models. Fitting a loglinear model in this setting could have two disadvantages:
 It has many more parameters, and many of them are not of interest. The loglinear model, as we will learn later, describes the joint distribution of all the variables, whereas the logistic model describes only the conditional distribution of the response given the predictors.
 The loglinear model is often more complicated to interpret. In the loglinear model, the effect of a predictor X on the response Y is described by the XY association. In a logit model, however, the effect of X on Y is a main effect.
Grouped versus ungrouped response & the sampling model
As we have already pointed out in the lessons on logistic regression, data can come in ungrouped (e.g., database) form or grouped (e.g., tabular) form.
Consider a study that investigates preferences for four types of cheese; for the detailed analysis, see the Cheese Tasting example. The response variable Y is a Likert-scale response with nine categories:
Y = 1 for strong dislike,
Y = 2 for dislike,
.
.
.
Y = 9 for excellent taste.
The main predictor of interest is type of cheese (A, B, C and D). The data could arrive in ungrouped form, with one record per subject (as below) where the first column indicates the type of cheese and the second column the value of Y:
A 1
A 3
B 4
C 1
D 9
A 3
B 2
D 7
D 1
.
.
.
Or it could arrive in grouped form (e.g., table):
                    Response category
Cheese    1    2    3    4    5    6    7    8    9
A         0    0    1    7    8    8   19    8    1
B         6    9   12   11    7    6    1    0    0
C         1    1    6    8   23    7    5    1    0
D         0    0    0    1    3    7   14   16   11
Sampling Model
In ungrouped form, the response occupies a single column of the dataset, but in grouped form the response occupies r columns. Most computer programs for polytomous logistic regression can handle grouped or ungrouped data.
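Converting between the two forms is mechanical: grouping just tallies how many records fall in each (predictor, response) cell. A minimal stdlib sketch, using a few made-up records in the format of the cheese example:

```python
from collections import Counter

# Ungrouped records (cheese, rating), one per subject; values are a
# small made-up subset, not the actual cheese-tasting data.
ungrouped = [("A", 1), ("A", 3), ("B", 4), ("C", 1),
             ("D", 9), ("A", 3), ("B", 2)]

# Grouped form: one row per cheese, one count column per rating 1..9.
counts = Counter(ungrouped)
grouped = {ch: [counts[(ch, j)] for j in range(1, 10)] for ch in "ABCD"}

assert grouped["A"] == [1, 0, 2, 0, 0, 0, 0, 0, 0]
```

Each row of the grouped table is then a multinomial observation with index n_i equal to the row total.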
Whether the data are grouped or ungrouped, we will imagine the response to be multinomial. That is, the "response" for row i,
\(y_i=(y_{i1},y_{i2},\ldots,y_{ir})^T \),
is assumed to have a multinomial distribution with index \(n_i=\sum_{j=1}^r y_{ij}\) and parameter
\(\pi_i=(\pi_{i1},\pi_{i2},\ldots,\pi_{ir})^T \).
For example, for the first row, cheese A, \(\pi_1=(\pi_{11},\pi_{12},\ldots,\pi_{19})^T\).
 If the data are grouped, then n_{i} is the total number of "trials" in the i^{th} row of the dataset, and y_{ij} is the number of trials in which outcome j occurred. For example, for the first row, cheese A, n_{1} = 52; there are 0 people who strongly dislike this cheese (y_{11} = 0) and 0 who merely dislike it (y_{12} = 0).
 If the data are ungrouped, then y_{ij} = 1 if outcome j occurred for subject i and y_{ij} = 0 otherwise, while n_{i} = 1. Note, however, that if the data are ungrouped, we do not have to actually create a dataset with columns of 0's and 1's; a single column containing the response level 1, 2, ..., r is sufficient, as in the cheese example above.
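The ungrouped case can be sketched directly: a response level j corresponds to a multinomial observation with n_i = 1, i.e., an indicator vector with a single 1 in position j.

```python
# Ungrouped response as a multinomial with n_i = 1: response level j
# maps to an indicator vector y_i with a 1 in position j.
def to_indicator(level, r=9):
    y = [0] * r
    y[level - 1] = 1
    return y

assert to_indicator(3) == [0, 0, 1, 0, 0, 0, 0, 0, 0]
assert sum(to_indicator(7)) == 1  # the multinomial index n_i is 1
```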
Describing polytomous responses by a sequence of binary models.
In some cases, it makes sense to "factor" the response into a sequence of binary choices and model them with a sequence of ordinary logistic models. The number of binary logistic regressions needed is equal to the number of categories of the response minus one, i.e., r − 1.
For example, consider a medical study to investigate the long-term effects of radiation exposure on mortality. The response variable has four levels (Y = 1 if alive, Y = 2 if dead from a cause other than cancer, Y = 3 if dead from a cancer other than leukemia, and Y = 4 if dead from leukemia). The main predictor of interest is level of exposure (low, medium, high). The four-level response can be modeled via a single multinomial model, or as a sequence of binary choices in three stages:
 The stage 1 model, which is fit to all subjects, describes the log-odds of death.
 The stage 2 model, which is fit only to the subjects who die, describes the log-odds of death due to cancer versus death from other causes.
 The stage 3 model, which is fit only to the subjects who die of cancer, describes the log-odds of death due to leukemia versus death due to other cancers.
Because the multinomial distribution can be factored into a sequence of conditional binomials, we can fit these three logistic models separately. The overall likelihood function factors into three independent likelihoods.
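This factorization is exact and can be checked numerically: the full multinomial log-likelihood equals the sum of the three conditional binomial log-likelihoods, because the multinomial coefficient itself factors as C(n,d)·C(d,c)·C(c,y4). A sketch with made-up probabilities and counts for the four outcomes (alive, other death, other cancer, leukemia):

```python
import math

def log_binom(n, k):
    # log of the binomial coefficient C(n, k), via lgamma.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def binom_loglik(k, n, p):
    # Binomial log-likelihood for k successes in n trials.
    return log_binom(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)

# Hypothetical cell probabilities and counts (values are made up).
pi = [0.7, 0.15, 0.10, 0.05]   # alive, other death, other cancer, leukemia
y = [70, 14, 11, 5]
n = sum(y)

# Full multinomial log-likelihood.
multinom = (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in y)
            + sum(c * math.log(p) for c, p in zip(y, pi)))

# The same likelihood, factored into the three stage-wise binomials.
d = y[1] + y[2] + y[3]                 # deaths (stage 1 "successes")
c = y[2] + y[3]                        # cancer deaths (stage 2)
p_death = pi[1] + pi[2] + pi[3]
stage1 = binom_loglik(d, n, p_death)                      # dead vs alive
stage2 = binom_loglik(c, d, (pi[2] + pi[3]) / p_death)    # cancer vs other death
stage3 = binom_loglik(y[3], c, pi[3] / (pi[2] + pi[3]))   # leukemia vs other cancer

assert abs(multinom - (stage1 + stage2 + stage3)) < 1e-9
```

Because the likelihood separates like this, the three logistic models can be fit independently without any loss of information.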
This approach is attractive when the response can be naturally arranged as a sequence of binary choices. But in situations where arranging such a sequence is unnatural, we should probably fit a single multinomial model to the entire response. The cheese example above would not be a good example for the binary sequence approach since we are dealing with four very different cheese types.