8.1 - Polytomous (Multinomial) Logistic Regression

We have already learned about binary logistic regression, where the response is a binary variable with "success" and "failure" as the only two categories. Logistic regression can be extended to handle responses, \(Y\), that are polytomous, i.e., that take \(r > 2\) categories. When \(r = 2\), \(Y\) is dichotomous, and we can model the log odds that an event occurs (versus not). For binary logistic regression, there is only one logit that we can form:

\(\text{logit}(\pi)=\log\left(\dfrac{\pi}{1-\pi}\right)\)

When \(r > 2\), we have a multi-category or polytomous response variable. There are \(\dfrac{r (r - 1)}{2}\) pairwise logits (log odds) that we can form, but only \((r - 1)\) of them are non-redundant. There are different ways to form a set of \((r - 1)\) non-redundant logits, and these lead to different polytomous (multinomial) logistic regression models.
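
To make the counting concrete, the sketch below enumerates every pairwise logit for a response with \(r = 4\) categories and checks that each one can be recovered from a set of \(r - 1\) logits. The probability values and the choice of the last category as the reference are illustrative assumptions, not part of the lesson's data.

```python
from itertools import combinations
from math import isclose, log

# Hypothetical category probabilities for r = 4 (illustrative values only).
pi = [0.1, 0.2, 0.3, 0.4]
r = len(pi)

# All pairwise logits log(pi_j / pi_k) with j < k: r(r-1)/2 of them.
pairwise = {(j, k): log(pi[j] / pi[k]) for j, k in combinations(range(r), 2)}

# One non-redundant set: logits of each category against the last one (r-1 of them).
baseline = [log(pi[j] / pi[r - 1]) for j in range(r - 1)]

def from_baseline(j, k):
    """Recover log(pi_j / pi_k) from the baseline logits (the reference's own logit is 0)."""
    bj = baseline[j] if j < r - 1 else 0.0
    bk = baseline[k] if k < r - 1 else 0.0
    return bj - bk

# Every pairwise logit is a difference of two baseline logits.
all_recovered = all(isclose(v, from_baseline(j, k)) for (j, k), v in pairwise.items())
print(len(pairwise), len(baseline), all_recovered)
```

For \(r = 4\) this gives six pairwise logits but only three non-redundant ones; the other three are differences of those.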

Multinomial logistic regression models how a multinomial response variable \(Y\) depends on a set of \(k\) explanatory variables, \(x=(x_1, x_2, \dots, x_k)\). This is also a GLM: the random component assumes that \(Y\) has a multinomial(\(n,\pi\)) distribution, where \(\pi\) is the vector of probabilities for the response categories. As with binary logistic regression, the systematic component consists of explanatory variables (continuous, discrete, or both) that enter linearly in the parameters. The link function is the generalized logit, that is, a logit link applied to each of the \((r - 1)\) non-redundant logits discussed above.
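
As a rough illustration of how a generalized logit link maps linear predictors back to category probabilities, the sketch below uses one particular choice of non-redundant logits: each category against a reference category (here, the last one). The function name, intercepts, and coefficients are all hypothetical.

```python
from math import exp, log

def baseline_logit_probs(x, alphas, betas):
    """Category probabilities when each non-reference category has a linear logit.

    x      : list of k explanatory values
    alphas : r-1 intercepts, one per non-reference category
    betas  : r-1 coefficient lists, each of length k
    The last category is the reference, so its linear predictor is fixed at 0.
    """
    etas = [a + sum(b_m * x_m for b_m, x_m in zip(b, x)) for a, b in zip(alphas, betas)]
    etas.append(0.0)
    denom = sum(exp(e) for e in etas)
    return [exp(e) / denom for e in etas]

# Hypothetical parameters for r = 3 categories and k = 2 predictors.
probs = baseline_logit_probs(
    x=[1.0, 0.5],
    alphas=[0.2, -0.1],
    betas=[[0.5, -1.0], [0.3, 0.8]],
)

# The logits against the reference category reproduce the linear predictors.
logit_1_vs_3 = log(probs[0] / probs[2])
logit_2_vs_3 = log(probs[1] / probs[2])
```

The probabilities sum to one by construction, and taking the log of each ratio to the reference category returns the corresponding linear predictor.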

When analyzing a polytomous response, it's important to note whether the response is ordinal (consisting of ordered categories) or nominal (consisting of unordered categories). For the binary logistic model, this question does not arise. Some types of models are appropriate only for ordinal responses (e.g., cumulative logits model, adjacent categories model). Other models may be used whether the response is ordinal or nominal (e.g., baseline logit model).

If the response is ordinal, we do not necessarily have to take the ordering into account, but neglecting to do so may lead to sub-optimal models. Using the natural ordering can lead to a simpler, more parsimonious model and can increase the power to detect relationships with other variables.

If the response variable is polytomous and all the potential predictors are discrete as well, we could describe the multi-way contingency table with a log-linear model (see Lesson 10), but this approach views all variables on equal terms without a dedicated response. If one is to be treated as a response and others as explanatory, the (multinomial) logistic regression model is more appropriate.

Grouped versus ungrouped responses

As we have already seen in our discussions of logistic regression, data can come in ungrouped format (e.g., database form) or grouped format (e.g., tabular form). Consider a study that explores the effect of fat content on the taste rating of ice cream. The response variable \(Y\) is a multi-category (Likert scale) response, ranging from 1 (lowest rating) to 9 (highest rating). The data could arrive in ungrouped form, with one record per subject, where the first column indicates the fat content and the second column the rating.

Or it could arrive in grouped form (e.g., table):

	Rating category
Fat	1	2	3	4	5	6	7	8	9
0	4	17	8	16	5	6	4	2	1
4	1	1	5	6	7	9	21	12	0
...	...	...	...	...	...	...	...	...	...
28	4	6	9	11	5	9	7	8	3
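
The two layouts carry the same information, which a short sketch can show by expanding grouped counts into one record per subject and then counting them back. Only the three rows visible in the table are used; the elided rows are left out, and the variable names are illustrative.

```python
from collections import Counter

ratings = list(range(1, 10))

# Grouped form: fat content -> counts for ratings 1..9 (rows shown in the table only).
grouped = {
    0:  [4, 17, 8, 16, 5, 6, 4, 2, 1],
    4:  [1, 1, 5, 6, 7, 9, 21, 12, 0],
    28: [4, 6, 9, 11, 5, 9, 7, 8, 3],
}

# Ungrouped form: one (fat, rating) record per subject.
ungrouped = [
    (fat, rating)
    for fat, counts in grouped.items()
    for rating, count in zip(ratings, counts)
    for _ in range(count)
]

# Counting the records per (fat, rating) cell recovers the grouped table.
tally = Counter(ungrouped)
regrouped = {fat: [tally[(fat, rating)] for rating in ratings] for fat in grouped}
```

Here `ungrouped` has one row per taster, and `regrouped` matches `grouped` cell by cell.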

Sampling Model

In ungrouped form, the response occupies a single column of the dataset, but in grouped form, the response occupies \(r\) columns. Most computer programs for polytomous logistic regression can handle grouped or ungrouped data.

Whether the data are grouped or ungrouped, we will imagine the response to be multinomial. That is, the "response" for row \(i\),

\(y_i=(y_{i1},y_{i2},\ldots,y_{ir})^T \),

is assumed to have a multinomial distribution with index \(n_i=\sum_{j=1}^r y_{ij}\) and parameter

\(\pi_i=(\pi_{i1},\pi_{i2},\ldots,\pi_{ir})^T \).

For example, for the first row, with fat=0, \(\pi_1 = (\pi_{11}, \pi_{12}, \dots , \pi_{19})^T\).

If the data are grouped, then \(n_i\) is the total number of "trials" in the \(i^{th}\) row of the dataset, and \(y_{ij}\) is the number of trials in which outcome \(j\) occurred. For example, for the first row, there were \(n_1 = 63\) people who tasted ice cream with fat=0, and \(y_{12}=17\) among them gave the rating of 2.
If the data are ungrouped, each row is a single trial (\(n_i = 1\)), and the response is recorded as the category label: \(y_i = j\) means that individual observation (subject, etc.) \(i\) produced outcome \(j\).
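
As a small numerical illustration of the sampling model, the sketch below evaluates the multinomial log-probability of the first table row (fat=0) under two candidate parameter vectors: the sample proportions and a uniform vector. The helper function and the uniform alternative are assumptions made for illustration only.

```python
from math import lgamma, log

def multinomial_logpmf(y, pi):
    """Log of the multinomial probability of counts y under probabilities pi."""
    n = sum(y)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in y)
    # Cells with y_j = 0 contribute nothing to the sum.
    return log_coef + sum(c * log(p) for c, p in zip(y, pi) if c > 0)

y1 = [4, 17, 8, 16, 5, 6, 4, 2, 1]  # first row of the table (fat = 0)
n1 = sum(y1)                         # multinomial index for this row
p_hat = [c / n1 for c in y1]         # sample proportions
uniform = [1 / 9] * 9                # hypothetical alternative: all ratings equally likely

ll_hat = multinomial_logpmf(y1, p_hat)
ll_uniform = multinomial_logpmf(y1, uniform)
```

The sample proportions give the larger log-probability, as expected, since they maximize the multinomial likelihood when no model is imposed on \(\pi_1\).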

Describing polytomous responses by a sequence of binary models

In some cases, it makes sense to "factor" the response into a sequence of binary choices and model them with a sequence of ordinary logistic models. The number of binary logistic regressions needed is equal to the number of categories of the response minus 1, i.e., \(r-1\).

For example, consider a medical study to investigate the long-term effects of radiation exposure on mortality. The response variable has four levels (\(Y=1\) if alive, \(Y=2\) if death from a cause other than cancer, \(Y=3\) if death from a cancer other than leukemia, and \(Y=4\) if death from leukemia). The main predictor of interest is the level of exposure (low, medium, high). The four-level response can be modeled via a single multinomial model, or as a sequence of binary choices in three stages:

The stage 1 model, which is fit for all subjects, describes the log-odds of death.
The stage 2 model, which is fit only to the subjects that die, describes the log-odds of death due to cancer versus death from other causes.
The stage 3 model, which is fit only to the subjects who die of cancer, describes the log-odds of death due to leukemia versus death due to other cancers.
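
The three stages can be sketched as a recoding of the four-level response into three binary responses, each defined on a shrinking subset of subjects. The response values below are hypothetical and only follow the coding given in the text.

```python
# Hypothetical responses, coded as in the text:
# 1 = alive, 2 = death (non-cancer), 3 = death (cancer other than leukemia), 4 = death (leukemia)
y = [1, 1, 2, 4, 3, 1, 2, 3, 4, 1]

# Stage 1: all subjects; event = death (Y > 1).
stage1 = [int(v > 1) for v in y]

# Stage 2: only subjects who died; event = death due to cancer (Y = 3 or 4).
stage2 = [int(v >= 3) for v in y if v > 1]

# Stage 3: only subjects who died of cancer; event = leukemia (Y = 4).
stage3 = [int(v == 4) for v in y if v >= 3]
```

Each stage's binary response, together with the exposure level of the subjects retained at that stage, would then be passed to an ordinary logistic regression.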

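Because the multinomial distribution can be factored into a sequence of conditional binomials, we can fit these three logistic models separately. The overall likelihood function factors into three independent likelihoods, so as long as the three models share no parameters, maximizing each one separately maximizes the overall likelihood.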
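
The factorization can be checked numerically. The sketch below compares a four-category multinomial probability with the product of the three conditional binomial probabilities that correspond to the stages; the counts and probabilities are hypothetical.

```python
from math import comb, factorial, isclose

def binom_pmf(k, n, p):
    """Binomial probability of k events in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def multinom_pmf(y, pi):
    """Multinomial probability of counts y under probabilities pi."""
    coef = factorial(sum(y))
    for c in y:
        coef //= factorial(c)
    prob = float(coef)
    for c, p in zip(y, pi):
        prob *= p**c
    return prob

# Hypothetical counts and probabilities for
# (alive, non-cancer death, cancer other than leukemia, leukemia).
y = [40, 30, 20, 10]
pi = [0.5, 0.25, 0.15, 0.10]

n = sum(y)
deaths = y[1] + y[2] + y[3]
cancer = y[2] + y[3]

p_death = pi[1] + pi[2] + pi[3]                    # stage 1: death vs. alive
p_cancer_given_death = (pi[2] + pi[3]) / p_death   # stage 2: cancer vs. other causes
p_leuk_given_cancer = pi[3] / (pi[2] + pi[3])      # stage 3: leukemia vs. other cancers

product = (
    binom_pmf(deaths, n, p_death)
    * binom_pmf(cancer, deaths, p_cancer_given_death)
    * binom_pmf(y[3], cancer, p_leuk_given_cancer)
)
joint = multinom_pmf(y, pi)
```

The two quantities agree up to floating-point error, which is the identity that justifies fitting the three stages separately.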

This approach is attractive when the response can be naturally arranged as a sequence of binary choices. But in situations where arranging such a sequence is unnatural, we should probably fit a single multinomial model to the entire response. The ice cream example above would not be a good example for the binary sequence approach since the taste ratings do not have such a hierarchy.
