T.3.2 - Polytomous Regression

Note! The material in this section is more technical than that in the preceding Lessons. It is offered as an introduction to more advanced topics and, given its technical nature, may be considered optional in the context of this course.

In binary logistic regression, we only had two possible outcomes. For polytomous logistic regression, we will consider the possibility of having k > 2 possible outcomes. (The word "polychotomous" is sometimes used instead, but it is not actually a word!)

Nominal Logistic Regression

The multiple nominal logistic regression model (sometimes called the multinomial logistic regression model) is given by the following:

\(\begin{equation}\label{nommod}
\pi_{j}=\left\{
\begin{array}{ll}
\dfrac{\exp(\textbf{X}\beta_{j})}{1+\sum_{l=2}^{k}\exp(\textbf{X}\beta_{l})} & \hbox{j=2,...,k} \\
\\
\dfrac{1}{1+\sum_{l=2}^{k}\exp(\textbf{X}\beta_{l})}
& \hbox{j=1} \end{array} \right.
\end{equation}\)

where again \(\pi_{j}\) denotes a probability and not the irrational number. Notice that each of the k - 1 groups \(j=2,\ldots,k\) has its own set of \(\beta\) values. Furthermore, since \(\sum_{j=1}^{k}\pi_{j}=1\), we set the \(\beta\) values for group 1 to 0 (group 1 is what we call the reference group). Notice that when k = 2, we are back to binary logistic regression.
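To make the reference-group parameterization concrete, here is a minimal Python sketch (with hypothetical coefficient values, not from any fitted model) that computes all k probabilities with \(\beta_{1}\) fixed at 0:

```python
import numpy as np

# Hypothetical example: k = 3 groups, design row x = (1, x1) with an intercept.
# Group 1 is the reference group, so beta_1 is fixed at 0.
beta = np.array([[0.0, 0.0],    # beta_1 (reference group, fixed at 0)
                 [0.5, -1.2],   # beta_2 (assumed values)
                 [1.1, 0.3]])   # beta_3 (assumed values)
x = np.array([1.0, 2.0])        # one observation's design row

eta = beta @ x                        # linear predictors; eta[0] = 0 by construction
pi = np.exp(eta) / np.exp(eta).sum()  # denominator is 1 + sum_{j=2}^k exp(X beta_j)

print(pi, pi.sum())             # probabilities for groups 1..k; they sum to 1
```

Because \(\exp(0)=1\) for the reference group, the normalizing denominator is exactly the \(1+\sum_{l=2}^{k}\exp(\textbf{X}\beta_{l})\) appearing in the model above.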

\(\pi_{j}\) is the probability that an observation falls in the \(j^{\textrm{th}}\) of the k categories. The likelihood for the nominal logistic regression model is given by:

\(\begin{align*}
L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1-\pi_{i,j})^{1-y_{i,j}},
\end{align*}\)

where the subscript \((i,j)\) means the \(i^{\textrm{th}}\) observation belongs to the \(j^{\textrm{th}}\) group. This yields the log-likelihood:

\(\begin{equation*}
\ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}\left[y_{i,j}\log(\pi_{i,j})+(1-y_{i,j})\log(1-\pi_{i,j})\right].
\end{equation*}\)
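As a quick numerical illustration of this log-likelihood, the sketch below (hypothetical probabilities and one-hot responses, purely for illustration) evaluates \(\ell(\beta)\) directly:

```python
import numpy as np

# Hypothetical fitted probabilities pi[i, j] and one-hot responses y[i, j]
pi = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3],
               [0.2, 0.3, 0.5]])
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Evaluate the log-likelihood term by term, exactly as in the equation above
loglik = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
print(loglik)
```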

Maximizing the likelihood (or log-likelihood) has no closed-form solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, \(\hat{\beta}\).
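As one possible illustration, the sketch below simulates nominal data and fits the model with statsmodels' MNLogit routine, which maximizes this likelihood numerically (the data and coefficient values are made up for the example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate a predictor and a 3-category nominal response (illustrative only)
n = 200
x = rng.normal(size=n)
X = sm.add_constant(x)                      # n x 2 design matrix with intercept
true_beta = np.array([[0.5, -1.0],          # group 2 vs. group 1 (assumed values)
                      [1.0, 0.8]])          # group 3 vs. group 1 (assumed values)
eta = np.column_stack([np.zeros(n), X @ true_beta.T])
pi = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=row) for row in pi])

# Fit; category 0 is the reference group, so the estimates are beta_2 and beta_3
fit = sm.MNLogit(y, X).fit(disp=False)
print(fit.params)                           # one column of coefficients per non-reference group
```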

An odds ratio (\(\theta\)) of 1 serves as the baseline for comparison. If \(\theta=1\), then there is no association between the response and predictor. If \(\theta>1\), then the odds of success are higher for the indicated level of the factor (or for higher levels of a continuous predictor). If \(\theta<1\), then the odds of success are lower for the indicated level of the factor (or for higher levels of a continuous predictor). Values farther from 1 represent stronger degrees of association. For nominal logistic regression, the odds ratio compares the odds of category j (relative to the reference category) at two different levels of the predictors, say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\):

\(\begin{equation*}
\theta=\dfrac{(\pi_{j}/\pi_{1})|_{\textbf{X}=\textbf{X}_{(1)}}}{(\pi_{j}/\pi_{1})|_{\textbf{X}=\textbf{X}_{(2)}}}.
\end{equation*}\)
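Since \(\pi_{j}/\pi_{1}=\exp(\textbf{X}\beta_{j})\) under the model above, this odds ratio simplifies to \(\exp((\textbf{X}_{(1)}-\textbf{X}_{(2)})\beta_{j})\). A minimal sketch with a hypothetical slope:

```python
import numpy as np

beta_j = -1.2                 # hypothetical slope for one predictor in group j
x1, x2 = 3.0, 2.0             # two predictor levels one unit apart

theta = np.exp(beta_j * (x1 - x2))   # odds ratio of group j vs. the reference group
print(theta)                         # here theta < 1: odds of group j fall as x rises
```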

Many of the procedures discussed in binary logistic regression can be extended to nominal logistic regression with the appropriate modifications.

Ordinal Logistic Regression

For ordinal logistic regression, we again consider k possible outcomes as in nominal logistic regression, except that the order matters. The multiple ordinal logistic regression model is the following:

\(\begin{equation}\label{ordmod}
\sum_{j=1}^{k^{*}}\pi_{j}=\dfrac{\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}{1+\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}
\end{equation}\)

such that \(k^{*}\leq k\), the intercepts satisfy \(\beta_{0,1}\leq\beta_{0,2}\leq\ldots\leq\beta_{0,k-1}\) (so that the cumulative probabilities are nondecreasing in \(k^{*}\)), and again \(\pi_{j}\) denotes a probability. Notice that this model is a cumulative sum of probabilities in which only the intercept of the linear regression portion changes (so \(\beta\) is now (p - 1)-dimensional and \(\textbf{X}\) is \(n\times(p-1)\); the first column of \(\textbf{X}\) is no longer a column of 1's because the intercepts \(\beta_{0,k^{*}}\) are written separately). Also, it still holds that \(\sum_{j=1}^{k}\pi_{j}=1\).
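To see how the category-specific intercepts combine with the shared slope, the sketch below (hypothetical intercepts and slope) computes the cumulative probabilities and recovers each \(\pi_{j}\) by differencing:

```python
import numpy as np

# Hypothetical cut-point intercepts beta_{0,k*} (increasing) and shared slope beta
intercepts = np.array([-1.0, 0.5, 2.0])   # k* = 1, 2, 3 for k = 4 categories
beta = 0.8
x = 1.5                                   # single predictor (X has no column of 1's)

# Cumulative probabilities P(Y <= k*) from the logistic form; P(Y <= k) = 1
eta = intercepts + beta * x
cum = np.append(np.exp(eta) / (1 + np.exp(eta)), 1.0)

# Individual category probabilities are differences of consecutive cumulative sums
pi = np.diff(cum, prepend=0.0)
print(cum, pi, pi.sum())                  # pi sums to 1
```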

\(\pi_{j}\) is still the probability that an observation falls in the \(j^{\textrm{th}}\) of the k categories, but the probabilities are now constrained through the cumulative form of the model above. The likelihood for the ordinal logistic regression model is given by:

\(\begin{align*}
L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1-\pi_{i,j})^{1-y_{i,j}},
\end{align*}\)

where the subscript (i, j) means the \(i^{\textrm{th}}\) observation belongs to the \(j^{\textrm{th}}\) group. This yields the log-likelihood:

\(\begin{equation*}
\ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}\left[y_{i,j}\log(\pi_{i,j})+(1-y_{i,j})\log(1-\pi_{i,j})\right].
\end{equation*}\)

Notice that this is identical to the nominal logistic regression likelihood. Thus, maximization again has no closed-form solution, so we defer to a procedure like iteratively reweighted least squares.
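For illustration, a proportional odds model can be fit in Python with statsmodels' OrderedModel (simulated data; note that statsmodels parameterizes its threshold/cut-point terms internally, so the reported values need not line up sign-for-sign with the \(\beta_{0,k^{*}}\) notation above):

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)

# Simulate an ordinal response with 3 ordered categories from a latent logistic variable
n = 300
x = rng.normal(size=n)
latent = 0.8 * x + rng.logistic(size=n)
y = np.digitize(latent, bins=[-0.5, 1.0])   # codes 0, 1, 2 in increasing order

# distr='logit' gives the cumulative (proportional odds) logistic model;
# OrderedModel estimates the cut points itself, so X must exclude the intercept column
fit = OrderedModel(y, x[:, None], distr='logit').fit(method='bfgs', disp=False)
print(fit.params)                           # slope followed by (transformed) threshold parameters
```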

For ordinal logistic regression, a proportional odds model is used to determine the odds ratio. Again, an odds ratio (\(\theta\)) of 1 serves as the baseline for comparison between two predictor levels, say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\). Only one parameter and one odds ratio are calculated for each predictor. Suppose we are interested in the odds ratio comparing \(\textbf{X}_{(1)}\) to \(\textbf{X}_{(2)}\). If \(\theta=1\), then there is no association between the response and the predictor at these two levels. If \(\theta>1\), then the odds of success are higher at \(\textbf{X}_{(1)}\). If \(\theta<1\), then the odds of success are lower at \(\textbf{X}_{(1)}\). Values farther from 1 represent stronger degrees of association. For ordinal logistic regression, the odds ratio utilizes cumulative probabilities and their complements and is given by:

\(\begin{equation*}
\theta=\dfrac{\sum_{j=1}^{k^{*}}\pi_{j}|_{\textbf{X}=\textbf{X}_{(1)}}/(1-\sum_{j=1}^{k^{*}}\pi_{j})|_{\textbf{X}=\textbf{X}_{(1)}}}{\sum_{j=1}^{k^{*}}\pi_{j}|_{\textbf{X}=\textbf{X}_{(2)}}/(1-\sum_{j=1}^{k^{*}}\pi_{j})|_{\textbf{X}=\textbf{X}_{(2)}}}.
\end{equation*}\)
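Because the cumulative log-odds is \(\beta_{0,k^{*}}+\textbf{X}\beta\), this ratio reduces to \(\exp((\textbf{X}_{(1)}-\textbf{X}_{(2)})\beta)\), the same at every cut point \(k^{*}\); this is the proportional odds property. A quick numerical check with hypothetical values:

```python
import numpy as np

beta, x1, x2 = 0.8, 2.0, 1.0              # hypothetical slope and two predictor levels

def cum_odds(intercept, x):
    """Cumulative odds P(Y <= k*)/(1 - P(Y <= k*)) = exp(beta_{0,k*} + beta * x)."""
    return np.exp(intercept + beta * x)

# The ratio is exp(beta * (x1 - x2)) at every cut point: the proportional odds property
for intercept in (-1.0, 0.5, 2.0):
    print(cum_odds(intercept, x1) / cum_odds(intercept, x2), np.exp(beta * (x1 - x2)))
```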