6.2 - Single Categorical Predictor

Overview of Binary Logistic Regression

Binary logistic regression models the probability that a characteristic is present (i.e., "success"), given the values of explanatory variables \(x_1,\ldots,x_k\). We denote this by \(\pi(x_1,\ldots,x_k) = P(\mbox{success}|x_1,\ldots,x_k)\) or simply by \(\pi\) for convenience, with the understanding that \(\pi\) will in general depend on one or more explanatory variables. For example, a physician may be interested in estimating the proportion of diabetic persons in a population. Naturally, she knows that not all segments of the population have the same probability of "success", i.e., being diabetic. Certain conditions, such as advanced age and hypertension, would likely contribute to a higher proportion.

Assumptions for logistic regression:

  • The response variable \(Y\) is a binomial random variable with a single trial and success probability \(\pi\). Thus, \(Y=1\) corresponds to "success" and occurs with probability \(\pi\), and \(Y=0\) corresponds to "failure" and occurs with probability \(1-\pi\).
  • The predictor or explanatory variables \(x = (x_1, x_2, \ldots, x_k)\) are fixed (not random) and can be discrete, continuous, or a combination of both. As with classical regression, two or more of these may be indicator variables to model the nominal categories of a single predictor, and others may represent interactions between two or more explanatory variables.
  • The data for the \(i\)th individual are collected in the vector \((x_{1i},\ldots,x_{ki},Y_i)\), for \(i=1,\ldots,n\). These vectors are assumed independent by the sampling mechanism. Independence also allows us to combine or group the data, which we do below, by summing over trials for which \(\pi\) is constant. In this section of the notes, we focus on a single explanatory variable \(x\).

The model is expressed as

\(\log\left(\dfrac{\pi_i}{1-\pi_i}\right)= \beta_0+\beta_1 x_i\)

Or, by solving for \(\pi_i\), we have the equivalent expression

\(\pi_i=\dfrac{\exp(\beta_0+\beta_1 x_i)}{1+\exp(\beta_0+\beta_1 x_i)}\)
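The two expressions above are inverses of each other, which a short Python sketch can illustrate. The function names `logit` and `inv_logit` and the coefficient values are assumptions chosen only for illustration; they do not come from any fitted model.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability corresponding to a linear predictor eta = beta0 + beta1 * x."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients, used only to show the round trip.
beta0, beta1 = -1.5, 0.8

for x in (0, 1):
    eta = beta0 + beta1 * x   # log-odds scale
    pi = inv_logit(eta)       # probability scale
    # Applying logit to pi should recover eta (up to rounding error).
    print(x, round(pi, 4), round(logit(pi), 4))
```

Whatever values the coefficients take, the resulting \(\pi_i\) always lies strictly between 0 and 1, which is the reason for modeling the log-odds rather than the probability itself.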

To estimate the parameters, we substitute this expression for \(\pi_i\) into the joint probability mass function for \(Y_1,\ldots,Y_n\)

\(\prod_{i=1}^n \pi_i^{y_i}(1-\pi_i)^{1-y_i}\)

to give us the likelihood function \(L(\beta_0,\beta_1)\) of the regression parameters. By maximizing this likelihood over all possible \(\beta_0\) and \(\beta_1\), we have the maximum likelihood estimates (MLEs): \(\hat{\beta}_0\) and \(\hat{\beta}_1\). Extending this to include additional explanatory variables is straightforward.
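As a sketch of what this maximization involves, assuming independent single-trial responses as stated in the assumptions, the log-likelihood can be written directly in terms of the regression parameters. The symbol \(\ell\) is introduced here only for illustration.

\(\ell(\beta_0,\beta_1)=\log L(\beta_0,\beta_1)=\sum_{i=1}^n\left[y_i(\beta_0+\beta_1 x_i)-\log\left(1+\exp(\beta_0+\beta_1 x_i)\right)\right]\)

Differentiating with respect to each parameter gives

\(\dfrac{\partial \ell}{\partial \beta_0}=\sum_{i=1}^n (y_i-\pi_i), \qquad \dfrac{\partial \ell}{\partial \beta_1}=\sum_{i=1}^n x_i(y_i-\pi_i)\)

Setting both derivatives to zero yields the equations that the MLEs satisfy. In general these have no closed-form solution, so they are typically solved iteratively, for example by Newton-Raphson.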

Working with Grouped Data

It's worth noting that when the same value of the predictor \(x\) occurs more than once, the success probability is constant for all \(Y\) responses associated with that \(x\) value, and we can work with the sum of such \(Y\)s as a binomial with the number of trials equal to the number of times that \(x\) value occurred. Our example with parent and student smoking illustrates this. Recall the data for the two-way table:

                        Student smokes   Student does not smoke
1–2 parents smoke                  816                     3203
Neither parent smokes              188                     1168

We have two approaches to work with. The first is for the individual or ungrouped data: \(x_i=1\) if at least one of the \(i\)th student's parents smokes (0 otherwise), \(Y_i=1\) if the \(i\)th student smokes (0 otherwise), and \(Y_i\sim\mbox{Binomial}(1,\pi_i)\), for \(i=1,\ldots,5375\). The second is for the grouped data: \(x_i=1\) if the \(i\)th table row corresponds to parents smoking (0 otherwise), \(Y_i=\) number of students who smoke for the \(i\)th row, and \(Y_i\sim\mbox{Binomial}(n_i,\pi_i)\), for \(i=1,2\) and \((n_1,n_2)=(4019,1356)\).
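To make the two representations concrete, the following Python sketch expands the table counts into one record per student and then collapses them back into binomial counts. The variable names (`table`, `ungrouped`, `grouped`) are assumptions made for illustration.

```python
from collections import defaultdict

# Counts from the two-way table: (x, smokers, non-smokers), with x = 1
# when at least one parent smokes and x = 0 when neither does.
table = [(1, 816, 3203), (0, 188, 1168)]

# Ungrouped form: one (x_i, y_i) pair per student.
ungrouped = []
for x, smokers, nonsmokers in table:
    ungrouped += [(x, 1)] * smokers + [(x, 0)] * nonsmokers

# Grouped form: sum the single-trial responses that share the same x.
totals = defaultdict(lambda: [0, 0])  # x -> [n_i, y_i]
for x, y in ungrouped:
    totals[x][0] += 1   # number of trials n_i
    totals[x][1] += y   # number of successes y_i
grouped = {x: tuple(counts) for x, counts in totals.items()}

print(len(ungrouped))  # 5375
print(grouped)         # {1: (4019, 816), 0: (1356, 188)}
```

Both forms carry the same information about \(\pi\); the grouped form simply stores it in two rows instead of 5375.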

With either approach, \(\pi\) is the probability that a student smokes, and both approaches give the same estimate of it. Even though the index has a much greater range in the ungrouped approach, there are still only two unique values for \(\pi\): that for the group whose parents smoke and that for the group whose parents don't. The likelihood function is simplified for the grouped data, however, and becomes

\(\prod_{i=1}^N \dfrac{n_i!}{y_i!(n_i-y_i)!}\pi_i^{y_i}(1-\pi_i)^{n_i-y_i}\)

The range for \(i\) in the grouped approach is denoted by \(N\) to distinguish it from the individual total \(n\) (so \(N=2\) in this example). Whenever possible, we will generally prefer to work with the grouped data because, in addition to having a more compact likelihood function, the larger binomial counts can be approximated by a normal distribution and will lend themselves to more meaningful residual diagnostics and large-sample significance tests.
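As a rough illustration of fitting the grouped data, the sketch below maximizes the binomial likelihood with Newton-Raphson in plain Python. This is one common way to compute the MLEs, not necessarily what any particular software package does, and the function name `fit_grouped_logistic` is hypothetical.

```python
import math

# Grouped data from the table: (x_i, n_i, y_i).
groups = [(1, 4019, 816), (0, 1356, 188)]

def fit_grouped_logistic(groups, iterations=25):
    """Newton-Raphson for a logistic model with one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, y in groups:
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * pi            # observed minus expected successes
            w = n * pi * (1.0 - pi)       # binomial variance of the count
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0_hat, b1_hat = fit_grouped_logistic(groups)
print(round(b0_hat, 4), round(b1_hat, 4))
```

Because this model has as many parameters as groups, the fitted probabilities should reproduce the observed proportions \(816/4019\) and \(188/1356\), which gives a simple check on the iteration.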

Grouping is not always possible, however. Unless two responses have the same predictor value (or combination of predictor values if two or more predictors are present), their success probabilities would not be the same, and they cannot be added together to give a binomial variable. This is a common issue when lots of predictors and/or interactions are included in the model or with continuous predictor types because their values tend to be unique.

Interpretation of Parameter Estimates

One source of complication when interpreting parameters in the logistic regression model is that they're on the logit or log-odds scale. We need to be careful to convert them back before interpreting them in terms of the original variables.

  • \(\exp(\beta_0) =\) the odds that the success characteristic is present for an individual for which \(x = 0\), i.e., at the baseline. If multiple predictors are involved, all would need to be set to 0 for this interpretation.
  • \(\exp(\beta_1) =\) the multiplicative change in the odds of success for every 1-unit increase in \(x\). This is similar to simple linear regression, but instead of an additive change in the mean response, it is a multiplicative change in the odds. If multiple predictors are involved, others would need to be held fixed for this interpretation.
  • If \(\beta_1 > 0\), then \(\exp(\beta_1) > 1\), indicating a positive relationship between \(x\) and the probability and odds of the success event. If \(\beta_1 < 0\), then \(\exp(\beta_1) < 1\) and the opposite holds.
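These interpretations can be checked on the parent and student smoking table. The sketch below relies on the fact that, with a single 0/1 predictor, the MLEs reduce to sample log-odds, so no iterative fitting is needed. The variable names are assumptions made for illustration.

```python
import math

# Counts from the parent/student smoking table.
smoke_1, nosmoke_1 = 816, 3203   # at least one parent smokes (x = 1)
smoke_0, nosmoke_0 = 188, 1168   # neither parent smokes (x = 0)

# With a single 0/1 predictor, the MLEs reduce to sample log-odds.
beta0_hat = math.log(smoke_0 / nosmoke_0)
beta1_hat = math.log(smoke_1 / nosmoke_1) - beta0_hat

baseline_odds = math.exp(beta0_hat)   # odds of smoking when x = 0
odds_ratio = math.exp(beta1_hat)      # multiplicative change in odds, x = 1 vs. x = 0

# Convert odds back to probabilities for each group.
pi_0 = baseline_odds / (1 + baseline_odds)
pi_1 = baseline_odds * odds_ratio / (1 + baseline_odds * odds_ratio)

print(f"baseline odds = {baseline_odds:.3f}")
print(f"odds ratio    = {odds_ratio:.3f}")
print(f"pi_0 = {pi_0:.3f}, pi_1 = {pi_1:.3f}")
```

Here \(\hat{\beta}_1 > 0\), so the estimated odds ratio exceeds 1 (about 1.58): the odds that a student smokes are estimated to be roughly 58% higher when at least one parent smokes than when neither does.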