6: Binary Logistic Regression

Overview

Thus far, our focus has been on describing interactions or associations among two or three categorical variables, mostly through single summary statistics and significance tests. From this lesson on, we focus on modeling. Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including combinations of categorical and continuous variables. In the next two lessons, we study binomial logistic regression, a special case of a generalized linear model.

Logistic regression is applicable, for example, if we want to...

  • model the probabilities of a response variable as a function of some explanatory variables, e.g., "success" of admission as a function of sex.
  • perform descriptive discriminant analyses, such as describing the differences between individuals in separate groups as a function of explanatory variables, e.g., students admitted versus rejected as a function of sex.
  • classify individuals into two categories based on explanatory variables, e.g., classify new students into "admitted" or "rejected" groups depending on sex.
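
As a rough illustration of the first and third uses, the sketch below takes a made-up 2×2 admissions table, estimates the probability of admission for each sex, and classifies a new applicant by comparing that probability with a cutoff. The counts, the function names, and the 0.5 cutoff are all assumptions made for this example, not course data. With a single binary predictor, the probabilities fitted by a logistic regression coincide with these sample proportions, so no model-fitting library is needed here.

```python
# Hypothetical counts (admitted, rejected) by sex; for illustration only.
counts = {
    "male": (60, 40),
    "female": (45, 55),
}

def admission_probability(sex):
    """Estimated P(admitted | sex), computed as the sample proportion."""
    admitted, rejected = counts[sex]
    return admitted / (admitted + rejected)

def classify(sex, cutoff=0.5):
    """Assign the category suggested by the estimated probability.

    The cutoff of 0.5 is an assumed default, not a fixed rule.
    """
    return "admitted" if admission_probability(sex) >= cutoff else "rejected"

for sex in counts:
    print(sex, admission_probability(sex), classify(sex))
```

With more predictors, or with continuous ones, the sample proportions are no longer enough and the probabilities have to come from a fitted model.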

As we'll see, there are two key differences between binomial (or binary) logistic regression and classical linear regression. First, the response is assumed to follow a binomial distribution rather than a normal distribution: each observation is either a "success" or a "failure". Second, instead of relating the response directly to a set of predictors, the logistic model relates the predictors to the log-odds of success, a transformation of the success probability called the logit. Among other benefits, working with the log-odds prevents probability estimates from falling outside the range (0, 1).
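
To make the logit concrete, here is a minimal sketch of the transformation and its inverse. The coefficient values `beta0` and `beta1` are hypothetical, chosen only to show that any value of the linear predictor maps back to a probability strictly between 0 and 1.

```python
import math

def logit(p):
    """Log-odds of a probability p, for p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Map a log-odds value eta back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients: intercept and effect of a 0/1 predictor x.
beta0, beta1 = -0.2, 0.6

for x in (0, 1):
    eta = beta0 + beta1 * x  # linear predictor on the log-odds scale
    print(x, eta, inv_logit(eta))
```

Because the model is linear on the log-odds scale, `beta1` is read as a difference in log-odds between the two groups, and the inverse transformation is applied only when probabilities are needed.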

We begin with two-way tables and then progress to three-way tables, where all explanatory variables are categorical. In the next lesson, we extend binary logistic regression to continuous predictors as well. In the last part, we focus on model diagnostics and model selection.

Upon completion of this lesson, you should be able to:

  Objective 6.1

Explain the assumptions of the logistic regression model and interpret the parameters involved.

  Objective 6.2

Use a logistic regression model to explain joint and conditional relationships among three or more variables.

  Objective 6.3

Use software to fit a logistic regression model to sample data.

  Objective 6.4

Interpret interactions among multiple predictors in a logistic regression model.