6.4 - Summary Points for Logistic Regression


Logit models represent how a binary (or multinomial) response variable is related to a set of explanatory variables, which can be discrete and/or continuous. In this lesson we focused on binary logistic regression. Below is a brief summary and a link to log-linear and probit models.

Summary Points for Logistic Regression

  • Cases are independent
  • Does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the logit of the response and the explanatory variables
  • Independent variables can even be power terms or other nonlinear transformations of the original independent variables
  • The dependent variable does NOT need to be normally distributed, but it typically assumes a distribution from an exponential family (e.g., binomial, Poisson, multinomial, normal, ...); binary logistic regression assumes a binomial distribution of the response
  • The homogeneity of variance does NOT need to be satisfied
  • Errors need to be independent but NOT normally distributed
  • It uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to estimate the parameters, and thus relies on large-sample approximations
  • Goodness-of-fit measures rely on sufficiently large samples; a common heuristic is that no more than 20% of the expected cell counts are less than 5
  • When there are continuous predictors, G2 and X2 are not the best statistics for assessing the overall fit of the model, and some grouping of the data is usually needed. Most commonly, the Hosmer-Lemeshow statistic is used, along with influence values and plots.
  • As with any other model you can take into consideration sample size and power. For more details see Agresti (2007), Section 5.5, or Agresti (2013), Section 6.5.
  • "Exact" inference methods also exist for logistic regression. For more details see Agresti (2007), Section 5.4, or Agresti (2013), Section 6.7, and the documentation for SAS or other software of your choice.
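Since the points above emphasize that logistic regression is fit by maximum likelihood rather than ordinary least squares, here is a minimal sketch of that idea: simulate Bernoulli data from a logit model and recover the coefficients by directly minimizing the negative log-likelihood. The simulated data, true coefficients, and seed are illustrative assumptions, not from the lesson.

```python
import numpy as np
from scipy.optimize import minimize

# simulate data from a hypothetical logit model: logit(pi) = -0.5 + 1.2 x
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
pi = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, pi)                    # binomial (Bernoulli) response

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

def neg_log_lik(beta):
    eta = X @ beta                         # linear predictor (systematic component)
    # negative log-likelihood of independent Bernoulli trials with a logit link;
    # logaddexp(0, eta) = log(1 + exp(eta)) computed stably
    return -np.sum(y * eta - np.logaddexp(0, eta))

beta_hat = minimize(neg_log_lik, x0=np.zeros(2)).x
print(beta_hat)  # MLE; should land near the true values (-0.5, 1.2)
```

In practice one would use a dedicated routine (e.g., PROC LOGISTIC in SAS or `glm` in R), which adds the standard errors and goodness-of-fit output discussed above; the sketch only shows the MLE principle.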

For a more detailed discussion refer to Agresti (2007), Ch.3, Agresti (2013), Ch.4, (pages 115-118, 135-132), and/or McCullagh & Nelder (1989).

The Link between Logit and Loglinear Models

Logit and loglinear models are related in the sense that loglinear models are more general than logit models, and some logit models are equivalent to certain loglinear models (e.g., consider the admissions data example or the boy scouts example).

  • If you have a binary response variable in the loglinear model, you can construct the logits to help with the interpretation of the loglinear model
  • Some logit models with only categorical variables have equivalent loglinear models
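As a tiny numerical illustration of that equivalence, for a 2×2 table the within-row logits built from the (saturated) loglinear fitted counts differ by exactly the log odds ratio, which is the slope of the equivalent logit model with the row indicator as predictor. The table below is hypothetical, not from the lesson's examples.

```python
import numpy as np

# hypothetical 2x2 table: rows = group (0/1), columns = response (failure, success)
counts = np.array([[30.0, 10.0],
                   [20.0, 40.0]])

# logits of the binary response within each row; under the saturated loglinear
# model the fitted counts are the observed counts themselves
logit_row0 = np.log(counts[0, 1] / counts[0, 0])
logit_row1 = np.log(counts[1, 1] / counts[1, 0])

# difference in logits = log odds ratio = slope of the equivalent logit model
log_or = np.log(counts[1, 1] * counts[0, 0] / (counts[1, 0] * counts[0, 1]))
print(logit_row1 - logit_row0, log_or)  # identical by construction
```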

The Link between Logit and Probit Models

Both model how a binary response variable depends on a set of explanatory variables. They have the same:

  • Random component: Y is Binomial
  • Systematic component: linear function of explanatory variables

But they differ in the link function.

The logistic regression model

\(\text{logit}(\pi(x))=\text{log} \left(\dfrac{\pi(x)}{1-\pi(x)}\right)=\beta_0+\beta x\)

uses the logistic cumulative distribution function (cdf).

The probit model

\(\text{probit}(\pi(x))=\beta_0+\beta x\)

uses the inverse of the standard normal cumulative distribution function \(\Phi\):

\(\text{probit}(\pi)=\Phi^{-1}(\pi)\)

For example, probit(0.975) = 1.96, probit(0.950) = 1.64, and probit(0.5) = 0.

Fitted values from these two models are often very similar. Rarely does one of them fit substantially better (or worse) than the other, although more difference can be observed with sparse data.

Why does this work?

Think back to introductory statistics courses and approximating the binomial distribution with the normal.

For more on probit models, see Agresti (2007), Section 3.2.4, or Agresti (2013), Section 6.6.

Some additional references:

  • Collett, D. (1991). Analysis of Binary Data.
  • Fey, M. (2002). Measuring a binary response's range of influence in logistic regression. American Statistician, 56, 5-9.
  • Hosmer, D.W. & Lemeshow, S. (1989). Applied Logistic Regression.
  • Fienberg, S.E. The Analysis of Cross-Classified Categorical Data, 2nd ed. Cambridge, MA.
  • McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models, 2nd ed.
  • Pregibon, D. (1981). Logistic Regression Diagnostics. Annals of Statistics, 9, 705-724.
  • Rice, J.C. (1994). Logistic regression: An introduction. In B. Thompson, ed., Advances in Social Science Methodology, Vol. 3: 191-245. Greenwich, CT: JAI Press. Popular introduction.
  • SAS Institute (1995). Logistic Regression Examples Using the SAS System, Version 6.
  • Strauss, D. (1999). The many faces of logistic regression. American Statistician.