Beyond Logistic Regression: Generalized Linear Models (GLM)

We saw this material at the end of Lesson 6, but as a Latin proverb says, "Repetition is the mother of study" (Repetitio est mater studiorum). Let's look at the basic structure of GLMs again before studying a specific example, Poisson regression.

The logistic regression model is an example of a broad class of models known as generalized linear models (GLMs). GLMs also include linear regression, ANOVA, Poisson regression, and others.

There are three components to a GLM:

  • Random Component – refers to the probability distribution of the response variable (Y); e.g., the binomial distribution for Y in binary logistic regression.
  • Systematic Component – refers to the explanatory variables (X1, X2, ..., Xk) combined linearly in the predictor; e.g., β0 + β1x1 + β2x2 as we have seen in logistic regression.
  • Link Function, η or g(μ) – specifies the link between the random and systematic components. It says how the expected value of the response relates to the linear predictor of explanatory variables; e.g., η = logit(π) for logistic regression.

For a more detailed discussion, refer to Agresti (2007), Ch. 3; Agresti (2002), Ch. 4 (pages 115–118, 132–135); Agresti (1996), Ch. 4; and/or McCullagh & Nelder (1989).

Simple Linear Regression

Models how the mean (expected value) of a continuous response variable depends on a set of explanatory variables.

Yi = β0 + β1xi + εi

or

E(Yi) = β0 + β1xi

  • Random component: Y is the response variable and has a normal distribution; generally we assume εi ~ N(0, σ²).
  • Systematic component: X is the explanatory variable (it can be continuous or discrete) and the model is linear in the parameters: β0 + β1xi
  • Link function: Identity Link η = g(E(Yi)) = E(Yi)
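As a minimal sketch of this special case (the data below are made up for illustration), the identity link means the fitted means equal the linear predictor, and the ML estimates under normal errors are just the ordinary least-squares estimates:

```python
import numpy as np

# Hypothetical data: one continuous predictor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimates; under normal errors these are also the MLEs
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta

# Identity link: the fitted means are the linear predictor itself
fitted = X @ beta
```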

Binary Logistic Regression

Models how a binary response variable depends on a set of explanatory variables.

  • Random component: The distribution of Y is Binomial
  • Systematic component: Xs are explanatory variables (they can be continuous, discrete, or both) and are linear in the parameters: β0 + β1x1 + ... + βkxk
  • Link function: Logit, η = logit(π) = log(π / (1 − π)), so that log(π / (1 − π)) = β0 + β1x1 + ... + βkxk
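As a minimal sketch (hypothetical data; the function name is ours), this model can be fitted by Newton-Raphson, the iterative scheme that standard software uses for logistic regression:

```python
import numpy as np

def logistic_newton(X, y, iters=25):
    """Fit binary logistic regression by Newton-Raphson.

    X: (n, p) design matrix with an intercept column; y: 0/1 responses.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta                   # systematic component
        pi = 1.0 / (1.0 + np.exp(-eta))  # inverse logit link
        W = pi * (1.0 - pi)              # binomial variance weights
        # Newton step: beta += (X' W X)^{-1} X' (y - pi)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - pi))
    return beta

# Hypothetical data: success becomes more likely as x grows
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
b0, b1 = logistic_newton(X, y)   # b1 > 0: log-odds of success increase with x
```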

Loglinear Models

Models the expected cell counts as a function of the levels of categorical variables.

  • Random component: The distribution of counts is Poisson
  • Systematic component: Xs are discrete variables used in the cross-classification, and the model is linear in the parameters; e.g., for a two-way table, log(μij) = λ + λ_i^A + λ_j^B + λ_ij^AB
  • Link Function: Log η = log(μ)

They are related in the sense that loglinear models are more general than logit models, and some logit models are equivalent to certain loglinear models (e.g., consider the admissions data example or the Boy Scout example).

  • If you have a binary response variable in the loglinear model, you can construct the logits to help with the interpretation of the loglinear model.
  • Some logit models with only categorical variables have equivalent loglinear models.

On the next slide we will consider the Boy Scout data and the homogeneous association model (DS, BS, DB), and see once again how this ties in with the discussion in Section B of Lesson 5.

The loglinear model is also equivalent to the Poisson regression model when all explanatory variables are discrete. For more on Poisson regression models, see the next section of this lesson; Agresti (2007), Sec. 3.3; Agresti (2002), Sec. 4.3 (for counts), Sec. 9.2 (for rates), and Sec. 13.2 (for random effects); and Agresti (1996), Sec. 4.3.

Loglinear model:

log(μijk) = λ + λ_i^S + λ_j^B + λ_k^D + λ_ik^SD + λ_jk^BD + λ_ij^SB

If we focus on delinquent status

πij = Pr(Yes Delinquent | S = i, B = j)

and the logit model for a boy's delinquent status is

log(πij / (1 − πij)) = α + β_i^S + β_j^B

where α and the β terms are differences of the corresponding λ parameters involving D.

Compare this to model (4) in Section B of Lesson 5, where β1 and β2 correspond to β_i^S for the three levels of S, and β3 corresponds to β_j^B for the two levels of B.
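As a minimal illustration of the loglinear/Poisson-regression equivalence (using hypothetical counts and a simpler 2 × 2 independence model rather than the Boy Scout data), fitting a Poisson regression with dummy-coded factors reproduces the familiar row total × column total / grand total fitted counts:

```python
import numpy as np

def poisson_irls(X, y, iters=50):
    """Fit a Poisson loglinear model (log link) by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start on the scale of the data
    for _ in range(iters):
        mu = np.exp(X @ beta)           # log link: mu = exp(eta)
        # Newton step: beta += (X' diag(mu) X)^{-1} X' (y - mu)
        beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return beta

# Hypothetical 2x2 table of counts, cells ordered (A,B) = (0,0),(0,1),(1,0),(1,1)
y = np.array([20.0, 10.0, 40.0, 30.0])
# Independence model (A, B): intercept + dummy for A + dummy for B
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
beta = poisson_irls(X, y)
mu = np.exp(X @ beta)   # fitted counts: row total * column total / grand total
```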

GLM Table based on Agresti (2002), pg. 118

Model                  Random       Link               Systematic
---------------------  -----------  -----------------  -----------
Linear Regression      Normal       Identity           Continuous
ANOVA                  Normal       Identity           Categorical
ANCOVA                 Normal       Identity           Mixed
Logistic Regression    Binomial     Logit              Mixed
Loglinear              Poisson      Log                Categorical
Poisson Regression     Poisson      Log                Mixed
Multinomial Response   Multinomial  Generalized Logit  Mixed

Advantages of GLMs over Traditional Regression

  • We do not need to transform the response Y to have a normal distribution.
  • The choice of link is separate from the choice of random component, so we have more flexibility in modeling.
  • If the link produces additive effects, then we do not need constant variance.
  • The models are fitted via maximum likelihood estimation, so the estimators have optimal (asymptotic) properties.
  • All the inference tools and model checking we discussed for logistic regression and loglinear models apply to other GLMs too; e.g., Wald and likelihood-ratio tests, deviance, residuals, confidence intervals, overdispersion.
  • Often a single procedure in a software package, e.g., PROC GENMOD in SAS or glm() in R, fits all of these models, with options to vary the three components.
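This "one procedure, three components" idea can be sketched in a few lines. The IRLS loop below is a simplified illustration (the function name and data are ours, not a real library's API): it takes the inverse link, its derivative, and the variance function as arguments, much as glm() takes a family= option. Plugging in the identity link with constant variance reproduces ordinary least squares:

```python
import numpy as np

def glm_irls(X, y, inv_link, dmu_deta, variance, iters=50):
    """Generic IRLS: swapping the link and variance functions changes the GLM.

    inv_link: eta -> mu;  dmu_deta: eta -> dmu/deta;  variance: mu -> Var(Y) (up to scale)
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        mu = inv_link(eta)
        g = dmu_deta(eta)
        W = g ** 2 / variance(mu)        # IRLS weights
        z = eta + (y - mu) / g           # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Identity link + constant variance reproduces ordinary least squares
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])
beta_ols = glm_irls(X, y,
                    inv_link=lambda e: e,
                    dmu_deta=lambda e: np.ones_like(e),
                    variance=lambda m: np.ones_like(m))
```

Swapping in the inverse logit with binomial variance, or exp with Poisson variance, gives logistic and Poisson regression from the same loop.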

But there are some limitations too:

  • The systematic component must be a linear function of the parameters (a linear predictor)
  • Responses must be independent

There are ways around these restrictions; e.g. consider our analysis of matched data, or use NLMIXED in SAS, or consider other models and alternative software packages.

Some additional references:

  • Collett, D. (1991). Analysis of Binary Data.
  • Fey, M. (2002). Measuring a binary response's range of influence in logistic regression. American Statistician, 56, 5–9.
  • Hosmer, D.W. & Lemeshow, S. (1989). Applied Logistic Regression.
  • Fienberg, S.E. The Analysis of Cross-Classified Categorical Data, 2nd ed. Cambridge, MA.
  • McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models, 2nd ed.
  • Pregibon, D. (1981). Logistic regression diagnostics. Annals of Statistics, 9, 705–724.
  • Rice, J.C. (1994). Logistic regression: An introduction. In B. Thompson, ed., Advances in Social Science Methodology, Vol. 3, 191–245. Greenwich, CT: JAI Press. A popular introduction.
  • SAS Institute (1995). Logistic Regression Examples Using the SAS System, Version 6.
  • Strauss, D. (1999). The many faces of logistic regression. American Statistician.

Next we will see more on Poisson regression...