# T.3.3 - Generalized Linear Models


All of the regression models we have considered (including multiple linear, logistic, and Poisson) belong to a family of models called generalized linear models. (In fact, an even broader framework, called general regression models, includes any parametric regression model.) Generalized linear models generalize ordinary least squares regression by relating the random term (the response Y) to the systematic term (the linear predictor $$\textbf{X}\beta$$) via a link function (denoted by $$g(\cdot)$$). Specifically, we have the relation

$$\begin{equation*} \mbox{E}(Y)=\mu=g^{-1}(\textbf{X}\beta), \end{equation*}$$

so $$g(\mu)=\textbf{X}\beta$$. Some common link functions are:

$$\begin{equation*} g(\mu)=\mu=\textbf{X}\beta, \end{equation*}$$

which is the identity link, used in traditional linear regression.

\begin{align*} &g(\mu)=\log\biggl(\frac{\mu}{1-\mu}\biggr)=\textbf{X}\beta\\ &\Rightarrow\mu=\frac{e^{\textbf{X}\beta}}{1+e^{\textbf{X}\beta}}, \end{align*}

which is the logit link, used in logistic regression.

\begin{align*} &g(\mu)=\log(\mu)=\textbf{X}\beta\\ &\Rightarrow\mu=e^{\textbf{X}\beta}, \end{align*}

which is the log link, used in Poisson regression.

\begin{align*} &g(\mu)=\Phi^{-1}(\mu)=\textbf{X}\beta\\ &\Rightarrow\mu=\Phi(\textbf{X}\beta), \end{align*}

where $$\Phi(\cdot)$$ is the cumulative distribution function of the standard normal distribution. This is the probit link, sometimes also called the normit link. It can be used for binary responses as an alternative to the logit link of logistic regression.

\begin{align*} &g(\mu)=\log(-\log(1-\mu))=\textbf{X}\beta\\ &\Rightarrow\mu=1-\exp\{-e^{\textbf{X}\beta}\}, \end{align*}

which is the complementary log-log link, sometimes also called the gompit link. It can also be used for binary responses as an alternative to the logit link.

\begin{align*} &g(\mu)=\mu^{\lambda}=\textbf{X}\beta\\ &\Rightarrow\mu=(\textbf{X}\beta)^{1/\lambda}, \end{align*}

where $$\lambda\neq 0$$. This is the power link, used in other regressions that we do not explore (such as gamma regression and inverse Gaussian regression).
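To make the link functions above concrete, here is a minimal Python sketch that pairs each $$g(\cdot)$$ with its inverse, using only the standard library. The dictionary keys and the helper names `power_link` and `mean_from_linear_predictor` are labels chosen for this illustration, not part of any particular package.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: supplies Phi (cdf) and Phi^{-1} (inv_cdf)

# Each entry maps an illustrative link name to the pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),  # equals e^eta / (1 + e^eta)
    "log": (math.log, math.exp),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}


def power_link(lam):
    """Return (g, g_inverse) for the power link g(mu) = mu**lam, with lam != 0.

    The inverse assumes a positive linear predictor value.
    """
    if lam == 0:
        raise ValueError("lam must be nonzero")
    return (lambda mu: mu ** lam, lambda eta: eta ** (1.0 / lam))


def mean_from_linear_predictor(link_name, eta):
    """Map one linear predictor value eta = x'beta to the mean mu = g^{-1}(eta)."""
    _, g_inverse = LINKS[link_name]
    return g_inverse(eta)
```

For example, `mean_from_linear_predictor("logit", 0.0)` gives a mean of 0.5, and applying a link followed by its inverse returns the original mean up to rounding error.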

Also, the variance is typically a function of the mean and is often written as

$$\begin{equation*} \mbox{Var}(Y)=V(\mu)=V(g^{-1}(\textbf{X}\beta)). \end{equation*}$$
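As a brief illustration (standard results, stated here without derivation), the three models considered so far have the following variances, where a binary response is treated as binomial with a single trial:

\begin{align*} &\mbox{Var}(Y)=\sigma^{2} &&\mbox{(normal: constant in } \mu\mbox{)}\\ &\mbox{Var}(Y)=\mu(1-\mu) &&\mbox{(binary response)}\\ &\mbox{Var}(Y)=\mu &&\mbox{(Poisson)} \end{align*}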

The random variable Y is assumed to belong to an exponential family distribution where the density can be expressed in the form

$$\begin{equation*} q(y;\theta,\phi)=\exp\biggl\{\dfrac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\biggr\}, \end{equation*}$$

where $$a(\cdot)$$, $$b(\cdot)$$, and $$c(\cdot)$$ are specified functions, $$\theta$$ is a parameter related to the mean of the distribution, and $$\phi$$ is called the dispersion parameter. Many probability distributions belong to the exponential family. For example, the normal distribution is used for traditional linear regression, the binomial distribution is used for logistic regression, and the Poisson distribution is used for Poisson regression. Other exponential family distributions lead to gamma regression, inverse Gaussian (inverse normal) regression, and negative binomial regression, to name a few.
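As a quick check of the exponential family form, the sketch below writes the Poisson distribution with mean $$\mu$$ in that form and compares it with the usual probability mass function. It assumes the standard choices $$\theta=\log(\mu)$$, $$b(\theta)=e^{\theta}$$, $$a(\phi)=1$$, and $$c(y,\phi)=-\log(y!)$$. The function names are hypothetical and chosen only for this illustration.

```python
import math


def exp_family_density(y, theta, phi, a, b, c):
    """Evaluate q(y; theta, phi) = exp{(y*theta - b(theta)) / a(phi) + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / a(phi) + c(y, phi))


def poisson_pmf_via_exp_family(y, mu):
    """Poisson(mu) probability of y, written in exponential family form.

    Assumed pieces: theta = log(mu), b(theta) = exp(theta),
    a(phi) = 1, and c(y, phi) = -log(y!).
    """
    theta = math.log(mu)
    return exp_family_density(
        y, theta, 1.0,
        a=lambda phi: 1.0,
        b=math.exp,
        c=lambda y_, phi: -math.lgamma(y_ + 1),  # lgamma(y + 1) = log(y!)
    )


def poisson_pmf_direct(y, mu):
    """Usual form mu**y * exp(-mu) / y!, for comparison."""
    return mu ** y * math.exp(-mu) / math.factorial(y)
```

The two functions agree up to floating-point rounding for any nonnegative integer `y` and positive `mu`, which is one way to see that the Poisson distribution belongs to the family.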

The unknown parameters, $$\beta$$, are typically estimated with maximum likelihood techniques (in particular, using iteratively reweighted least squares), Bayesian methods, or quasi-likelihood methods. The quasi-likelihood is a function that possesses similar properties to the log-likelihood function and is most often used with count or binary data. Specifically, for a realization y of the random variable Y, it is defined as

$$\begin{equation*} Q(\mu;y)=\int_{y}^{\mu}\dfrac{y-t}{\sigma^{2}V(t)}dt, \end{equation*}$$

where $$\sigma^{2}$$ is a scale parameter. Likelihood ratio tests are also available for model development, for example to determine whether any predictors may be dropped from the model.
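The estimation and testing ideas above can be sketched for one simple case: Poisson regression with a log link and a single predictor. The code below is an illustrative sketch, not a general-purpose fitter. It runs iteratively reweighted least squares by solving the 2x2 weighted normal equations directly, then evaluates the quasi-likelihood with $$V(t)=t$$ and $$\sigma^{2}=1$$, for which the integral has the closed form $$y\log(\mu/y)-(\mu-y)$$. In this Poisson case, twice the difference in summed quasi-likelihoods between two nested fits equals the likelihood ratio statistic. The data values and all function names are invented for the example.

```python
import math


def fit_poisson_irls(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link with V(mu) = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on (1, x) via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def poisson_quasi_likelihood(mu, y):
    """Q(mu; y) for V(t) = t and sigma^2 = 1: y*log(mu/y) - (mu - y)."""
    log_term = y * math.log(mu / y) if y > 0 else 0.0  # the term vanishes when y = 0
    return log_term - (mu - y)


# Invented illustrative data, not taken from the text.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 5, 7, 12]

b0, b1 = fit_poisson_irls(x, y)
mu_full = [math.exp(b0 + b1 * xi) for xi in x]
mu_null = [sum(y) / len(y)] * len(y)  # fitted means of the intercept-only model

q_full = sum(poisson_quasi_likelihood(m, yi) for m, yi in zip(mu_full, y))
q_null = sum(poisson_quasi_likelihood(m, yi) for m, yi in zip(mu_null, y))
lr_statistic = 2.0 * (q_full - q_null)  # compare with a chi-square on 1 df
print(f"b0={b0:.4f}, b1={b1:.4f}, LR statistic={lr_statistic:.4f}")
```

A large value of `lr_statistic` relative to a chi-square distribution with one degree of freedom would suggest that the predictor should not be dropped. In practice one would rely on established statistical software rather than hand-coded normal equations.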
