In Poisson regression, the response variable \(Y\) is an occurrence count recorded for a particular measurement window. Usually, this window is a length of time, but it can also be a distance, area, etc. For example, \(Y\) could count the number of flaws in a manufactured tabletop of a certain area. If the observations correspond to different measurement windows, a scale adjustment is needed to put them on equal terms: instead of the raw count, we model the rate, that is, the count per unit of window size \(t\).
For the random component, we assume that the response \(Y\) has a Poisson distribution. That is, \(Y_i\sim \text{Poisson}(\mu_i)\), for \(i=1, \ldots, N\), where the expected count of \(Y_i\) is \(E(Y_i)=\mu_i\). The link function is usually the (natural) log, but sometimes the identity function may be used. The systematic component is a linear combination of explanatory variables, \(\alpha+\beta_1x_1+\cdots+\beta_kx_k\); this is identical to that for logistic regression. Thus, in the case of a single explanatory variable, the model with the log link is written
\(\log(\mu)=\alpha+\beta x\)
This is equivalent to
\(\mu=\exp(\alpha+\beta x)=\exp(\alpha)\exp(\beta x)\).
Interpretations of these parameters are similar to those for logistic regression: \(\exp(\alpha)\) is the mean of \(Y\) when \(x = 0\), and \(\exp(\beta)\) is the multiplicative effect on the mean of \(Y\) of each 1-unit increase in \(x\).
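- If \(\beta = 0\), then \(\exp(\beta) = 1\), the expected count \( \mu = E(Y) = \exp(\alpha)\) does not depend on \(x\), and \(Y\) and \(x\) are not related.
- If \(\beta > 0\), then \(\exp(\beta) > 1\), and the expected count \( \mu = E(Y)\) increases: each 1-unit increase in \(x\) multiplies it by \(\exp(\beta)\), so at \(x = 1\) it is \(\exp(\beta)\) times its value at \(x = 0\).
- If \(\beta < 0\), then \(0 < \exp(\beta) < 1\), and the expected count \( \mu = E(Y)\) decreases: each 1-unit increase in \(x\) multiplies it by \(\exp(\beta)\), so at \(x = 1\) it is only the fraction \(\exp(\beta)\) of its value at \(x = 0\).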
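To make the multiplicative interpretation concrete, here is a minimal Python sketch. Python stands in for SAS or R here, and the parameter values \(\alpha = 0.5\) and \(\beta = 0.2\) are hypothetical, chosen only for illustration. The sketch evaluates \(\mu = \exp(\alpha + \beta x)\) and checks two things. First, each 1-unit step in \(x\) multiplies the mean by \(\exp(\beta)\), wherever the step starts. Second, with \(\beta = 0\) the mean stays at \(\exp(\alpha)\).

```python
import math

def poisson_mean(alpha: float, beta: float, x: float) -> float:
    """Expected count mu = exp(alpha + beta * x) under the log link."""
    return math.exp(alpha + beta * x)

# Hypothetical parameter values, for illustration only.
alpha, beta = 0.5, 0.2

mu0 = poisson_mean(alpha, beta, 0.0)  # equals exp(alpha)
mu1 = poisson_mean(alpha, beta, 1.0)

# A 1-unit increase in x multiplies the mean by exp(beta),
# whether it goes from 0 to 1 or from 4 to 5.
ratio_0_to_1 = mu1 / mu0
ratio_4_to_5 = poisson_mean(alpha, beta, 5.0) / poisson_mean(alpha, beta, 4.0)

# With beta = 0 the mean is exp(alpha) for every x: Y and x are unrelated.
flat = poisson_mean(alpha, 0.0, 10.0)

print(f"exp(alpha) = {math.exp(alpha):.4f}, mu(0) = {mu0:.4f}")
print(f"exp(beta) = {math.exp(beta):.4f}")
print(f"mu(1)/mu(0) = {ratio_0_to_1:.4f}, mu(5)/mu(4) = {ratio_4_to_5:.4f}")
```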
GLM Model for Rates
Compared with the model for count data above, we can alternatively model the expected rate of observations per unit of length, time, etc. to adjust for data collected over differently-sized measurement windows. For example, if \(Y\) is the count of flaws over a length of \(t\) units, then the expected value of the rate of flaws per unit is \(E(Y/t)=\mu/t\). For a single explanatory variable, the model would be written as
\(\log(\mu/t)=\log\mu-\log t=\alpha+\beta x\)
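Moving \(\log t\) to the right-hand side gives \(\log\mu=\alpha+\beta x+\log t\). The term \(\log t\) is referred to as an offset. It is a known adjustment term whose coefficient is fixed at 1 rather than estimated. A group of observations may share the same offset, or each individual may have a different value of \(t\). Because \(\log t\) is observed rather than estimated, it enters the linear predictor directly and changes the estimated counts: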
\(\mu=\exp(\alpha+\beta x+\log t)=t\exp(\alpha)\exp(\beta x)\)
This means that the mean count is proportional to \(t\).
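The effect of the offset can be checked numerically with a minimal Python sketch. The values of \(\alpha\), \(\beta\), \(x\) and the two window sizes \(t = 1\) and \(t = 4\) are hypothetical. The sketch shows that the expected count scales with \(t\) while the rate \(\mu/t\) stays the same.

```python
import math

def expected_count(alpha: float, beta: float, x: float, t: float) -> float:
    """Expected count with offset log(t): mu = exp(alpha + beta*x + log t)."""
    return math.exp(alpha + beta * x + math.log(t))

# Hypothetical values, for illustration only.
alpha, beta, x = -1.0, 0.3, 2.0

mu_1 = expected_count(alpha, beta, x, t=1.0)  # window of size 1
mu_4 = expected_count(alpha, beta, x, t=4.0)  # window four times as large

# The count is proportional to t, but the rate per unit is unchanged.
rate_1 = mu_1 / 1.0
rate_4 = mu_4 / 4.0

print(f"mu(t=1) = {mu_1:.4f}, mu(t=4) = {mu_4:.4f}")
print(f"rate(t=1) = {rate_1:.4f}, rate(t=4) = {rate_4:.4f}")
```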
Parameter Estimation and Inference
As with logistic regression, the maximum likelihood estimates (MLEs) of \(\alpha, \beta_1, \ldots, \beta_k\) are the values that maximize the log-likelihood. In general, there are no closed-form solutions, so the ML estimates are obtained with iterative algorithms such as Newton-Raphson (NR) or iteratively reweighted least squares (IRWLS). For the log link, which is the canonical link for the Poisson distribution, these two algorithms take the same steps.
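For a single predictor, the iterative fitting can be sketched in plain Python. This is an illustrative Newton-Raphson implementation built on our own assumptions:

- the 2×2 matrix algebra is done by hand;
- the data set is made up;
- convergence is declared when the step size is small;
- the function names `fit_poisson` and `_score_and_info` are hypothetical.

It is not a reproduction of what PROC GENMOD or `glm()` do internally. The optional `offset` argument takes \(\log t\) values for the rate model.

```python
import math

def _score_and_info(alpha, beta, x, y, off):
    """Score U = X^T (y - mu) and information I = X^T diag(mu) X
    for log(mu_i) = alpha + beta * x_i + off_i (design columns: 1, x)."""
    mu = [math.exp(alpha + beta * xi + oi) for xi, oi in zip(x, off)]
    u0 = sum(yi - mi for yi, mi in zip(y, mu))
    u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return (u0, u1), (i00, i01, i11)


def fit_poisson(x, y, offset=None, tol=1e-10, max_iter=100):
    """Newton-Raphson fit of log(mu_i) = alpha + beta * x_i + offset_i.

    Returns (alpha_hat, beta_hat, ase_alpha, ase_beta).
    """
    off = list(offset) if offset is not None else [0.0] * len(y)
    # Start at the intercept-only MLE: alpha = log(sum(y) / sum(t)), beta = 0.
    alpha = math.log(sum(y) / sum(math.exp(o) for o in off))
    beta = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = _score_and_info(alpha, beta, x, y, off)
        det = i00 * i11 - i01 * i01
        # Newton step: (alpha, beta) += I^{-1} U, using the 2x2 inverse.
        d_alpha = (i11 * u0 - i01 * u1) / det
        d_beta = (i00 * u1 - i01 * u0) / det
        alpha += d_alpha
        beta += d_beta
        if max(abs(d_alpha), abs(d_beta)) < tol:
            break
    # ASEs: square roots of the diagonal of I^{-1} at the estimates.
    _, (i00, i01, i11) = _score_and_info(alpha, beta, x, y, off)
    det = i00 * i11 - i01 * i01
    return alpha, beta, math.sqrt(i11 / det), math.sqrt(i00 / det)


# Small made-up data set (hypothetical, for illustration only).
x_data = [0, 1, 2, 3, 4, 5, 6, 7]
y_data = [1, 1, 2, 3, 5, 6, 10, 14]

a_hat, b_hat, se_a, se_b = fit_poisson(x_data, y_data)
print(f"alpha_hat = {a_hat:.4f} (ASE {se_a:.4f})")
print(f"beta_hat = {b_hat:.4f} (ASE {se_b:.4f}), exp(beta_hat) = {math.exp(b_hat):.4f}")
```

At convergence, the score equations \(\sum_i (y_i-\hat\mu_i)=0\) and \(\sum_i x_i(y_i-\hat\mu_i)=0\) hold, which gives a quick check on any fit. Passing `offset=[math.log(t) for t in windows]` fits the rate model instead; here `windows` is a hypothetical list of window sizes.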
The usual tools from the basic statistical inference of GLMs are valid:
- Confidence Intervals and Hypothesis tests for parameters
- Wald statistics and asymptotic standard error (ASE)
- Likelihood ratio tests
- Distribution of the estimated means (fitted counts)
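Once estimates, ASEs and log-likelihoods are available, the Wald and likelihood ratio tools listed above reduce to a few formulas. A minimal standard-library Python sketch follows. The function names are hypothetical. The normal and 1-df chi-square tail probabilities are computed through `math.erfc`, and the 95% critical value is hard-coded as an assumption.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def wald_test(est, ase):
    """Wald z = est / ASE for H0: parameter = 0, with two-sided normal p-value."""
    z = est / ase
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def wald_ci(est, ase, z_crit=1.959963984540054):
    """95% Wald interval for beta, and the same interval on the exp(beta) scale."""
    lo, hi = est - z_crit * ase, est + z_crit * ase
    return (lo, hi), (math.exp(lo), math.exp(hi))


def lr_test_1df(loglik_reduced, loglik_full):
    """G^2 = 2 * (l_full - l_reduced) for nested models differing by one
    parameter. Under H0, G^2 is approximately chi-square(1), and
    P(chi2_1 > g) = erfc(sqrt(g / 2))."""
    g2 = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return g2, math.erfc(math.sqrt(g2 / 2.0))
```

For an intercept-only reduced model, the MLE of every mean is \(\bar y\), so its log-likelihood is `poisson_loglik(y, [ybar] * len(y))`. The full model's log-likelihood uses its own fitted means.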
Next, we look at an example of the Poisson regression model for count data in SAS and R. In SAS we can use PROC GENMOD, a general procedure for fitting any GLM; many parts of its input and output are similar to what we saw with PROC LOGISTIC. In R we can still use glm(). Each response count is recorded over the same kind of measurement window (a single horseshoe crab), so no scale adjustment for modeling rates is necessary.