# Lesson 15: Logistic, Poisson & Nonlinear Regression

Lesson 15: Logistic, Poisson & Nonlinear Regression

## Overview

Multiple linear regression can be generalized to handle a response variable that is categorical or a count variable. This lesson covers the basics of such models, specifically logistic and Poisson regression, including model fitting and inference.

Multiple linear regression, logistic regression, and Poisson regression are examples of generalized linear models, which this lesson introduces briefly.

The lesson concludes with some examples of nonlinear regression, specifically exponential regression and population growth models.

## Objectives

Upon completion of this lesson, you should be able to:

• Apply logistic regression techniques to datasets with a binary response variable.
• Apply Poisson regression techniques to datasets with a count response variable.
• Understand the basics of fitting and inference for nonlinear regression methods when the regression function acting on the predictors is not linear in the parameters.

#### Lesson 15 Code Files

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson15.zip

• us_census.txt
• leukemia_remission.txt
• poisson_simulated.txt
• toxicity.txt

# 15.1 - Logistic Regression

15.1 - Logistic Regression

Logistic regression models a relationship between predictor variables and a categorical response variable. For example, we could use logistic regression to model the relationship between various measurements of a manufactured specimen (such as dimensions and chemical composition) to predict if a crack greater than 10 mils will occur (a binary variable: either yes or no). Logistic regression helps us estimate a probability of falling into a certain level of the categorical response given a set of predictors. We can choose from three types of logistic regression, depending on the nature of the categorical response variable:

Binary Logistic Regression:

Used when the response is binary (i.e., it has two possible outcomes). The cracking example given above would utilize binary logistic regression. Other examples of binary responses could include passing or failing a test, responding yes or no on a survey, and having high or low blood pressure.

Nominal Logistic Regression:

Used when there are three or more categories with no natural ordering to the levels. Examples of nominal responses could include departments at a business (e.g., marketing, sales, HR), type of search engine used (e.g., Google, Yahoo!, MSN), and color (black, red, blue, orange).

Ordinal Logistic Regression:

Used when there are three or more categories with a natural ordering to the levels, but the ranking of the levels do not necessarily mean the intervals between them are equal. Examples of ordinal responses could be how students rate the effectiveness of a college course on a scale of 1-5, levels of flavors for hot wings, and medical condition (e.g., good, stable, serious, critical).

Particular issues with modelling a categorical response variable include nonnormal error terms, nonconstant error variance, and constraints on the response function (i.e., the response is bounded between 0 and 1). We will investigate ways of dealing with these in the binary logistic regression setting here. There is some discussion of the nominal and ordinal logistic regression settings in Section 15.2.

The multiple binary logistic regression model is the following:

\begin{align}\label{logmod} \pi&=\dfrac{\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p-1}X_{p-1})}{1+\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p-1}X_{p-1})}\notag \\ & =\dfrac{\exp(\textbf{X}\beta)}{1+\exp(\textbf{X}\beta)}\\ & =\dfrac{1}{1+\exp(-\textbf{X}\beta)} \end{align}

where here $$\pi$$ denotes a probability and not the irrational number 3.14....

• $$\pi$$ is the probability that an observation is in a specified category of the binary Y variable, generally called the "success probability."
• Notice that the model describes the probability of an event happening as a function of X variables. For instance, it might provide estimates of the probability that an older person has heart disease.
• With the logistic model, estimates of $$\pi$$ from equations like the one above will always be between 0 and 1. The reasons are:
• The numerator $$\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p-1}X_{p-1})$$ must be positive, because it is a power of a positive value (e).
• The denominator of the model is (1 + numerator), so the answer will always be less than 1.
• With one X variable, the theoretical model for $$\pi$$ has an elongated "S" shape (or sigmoidal shape) with asymptotes at 0 and 1, although in sample estimates we may not see this "S" shape if the range of the X variable is limited.

For a sample of size n, the likelihood for a binary logistic regression is given by:

\begin{align*} L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\pi_{i}^{y_{i}}(1-\pi_{i})^{1-y_{i}}\\ & =\prod_{i=1}^{n}\biggl(\dfrac{\exp(\textbf{X}_{i}\beta)}{1+\exp(\textbf{X}_{i}\beta)}\biggr)^{y_{i}}\biggl(\dfrac{1}{1+\exp(\textbf{X}_{i}\beta)}\biggr)^{1-y_{i}}. \end{align*}

This yields the log likelihood:

\begin{align*} \ell(\beta)&=\sum_{i=1}^{n}(y_{i}\log(\pi_{i})+(1-y_{i})\log(1-\pi_{i}))\\ & =\sum_{i=1}^{n}(y_{i}\textbf{X}_{i}\beta-\log(1+\exp(\textbf{X}_{i}\beta))). \end{align*}

Maximizing the likelihood (or log likelihood) has no closed-form solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, $$\hat{\beta}$$.

To illustrate, consider data published on n = 27 leukemia patients. The Leukemia Remission data set has a response variable of whether leukemia remission occurred (REMISS), which is given by a 1.

The predictor variables are cellularity of the marrow clot section (CELL), smear differential percentage of blasts (SMEAR), percentage of absolute marrow leukemia cell infiltrate (INFIL), percentage labeling index of the bone marrow leukemia cells (LI), absolute number of blasts in the peripheral blood (BLAST), and the highest temperature prior to start of treatment (TEMP).

The following gives the estimated logistic regression equation and associated significance tests from Minitab:

• Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model.
• Select "REMISS" for the Response (the response event for remission is 1 for this data).
• Select all the predictors as Continuous predictors.
• Click Options and choose Deviance or Pearson residuals for diagnostic plots.
• Click Graphs and select "Residuals versus order."
• Click Results and change "Display of results" to "Expanded tables."
• Click Storage and select "Coefficients."

This results in the following output:

##### Coefficients
Term Coef SE Coef 95% CI Z-Value P-Value VIF
Constant 64.3 75.0 ( -82.7, 211.2) 0.86 0.391
CELL 30.8 52.1 ( -71.4, 133.0) 0.59 0.554 62.46
SMEAR 24.7 61.5 ( -95.9, 145.3) 0.40 0.688 434.42
INFILL -25.0 65.3 ( - 152.9, 103.0) -0.38 0.702 471.10
LI 4.36 2.66 ( -0.85, 9.57) 1.64 0.101 4.43
Blast -0.01 2.27 ( -4.45, 4.43) -0.01 0.996 4.18
TEMP -100.2 77.8 (-252.6, 52.2) -1.29 0.198 3.01

### Wald Test

The Wald test is the test of significance for individual regression coefficients in logistic regression (recall that we use t-tests in linear regression). For maximum likelihood estimates, the ratio

$$\begin{equation*} Z=\dfrac{\hat{\beta}_{i}}{\textrm{s.e.}(\hat{\beta}_{i})} \end{equation*}$$

can be used to test $$H_{0} \colon \beta_{i}=0$$. The standard normal curve is used to determine the $$p$$-value of the test. Furthermore, confidence intervals can be constructed as

$$\begin{equation*} \hat{\beta}_{i}\pm z_{1-\alpha/2}\textrm{s.e.}(\hat{\beta}_{i}). \end{equation*}$$

Estimates of the regression coefficients, $$\hat{\beta}$$, are given in the Minitab output Coefficients table in the column labeled "Coef." This table also gives coefficient p-values based on Wald tests. The index of the bone marrow leukemia cells (LI) has the smallest p-value and so appears to be closest to a significant predictor of remission occurring. After looking at various subsets of the data, we find that a good model is one which only includes the labeling index as a predictor:

##### Coefficients
Term Coef SE Coef 95% CI Z-Value P-Value VIF
Constant -3.78 1.38 ( -6.48, -1.08) -2.74 0.006
LI 2.90 1.19 ( -0.57, 5.22) 2.44 0.015 1.00
##### Regression Equation

p(1) = exp(Y')/(1 + exp(Y'))
Y' = -3.78 + 2.90 LI

Since we only have a single predictor in this model we can create a Binary Fitted Line Plot to visualize the sigmoidal shape of the fitted logistic regression curve:

### Odds, Log Odds, and Odds Ratio

There are algebraically equivalent ways to write the logistic regression model:

The first is

$$\label{logmod1} \dfrac{\pi}{1-\pi}=\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p-1}X_{p-1}),$$

which is an equation that describes the odds of being in the current category of interest. By definition, the odds for an event is $$\pi$$ / (1 - π) such that π is the probability of the event. For example, if you are at the racetrack and there is a 80% chance that a certain horse will win the race, then his odds are 0.80 / (1 - 0.80) = 4, or 4:1.

The second is

$$\label{logmod2} \log\biggl(\dfrac{\pi}{1-\pi}\biggr)=\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p-1}X_{p-1},$$

which states that the (natural) logarithm of the odds is a linear function of the X variables (and is often called the log odds). This is also referred to as the logit transformation of the probability of success, $$\pi$$.

The odds ratio (which we will write as $$\theta$$) between the odds for two sets of predictors (say $$\textbf{X}_{(1)}$$ and $$\textbf{X}_{(2)}$$) is given by

$$\begin{equation*} \theta=\dfrac{(\pi/(1-\pi))|_{\textbf{X}=\textbf{X}_{(1)}}}{(\pi/(1-\pi))|_{\textbf{X}=\textbf{X}_{(2)}}}. \end{equation*}$$

For binary logistic regression, the odds of success are:

$$\begin{equation*} \dfrac{\pi}{1-\pi}=\exp(\textbf{X}\beta). \end{equation*}$$

By plugging this into the formula for $$\theta$$ above and setting $$\textbf{X}_{(1)}$$ equal to $$\textbf{X}_{(2)}$$ except in one position (i.e., only one predictor differs by one unit), we can determine the relationship between that predictor and the response. The odds ratio can be any nonnegative number. An odds ratio of 1 serves as the baseline for comparison and indicates there is no association between the response and predictor. If the odds ratio is greater than 1, then the odds of success are higher for higher levels of a continuous predictor (or for the indicated level of a factor). In particular, the odds increase multiplicatively by $$\exp(\beta_{j})$$ for every one-unit increase in $$\textbf{X}_{j}$$. If the odds ratio is less than 1, then the odds of success are less for higher levels of a continuous predictor (or for the indicated level of a factor). Values farther from 1 represent stronger degrees of association.

For example, when there is just a single predictor, $$X$$, the odds of success are:

$$\begin{equation*} \dfrac{\pi}{1-\pi}=\exp(\beta_0+\beta_1X). \end{equation*}$$

If we increase $$X$$ by one unit, the odds ratio is

$$\begin{equation*} \theta=\dfrac{\exp(\beta_0+\beta_1(X+1))}{\exp(\beta_0+\beta_1X)}=\exp(\beta_1). \end{equation*}$$

To illustrate, the relevant Minitab output from the leukemia example is:

##### Odd Ratios for Continuous Predictors
Odds Ratio 95% CI
LI 18.1245 (1.7703, 185.5617)

The odds ratio for LI of 18.1245 is calculated as $$\exp(2.89726)$$ (you can view more decimal places for the coefficient estimates in Minitab by clicking "Storage" in the Regression Dialog and selecting "Coefficients"). The 95% confidence interval is calculated as $$\exp(2.89726\pm z_{0.975}*1.19)$$, where $$z_{0.975}=1.960$$ is the $$97.5^{\textrm{th}}$$ percentile from the standard normal distribution. The interpretation of the odds ratio is that for every increase of 1 unit in LI, the estimated odds of leukemia remission are multiplied by 18.1245. However, since the LI appears to fall between 0 and 2, it may make more sense to say that for every .1 unit increase in L1, the estimated odds of remission are multiplied by $$\exp(2.89726\times 0.1)=1.336$$. Then

• At LI=0.9, the estimated odds of leukemia remission is $$\exp\{-3.77714+2.89726*0.9\}=0.310$$.
• At LI=0.8, the estimated odds of leukemia remission is $$\exp\{-3.77714+2.89726*0.8\}=0.232$$.
• The resulting odds ratio is $$\dfrac{0.310}{0.232}=1.336$$, which is the ratio of the odds of remission when LI=0.9 compared to the odds when L1=0.8.

Notice that $$1.336\times 0.232=0.310$$, which demonstrates the multiplicative effect by $$\exp(0.1\hat{\beta_{1}})$$ on the odds.

### Likelihood Ratio (or Deviance) Test

The likelihood ratio test is used to test the null hypothesis that any subset of the $$\beta$$'s is equal to 0. The number of $$\beta$$'s in the full model is p, while the number of $$\beta$$'s in the reduced model is r. (Remember the reduced model is the model that results when the $$\beta$$'s in the null hypothesis are set to 0.) Thus, the number of $$\beta$$'s being tested in the null hypothesis is $$p-r$$. Then the likelihood ratio test statistic is given by:

$$\begin{equation*} \Lambda^{*}=-2(\ell(\hat{\beta}^{(0)})-\ell(\hat{\beta})), \end{equation*}$$

where $$\ell(\hat{\beta})$$ is the log likelihood of the fitted (full) model and $$\ell(\hat{\beta}^{(0)})$$ is the log likelihood of the (reduced) model specified by the null hypothesis evaluated at the maximum likelihood estimate of that reduced model. This test statistic has a $$\chi^{2}$$ distribution with $$p-r$$ degrees of freedom. Minitab presents results for this test in terms of "deviance," which is defined as $$-2$$ times log-likelihood. The notation used for the test statistic is typically $$G^2$$ = deviance (reduced) – deviance (full).

This test procedure is analogous to the general linear F test procedure for multiple linear regression. However, note that when testing a single coefficient, the Wald test and likelihood ratio test will not, in general, give identical results.

To illustrate, the relevant software output from the leukemia example is:

##### Deviance Table
Regression 1 8.299 8.299 8.30 0.004
LI 1 8.299 8.299 8.30 0.004
Error 25 26.073 1.043
Total 26 34.372

Since there is only a single predictor for this example, this table simply provides information on the likelihood ratio test for LI (p-value of 0.004), which is similar but not identical to the earlier Wald test result (p-value of 0.015). The Deviance Table includes the following:

• The null (reduced) model, in this case, has no predictors, so the fitted probabilities are simply the sample proportion of successes, $$9/27=0.333333$$. The log-likelihood for the null model is $$\ell(\hat{\beta}^{(0)})=-17.1859$$, so the deviance for the null model is $$-2\times-17.1859=34.372$$, which is shown in the "Total" row in the Deviance Table.
• The log-likelihood for the fitted (full) model is $$\ell(\hat{\beta})=-13.0365$$, so the deviance for the fitted model is $$-2\times-13.0365=26.073$$, which is shown in the "Error" row in the Deviance Table.
• The likelihood ratio test statistic is therefore $$\Lambda^{*}=-2(-17.1859-(-13.0365))=8.299$$, which is the same as $$G^2=34.372-26.073=8.299$$.
• The p-value comes from a $$\chi^{2}$$ distribution with $$2-1=1$$ degrees of freedom.

When using the likelihood ratio (or deviance) test for more than one regression coefficient, we can first fit the "full" model to find deviance (full), which is shown in the "Error" row in the resulting full model Deviance Table. Then fit the "reduced" model (corresponding to the model that results if the null hypothesis is true) to find deviance (reduced), which is shown in the "Error" row in the resulting reduced model Deviance Table. For example, the relevant Deviance Tables for the Disease Outbreak example on pages 581-582 of Applied Linear Regression Models (4th ed) by Kutner et al are:

##### Full model
Regression 9 28.322 3.14686 28.32 0.001
Error 88 93.996 1.06813
Total 97 122.318
##### Reduced Model
Regression 4 21.263 5.3159 21.26 0.000
Error 93 101.054 1.06813
Total 97 122.318

Here the full model includes four single-factor predictor terms and five two-factor interaction terms, while the reduced model excludes the interaction terms. The test statistic for testing the interaction terms is $$G^2 = 101.054-93.996 = 7.058$$, which is compared to a chi-square distribution with $$10-5=5$$ degrees of freedom to find the p-value > 0.05 (meaning the interaction terms are not significant).

Alternatively, select the corresponding predictor terms last in the full model and request Minitab to output Sequential (Type I) Deviances. Then add the corresponding Sequential Deviances in the resulting Deviance Table to calculate $$G^2$$. For example, the relevant Deviance Table for the Disease Outbreak example is:

Source DF Seq Dev Seq Mean Chi-Square P-Value
Regression 9 28.322 3.1469 28.32 0.001
--Age 1 7.405 7.4050 7.40 0.007
--Middle 1 1.804 1.8040 1.80 0.179
--Lower 1 1.606 1.6064 1.61 0.205
Sector 1 10.448 10.4481 10.45 0.001
Age*Middle 1 4.570 4.5397 4.57 0.033
Age*Lower 1 1.015 1.0152 1.02 0.314
Age*Sector 1 1.120 1.1202 1.12 0.290
Middle*Sector 1 0.000 0.0001 0.00 0.993
Lower*Sector 1 0.353 0.3531 0.35 0.3552
Error 88 93.996 1.0681
Total 97 122.318

The test statistic for testing the interaction terms is $$G^2 = 4.570+1.015+1.120+0.000+0.353 = 7.058$$, the same as in the first calculation.

### Goodness-of-Fit Tests

Overall performance of the fitted model can be measured by several different goodness-of-fit tests. Two tests that require replicated data (multiple observations with the same values for all the predictors) are the Pearson chi-square goodness-of-fit test and the deviance goodness-of-fit test(analagous to the multiple linear regression lack-of-fit F-test). Both of these tests have statistics that are approximately chi-square distributed with c - p degrees of freedom, where c is the number of distinct combinations of the predictor variables. When a test is rejected, there is a statistically significant lack of fit. Otherwise, there is no evidence of lack of fit.

By contrast, the Hosmer-Lemeshow goodness-of-fit test is useful for unreplicated datasets or for datasets that contain just a few replicated observations. For this test the observations are grouped based on their estimated probabilities. The resulting test statistic is approximately chi-square distributed with c - 2 degrees of freedom, where c is the number of groups (generally chosen to be between 5 and 10, depending on the sample size).

To illustrate, the relevant Minitab output from the leukemia example is:

##### Goodness-of-Fit Test
Test DF Chi-square P-Value
Deviance 25 26.07 0.404
Pearson 25 23.93 0.523
Hosmer-Lemeshow 7 6.87 0.442

Since there is no replicated data for this example, the deviance and Pearson goodness-of-fit tests are invalid, so the first two rows of this table should be ignored. However, the Hosmer-Lemeshow test does not require replicated data so we can interpret its high p-value as indicating no evidence of lack-of-fit.

#### $$R^{2}$$

The calculation of $$R^{2}$$ used in linear regression does not extend directly to logistic regression. One version of $$R^{2}$$ used in logistic regression is defined as

$$\begin{equation*} R^{2}=\dfrac{\ell(\hat{\beta_{0}})-\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})-\ell_{S}(\beta)}, \end{equation*}$$

where $$\ell(\hat{\beta_{0}})$$ is the log likelihood of the model when only the intercept is included and $$\ell_{S}(\beta)$$ is the log likelihood of the saturated model (i.e., where a model is fit perfectly to the data). This $$R^{2}$$ does go from 0 to 1 with 1 being a perfect fit. With unreplicated data, $$\ell_{S}(\beta)=0$$, so the formula simplifies to:

$$\begin{equation*} R^{2}=\dfrac{\ell(\hat{\beta_{0}})-\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}=1-\dfrac{\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}. \end{equation*}$$

To illustrate, the relevant Minitab output from the leukemia example is:

##### Model Summary
Deviance
R-Sq
Deviance
AIC
24.14% 21.23% 30.07

Recall from above that $$\ell(\hat{\beta})=-13.0365$$ and $$\ell(\hat{\beta}^{(0)})=-17.1859$$, so:

$$\begin{equation*} R^{2}=1-\dfrac{-13.0365}{-17.1859}=0.2414. \end{equation*}$$

Note that we can obtain the same result by simply using deviances instead of log-likelihoods since the $$-2$$ factor cancels out:

$$\begin{equation*} R^{2}=1-\dfrac{26.073}{34.372}=0.2414. \end{equation*}$$

#### Raw Residual

The raw residual is the difference between the actual response and the estimated probability from the model. The formula for the raw residual is

$$\begin{equation*} r_{i}=y_{i}-\hat{\pi}_{i}. \end{equation*}$$

#### Pearson Residual

The Pearson residual corrects for the unequal variance in the raw residuals by dividing by the standard deviation. The formula for the Pearson residuals is

$$\begin{equation*} p_{i}=\dfrac{r_{i}}{\sqrt{\hat{\pi}_{i}(1-\hat{\pi}_{i})}}. \end{equation*}$$

#### Deviance Residuals

Deviance residuals are also popular because the sum of squares of these residuals is the deviance statistic. The formula for the deviance residual is

$$\begin{equation*} d_{i}=\pm\sqrt{2\biggl(y_{i}\log\biggl(\dfrac{y_{i}}{\hat{\pi}_{i}}\biggr)+(1-y_{i})\log\biggl(\dfrac{1-y_{i}}{1-\hat{\pi}_{i}}\biggr)\biggr)}. \end{equation*}$$

Here are the plots of the Pearson residuals and deviance residuals for the leukemia example. There are no alarming patterns in these plots to suggest a major problem with the model.

#### Hat Values

The hat matrix serves a similar purpose as in the case of linear regression – to measure the influence of each observation on the overall fit of the model – but the interpretation is not as clear due to its more complicated form. The hat values (leverages) are given by

$$\begin{equation*} h_{i,i}=\hat{\pi}_{i}(1-\hat{\pi}_{i})\textbf{x}_{i}^{\textrm{T}}(\textbf{X}^{\textrm{T}}\textbf{W}\textbf{X})\textbf{x}_{i}, \end{equation*}$$

where W is an $$n\times n$$ diagonal matrix with the values of $$\hat{\pi}_{i}(1-\hat{\pi}_{i})$$ for $$i=1 ,\ldots,n$$ on the diagonal. As before, we should investigate any observations with $$h_{i,i}>3p/n$$ or, failing this, any observations with $$h_{i,i}>2p/n$$ and very isolated.

#### Studentized Residuals

We can also report Studentized versions of some of the earlier residuals. The Studentized Pearson residuals are given by

$$\begin{equation*} sp_{i}=\dfrac{p_{i}}{\sqrt{1-h_{i,i}}} \end{equation*}$$

and the Studentized deviance residuals are given by

$$\begin{equation*} sd_{i}=\dfrac{d_{i}}{\sqrt{1-h_{i, i}}}. \end{equation*}$$

#### C and $$\bar{\textrm{C}}$$

C and $$\bar{\textrm{C}}$$ are extensions of Cook's distance for logistic regression. C measures the overall change in fitted logits due to deleting the $$i^{\textrm{th}}$$ observation for all points including the one deleted, while $$\bar{\textrm{C}}$$ excludes the deleted point. They are defined by:

$$\begin{equation*} \textrm{C}_{i}=\dfrac{p_{i}^{2}h _{i,i}}{p(1-h_{i,i})^{2}} \end{equation*}$$

and

$$\begin{equation*} \bar{\textrm{C}}_{i}=\dfrac{p_{i}^{2}h_{i,i}}{p(1-h_{i,i})}. \end{equation*}$$

To illustrate, the relevant Minitab output from the leukemia example is:

##### Fits and Diagnostics for Unusual Observations
Obs Probability Fit SE Fit 95% CI Resid Std Resid Del Resid HI
8 0.000 0.849 0.139 (0.403, 0.979) -1.945 -2.11 -2.19 0.149840
##### Fits and Diagnostics for Unusual Observations
Obs Cook's D DFITS
8 0.58 -1.08011 R

R Large residual

The default residuals in this output (set under Minitab's Regression Options) are deviance residuals, so observation 8 has a deviance residual of $$-1.945$$, a studentized deviance residual of $$-2.11$$, a leverage (h) of $$0.149840$$, and a Cook's distance (C) of 0.58.

#### DFDEV and DFCHI

DFDEV and DFCHI are statistics that measure the change in deviance and in Pearson's chi-square, respectively, that occurs when an observation is deleted from the data set. Large values of these statistics indicate observations that have not been fitted well. The formulas for these statistics are

$$\begin{equation*} \textrm{DFDEV}_{i}=d_{i}^{2}+\bar{\textrm{C}}_{i} \end{equation*}$$

and

$$\begin{equation*} \textrm{DFCHI}_{i}=\dfrac{\bar{\textrm{C}}_{i}}{h_{i,i}} \end{equation*}$$

# 15.2 - Polytomous Regression

15.2 - Polytomous Regression
Note! that the material in this section is more technical than preceding Lessons. It is offered as an introduction to more advanced topics and, given the technical nature of the material, it could be considered optional in the context of this course.

In binary logistic regression, we only had two possible outcomes. For polytomous logistic regression, we will consider the possibility of having k > 2 possible outcomes. (Note: The word polychotomous is sometimes used, but note that this is not actually a word!)

### Nominal Logistic Regression

The multiple nominal logistic regression model (sometimes called the multinomial logistic regression model) is given by the following:

$$\label{nommod} \pi_{j}=\left\{ \begin{array}{ll} \dfrac{\exp(\textbf{X}\beta_{j})}{1+\sum_{j=2}^{k}\exp(\textbf{X}\beta_{j})} & \hbox{j=2,...,k} \\ \\ \dfrac{1}{1+\sum_{j=2}^{k}\exp(\textbf{X}\beta_{j})} & \hbox{j=1} \end{array} \right.$$

where again $$\pi_{j}$$ denotes a probability and not the irrational number. Notice that k - 1 of the groups have their own set of $$\beta$$ values. Furthermore, since $$\sum_{j=1}^{k}\pi_{j}=1$$, we set the $$\beta$$ values for group 1 to be 0 (this is what we call the reference group). Notice that when k = 2, we are back to binary logistic regression.

$$\pi_{j}$$ is the probability that an observation is in one of k categories. The likelihood for the nominal logistic regression model is given by:

\begin{align*} L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1-\pi_{i,j})^{1-y_{i,j}}, \end{align*}

where the subscript $$(i,j)$$ means the $$i^{\textrm{th}}$$ observation belongs to the $$j^{\textrm{th}}$$ group. This yields the log likelihood:

$$\begin{equation*} \ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}y_{i,j}\pi_{i,j}. \end{equation*}$$

Maximizing the likelihood (or log likelihood) has no closed-form solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, $$\hat{\beta}$$.

An odds ratio ($$\theta$$) of 1 serves as the baseline for comparison. If $$\theta=1$$, then there is no association between the response and predictor. If $$\theta>1$$, then the odds of success are higher for the indicated level of the factor (or for higher levels of a continuous predictor). If $$\theta<1$$, then the odds of success are less for the indicated level of the factor (or for higher levels of a continuous predictor). Values farther from 1 represent stronger degrees of association. For nominal logistic regression, the odds of success (at two different levels of the predictors, say $$\textbf{X}_{(1)}$$ and $$\textbf{X}_{(2)}$$) are:

$$\begin{equation*} \theta=\dfrac{(\pi_{j}/\pi_{1})|_{\textbf{X}=\textbf{X}_{(1)}}}{(\pi_{j}/\pi_{1})|_{\textbf{X}=\textbf{X}_{(2)}}}. \end{equation*}$$

Many of the procedures discussed in binary logistic regression can be extended to nominal logistic regression with the appropriate modifications.

### Ordinal Logistic Regression

For ordinal logistic regression, we again consider k possible outcomes as in nominal logistic regression, except that the order matters. The multiple ordinal logistic regression model is the following:

$$\label{ordmod} \sum_{j=1}^{k^{*}}\pi_{j}=\dfrac{\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}{1+\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}$$

such that $$k^{*}\leq k$$, $$\pi_{1}\leq\pi_{2},\leq \ldots,\leq\pi_{k}$$, and again $$\pi_{j}$$ denotes a probability. Notice that this model is a cumulative sum of probabilities which involves just changing the intercept of the linear regression portion (so $$\beta$$ is now (p - 1)-dimensional and X is $$n\times(p-1)$$ such that first column of this matrix is not a column of 1's). Also, it still holds that $$\sum_{j=1}^{k}\pi_{j}=1$$.

$$\pi_{j}$$ is still the probability that an observation is in one of k categories, but we are constrained by the model written in the equation above. The likelihood for the ordinal logistic regression model is given by:

\begin{align*} L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1-\pi_{i,j})^{1-y_{i,j}}, \end{align*}

where the subscript (i, j) means the $$i^{\textrm{th}}$$ observation belongs to the $$j^{\textrm{th}}$$ group. This yields the log likelihood:

$$\begin{equation*} \ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}y_{i,j}\pi_{i,j}. \end{equation*}$$

Notice that this is identical to the nominal logistic regression likelihood. Thus, maximization again has no closed-form solution, so we defer to a procedure like iteratively reweighted least squares.

For ordinal logistic regression, a proportional odds model is used to determine the odds ratio. Again, an odds ratio ($$\theta$$) of 1 serves as the baseline for comparison between the two predictor levels, say $$\textbf{X}_{(1)}$$ and $$\textbf{X}_{(2)}$$. Only one parameter and one odds ratio is calculated for each predictor. Suppose we are interested in calculating the odds of $$\textbf{X}_{(1)}$$ to $$\textbf{X}_{(2)}$$. If $$\theta=1$$, then there is no association between the response and these two predictors. If $$\theta>1$$, then the odds of success are higher for the predictor $$\textbf{X}_{(1)}$$. If $$\theta<1$$, then the odds of success are less for the predictor $$\textbf{X}_{(1)}$$. Values farther from 1 represent stronger degrees of association. For ordinal logistic regression, the odds ratio utilizes cumulative probabilities and their complements and is given by:

$$\begin{equation*} \theta=\dfrac{\sum_{j=1}^{k^{*}}\pi_{j}|_{\textbf{X}=\textbf{X}_{(1)}}/(1-\sum_{j=1}^{k^{*}}\pi_{j})|_{\textbf{X}=\textbf{X}_{(1)}}}{\sum_{j=1}^{k^{*}}\pi_{j}|_{\textbf{X}=\textbf{X}_{(2)}}/(1\sum_{j=1}^{k^{*}}\pi_{j})|_{\textbf{X}=\textbf{X}_{(2)}}}. \end{equation*}$$

# 15.3 - Further Logistic Regression Examples

15.3 - Further Logistic Regression Examples

## Example 15-1: STAT 200 Dataset

Students in STAT 200 at Penn State were asked if they have ever driven after drinking (dataset unfortunately no longer available). They also were asked, “How many days per month do you drink at least two beers?” In the following discussion, $$\pi$$ = the probability a student says “yes” they have driven after drinking. This is modeled using X = days per month of drinking two beers. Results from Minitab were as follows.

Variable Value Count
DrivDrnk Yes 122 (Event)
No 127
Total 249

          95% CI Coef -1.5514 0.2661 -5.83 0.000 0.19031 0.02946 6.46 0.000 1.21 1.14 1.28

Some things to note from the results are:

• We see that in the sample 122/249 students said they have driven after drinking. (Yikes!)
• Parameter estimates, given under Coef are $$\hat{\beta}_0$$ = −1.5514, and $$\hat{\beta}_1$$ = 0.19031.
• The model for estimating $$\pi$$ = the probability of ever having driven after drinking is

$$\hat{\pi}=\dfrac{\exp(-1.5514+0.19031X)}{1+\exp(-1.5514+0.19031X)}$$

• The variable X = DaysBeer is a statistically significant predictor (Z = 6.46, P = 0.000).

We can also obtain a plot of the estimated probability of ever having driven under the influence ($$\pi$$) versus days per month of drinking at least two beers.

The vertical axis shows the probability of ever having driven after drinking. For example, if X = 4 days per month of drinking beer, then the estimated probability is calculated as:

$$\hat{\pi}=\dfrac{\exp(-1.5514+0.19031(4))}{1+\exp(-1.5514+0.19031(4))}=\frac{\exp(-0.79016)}{1+\exp(-0.79016)}=0.312$$

A few of these estimated probabilities are given in the following table:

 DaysBeer $$\hat{\pi}$$ 4 12 20 28 0.312 0.675 0.905 0.97

In the results given above, we see that the estimate of the odds ratio is 1.21 for DaysBeer. This is given under Odds Ratio in the table of coefficients, standard errors and so on. The sample odds ratio was calculated as $$e^{0.19031}$$. The interpretation of the odds ratio is that for each increase of one day of drinking beer per month, the predicted odds of having ever driven after drinking are multiplied by 1.21.

Above we found that at X = 4, the predicted probability of ever driving after drinking is $$\hat{\pi}$$ = 0.312. Thus when X = 4, the predicted odds of ever driving after drinking is 0.312/(1 − 0.312) = 0.453. To find the odds when X = 5, one method would be to multiply the odds at X = 4 by the sample odds ratio. The calculation is 1.21 × 0.453 = 0.549. (Another method is to just do the calculation as we did for X = 4.)

Notice also, that the results give a 95% confidence interval estimate of the odd ratio (1.14 to 1.28).

We now include Gender (male or female) as an x-variable (along with DaysBeer). Some Minitab results are given below. Under Gender, the row for male is explaining that the program created an indicator variable with a value of 1 if the student is male and a value of 0 if the student is female.

          95% CI Coef -1.7736 0.2945 -6.02 0.000 0.18693 0.03004 6.22 0.000 1.21 1.14 1.28 0.6172 0.2954 2.09 0.037 1.85 1.04 3.31

Some things to note from the results are:

• The p-values are less than 0.05 for both DaysBeer and Gender. This is evidence that both x-variables are useful for predicting the probability of ever having driven after drinking.
• For DaysBeer, the odds ratio is still estimated to equal 1.21 to two decimal places (calculated as $$e^{0.18693}$$).
• For Gender, the odds ratio is 1.85 (calculated as $$e^{0.6172}$$). For males, the odds of ever having driven after drinking is 1.85 times the odds for females, assuming DaysBeer is held constant.

Finally, the results for testing with respect to the multiple logistic regression model are as follows:

Log-Likelihood = -139.981

Test that all slopes are zero: G = 65.125, DF = 2, P-Value = 0.000

Notice that since we have a p-value of 0.000 for this chi-square test, we therefore reject the null hypothesis that all of the slopes are equal to 0.

## Example 15-2: Toxicity Dataset

An experiment is done to test the effect of a toxic substance on insects. The data originate from the textbook, Applied Linear Statistical Models by Kutner, Nachtsheim, Neter, & Li.

At each of six dose levels, 250 insects are exposed to the substance and the number of insects that die is counted (Toxicity data). We can use Minitab to calculate the observed probabilities as the number of observed deaths out of 250 for each dose level.

A binary logistic regression model is used to describe the connection between the observed probabilities of death as a function of dose level. Since the data is in event/trial format the procedure in Minitab is a little different to before:

• Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model
• Select "Response is in event/trial format"
• Select "Deaths" for Number of events, "SampSize" for Number of trials (and type "Death" for Event name if you like)
• Select Dose as a Continuous predictor
• Click Results and change "Display of results" to "Expanded tables"
• Click Storage and select "Fits (event probabilities)"

The Minitab output is as follows:

##### Coefficients
Term Coef SE Coef 95% CI Z P VIF
Constant -2.644 0.156 (-2.950, -2.338) -16.94 0.000
Dose 0.6740 0.0391 (0.5973, 0.7506) 17.23 0.000 1.00
##### Odds Ratios for Continuous Predictors
Term Odds Ratio 95% CI
Dose 1.9621 (1.8173, 2.1184)

Thus

$$\hat{\pi}=\dfrac{\exp(-2.644+0.674X)}{1+\exp(-2.644+0.674X)}$$

where X = Dose and $$\hat{\pi}$$ is the estimated probability the insect dies (based on the model).

Predicted probabilities of death (based on the logistic model) for the six dose levels are given below (FITS). These probabilities closely agree with the observed values (Observed p) reported.

##### Data Display
Row Dose SampSize  Deaths Observed P FITS
1 1 250 28 0.112 0.122423
2 2 250 53 0.212 0.214891
3 3 250 93 0.372 0.349396
4 4 250 126 0.504 0.513071
5 5 250 172 0.688 0.673990
6 6 250 197 0.788 0.802229

As an example of calculating the estimated probabilities, for Dose 1, we have

$$\hat{\pi}=\dfrac{\exp(-2.644+0.674(1))}{1+\exp(-2.644+0.674(1))}=0.1224$$

The odds ratio for Dose is 1.9621, the value under Odds Ratio in the output. It was calculated as $$e^{0.674}$$. The interpretation of the odds ratio is that for every increase of 1 unit in dose level, the estimated odds of insect death are multiplied by 1.9621.

As an example of odds and odds ratio:

• At Dose = 1, the estimated odds of death is $$\hat{\pi}/(1− \hat{\pi})$$ = 0.1224/(1−0.1224) = 0.1395.
• At Dose = 2, the estimated odds of death is $$\hat{\pi}/(1− \hat{\pi})$$ = 0.2149/(1−0.2149) = 0.2737.
• The Odds Ratio = $$\dfrac{0.2737}{0.1395}$$, which is the ratio of the odds of death when Dose = 2 compared to the odds when Dose = 1.

A property of the binary logistic regression model is that the odds ratio is the same for any increase of one unit in X, regardless of the specific values of X.

# 15.4 - Poisson Regression

15.4 - Poisson Regression

The Poisson distribution for a random variable Y has the following probability mass function for a given value Y = y:

$$\begin{equation*} \mbox{P}(Y=y|\lambda)=\dfrac{e^{-\lambda}\lambda^{y}}{y!}, \end{equation*}$$

for $$y=0,1,2,\ldots$$. Notice that the Poisson distribution is characterized by the single parameter $$\lambda$$, which is the mean rate of occurrence for the event being measured. For the Poisson distribution, it is assumed that large counts (with respect to the value of $$\lambda$$) are rare.

In Poisson regression the dependent variable (Y) is an observed count that follows the Poisson distribution. The rate $$\lambda$$ is determined by a set of $$p-1$$ predictors $$\textbf{X}=(X_{1},\ldots,X_{p-1})$$. The expression relating these quantities is

$$\begin{equation*} \lambda=\exp\{\textbf{X}\beta\}. \end{equation*}$$

Thus, the fundamental Poisson regression model for observation i is given by

$$\begin{equation*} \mbox{P}(Y_{i}=y_{i}|\textbf{X}_{i},\beta)=\dfrac{e^{-\exp\{\textbf{X}_{i}\beta\}}\exp\{\textbf{X}_{i}\beta\}^{y_{i}}}{y_{i}!}. \end{equation*}$$

That is, for a given set of predictors, the categorical outcome follows a Poisson distribution with rate $\exp\{\textbf{X}\beta\}$. For a sample of size n, the likelihood for a Poisson regression is given by:

$$\begin{equation*} L(\beta;\textbf{y},\textbf{X})=\prod_{i=1}^{n}\dfrac{e^{-\exp\{\textbf{X}_{i}\beta\}}\exp\{\textbf{X}_{i}\beta\}^{y_{i}}}{y_{i}!}. \end{equation*}$$

This yields the log likelihood:

$$\begin{equation*} \ell(\beta)=\sum_{i=1}^{n}y_{i}\textbf{X}_{i}\beta-\sum_{i=1}^{n}\exp\{\textbf{X}_{i}\beta\}-\sum_{i=1}^{n}\log(y_{i}!). \end{equation*}$$

Maximizing the likelihood (or log likelihood) has no closed-form solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, $$\hat{\beta}$$. Once this value of $$\hat{\beta}$$ has been obtained, we may proceed to define various goodness-of-fit measures and calculated residuals. For the residuals we present, they serve the same purpose as in linear regression. When plotted versus the response, they will help identify suspect data points.

To illustrate consider this example (Poisson Simulated data), which consists of a simulated data set of size n = 30 such that the response (Y) follows a Poisson distribution with rate $\lambda=\exp\{0.50+0.07X\}$. A plot of the response versus the predictor is given below.

The following gives the analysis of the Poisson regression data in Minitab:

• Select Stat > Regression > Poisson Regression > Fit Poisson Model.
• Select "y" for the Response.
• Select "x" as a Continuous predictor.
• Click Results and change "Display of results" to "Expanded tables."

This results in the following output:

##### Coefficients
Term Coef SE Coef 95% CI Z-Value P-Value VIF
Constant 0.308 0.289 (-0.259, 0.875) 1.06 0.287
x 0.0764 0.0173 (0.0424, 0.1103) 4.41 0.000 1.00
##### Regression Equation
 y = exp(Y') Y' = 0.308 + 0.0764 x

As you can see, the Wald test p-value for x of 0.000 indicates that the predictor is highly significant.

### Deviance Test

Changes in the deviance can be used to test the null hypothesis that any subset of the $$\beta$$'s is equal to 0. The deviance, $$D(\hat{\beta})$$, is $$-2$$ times the difference between the log-likelihood evaluated at the maximum likelihood estimate and the log-likelihood for a "saturated model" (a theoretical model with a separate parameter for each observation and thus a perfect fit). Suppose we test that r < p of the $\beta$'s are equal to 0. Then the deviance test statistic is given by:

$$\begin{equation*} G^2=D(\hat{\beta}^{(0)})-D(\hat{\beta}), \end{equation*}$$

where $$D(\hat{\beta})$$ is the deviance of the fitted (full) model and $$D(\hat{\beta}^{(0)})$$ is the deviance of the model specified by the null hypothesis evaluated at the maximum likelihood estimate of that reduced model. This test statistic has a $$\chi^{2}$$ distribution with $$p-r$$ degrees of freedom. This test procedure is analagous to the general linear F test procedure for multiple linear regression.

To illustrate, the relevant Minitab output from the simulated example is:

##### Deviance Table
Regression 1 20.47 42.37% 20.47 20.4677 20.47 0.000
x 1 20.47 42.37% 20.47 20.4677 20.47 0.000
Error 28 27.84 57.63% 27.84 0.9944
Total 29 48.31 100.00%

Since there is only a single predictor for this example, this table simply provides information on the deviance test for x (p-value of 0.000), which matches the earlier Wald test result (p-value of 0.000). (Note that the Wald test and deviance test will not in general give identical results.) The Deviance Table includes the following:

• The null model in this case has no predictors, so the fitted values are simply the sample mean, $$4.233$$. The deviance for the null model is $$D(\hat{\beta}^{(0)})=48.31$$, which is shown in the "Total" row in the Deviance Table.
• The deviance for the fitted model is $$D(\hat{\beta})=27.84$$, which is shown in the "Error" row in the Deviance Table.
• The deviance test statistic is therefore $$G^2=48.31-27.84=20.47$$.
• The p-value comes from a $\chi^{2}$ distribution with $$2-1=1$$ degrees of freedom.

### Goodness-of-Fit

Overall performance of the fitted model can be measured by two different chi-square tests. There is the Pearson statistic

$$\begin{equation*} X^2=\sum_{i=1}^{n}\dfrac{(y_{i}-\exp\{\textbf{X}_{i}\hat{\beta}\})^{2}}{\exp\{\textbf{X}_{i}\hat{\beta}\}} \end{equation*}$$

and the deviance statistic

$$\begin{equation*} D=2\sum_{i=1}^{n}\biggl(y_{i}\log\biggl(\dfrac{y_{i}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}\biggr)-(y_{i}-\exp\{\textbf{X}_{i}\hat{\beta}\})\biggr). \end{equation*}$$

Both of these statistics are approximately chi-square distributed with n - p degrees of freedom. When a test is rejected, there is a statistically significant lack of fit. Otherwise, there is no evidence of lack-of-fit.

To illustrate, the relevant Minitab output from the simulated example is:

##### Goodness-of-Fit
Test DF Estimate Mean Chi-Square P-Value
Deviance 28 27.84209 0.099436 27.84 0.473
Pearson 28 26.09324 0.93190 26.09 0.568

The high p-values indicate no evidence of lack-of-fit.

Overdispersion means that the actual covariance matrix for the observed data exceeds that for the specified model for $$Y|\textbf{X}$$. For a Poisson distribution, the mean and the variance are equal. In practice, the data almost never reflects this fact and we have overdispersion in the Poisson regression model if (as is often the case) the variance is greater than the mean. In addition to testing goodness-of-fit, the Pearson statistic can also be used as a test of overdispersion. Note that overdispersion can also be measured in the logistic regression models that were discussed earlier.

#### Pseudo $$R^{2}$$

The value of $$R^{2}$$ used in linear regression also does not extend to Poisson regression. One commonly used measure is the pseudo $$R^{2}$$, defined as

$$\begin{equation*} R^{2}=\dfrac{\ell(\hat{\beta_{0}})-\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}=1-\dfrac{D(\hat{\beta})}{D(\hat{\beta_{0}})}, \end{equation*}$$

where $$\ell(\hat{\beta_{0}})$$ is the log likelihood of the model when only the intercept is included. The pseudo $$R^{2}$$ goes from 0 to 1 with 1 being a perfect fit.

To illustrate, the relevant Minitab output from the simulated example is:

##### Model Summary
Deviance
R-Sq
Deviance
AIC
42.37% 40.30% 124.50

Recall from above that $$D(\hat{\beta})=27.84$$ and $$D(\hat{\beta}^{(0)})=48.31$$, so:

$$\begin{equation*} R^{2}=1-\dfrac{27.84}{48.31}=0.4237. \end{equation*}$$

#### Raw Residual

The raw residual is the difference between the actual response and the estimated value from the model. Remember that the variance is equal to the mean for a Poisson random variable. Therefore, we expect that the variances of the residuals are unequal. This can lead to difficulties in the interpretation of the raw residuals, yet it is still used. The formula for the raw residual is

$$\begin{equation*} r_{i}=y_{i}-\exp\{\textbf{X}_{i}\beta\}. \end{equation*}$$

#### Pearson Residual

The Pearson residual corrects for the unequal variance in the raw residuals by dividing by the standard deviation. The formula for the Pearson residuals is

$$\begin{equation*} p_{i}=\dfrac{r_{i}}{\sqrt{\hat{\phi}\exp\{\textbf{X}_{i}\beta\}}}, \end{equation*}$$

where

$$\begin{equation*} \hat{\phi}=\dfrac{1}{n-p}\sum_{i=1}^{n}\frac{(y_{i}-\exp\{\textbf{X}_{i}\hat{\beta}\})^{2}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}. \end{equation*}$$

$$\hat{\phi}$$ is a dispersion parameter to help control overdispersion.

#### Deviance Residuals

Deviance residuals are also popular because the sum of squares of these residuals is the deviance statistic. The formula for the deviance residual is

$$\begin{equation*} d_{i}=\texttt{sgn}(y_{i}-\exp\{\textbf{X}_{i}\hat{\beta}\})\sqrt{2\biggl\{y_{i}\log\biggl(\dfrac{y_{i}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}\biggr)-(y_{i}-\exp\{\textbf{X}_{i}\hat{\beta}\})\biggr\}}. \end{equation*}$$

The plots below show the Pearson residuals and deviance residuals versus the fitted values for the simulated example.

These plots appear to be good for a Poisson fit. Further diagnostic plots can also be produced and model selection techniques can be employed when faced with multiple predictors.

#### Hat Values

The hat matrix serves the same purpose as in the case of linear regression - to measure the influence of each observation on the overall fit of the model. The hat values, $$h_{i,i}$$, are the diagonal entries of the Hat matrix

$$\begin{equation*} H=\textbf{W}^{1/2}\textbf{X}(\textbf{X}\textbf{W}\textbf{X})^{-1}\textbf{X}\textbf{W}^{1/2}, \end{equation*}$$

where W is an $$n\times n$$ diagonal matrix with the values of $\exp\{\textbf{X}_{i}\hat{\beta}\}$ on the diagonal. As before, a hat value (leverage) is large if $$h_{i,i}>2p/n$$.

#### Studentized Residuals

Finally, we can also report Studentized versions of some of the earlier residuals. The Studentized Pearson residuals are given by

$$\begin{equation*} sp_{i}=\dfrac{p_{i}}{\sqrt{1-h_{i,i}}} \end{equation*}$$

and the Studentized deviance residuals are given by

$$\begin{equation*} sd_{i}=\dfrac{d_{i}}{\sqrt{1-h_{i, i}}}. \end{equation*}$$

To illustrate, the relevant Minitab output from the simulated example is:

##### Fits and Diagnostics for Unusual Observations
Obs y Fit SE Fit 95% CI Resid Std Resid Del Resid HI Cook's D
8 10.000 4.983 0.452 (4.171, 5.952) 1.974 2.02 2.03 0.040969 0.11
21 6.000 8.503 1.408 (6.147, 11.763) -0.907 -1.04 -1.02 0.233132 0.15
##### Fits and Diagnostics for Unusual Observation
Obs DFITS
8 0.474408 R
21 -0.540485   X

R large residual
X Unusual X

The default residuals in this output (set under Minitab's Regression Options) are deviance residuals, so observation 8 has deviance residual of 1.974 and studentized deviance residual of 2.02, while observation 21 has leverage (h) of 0.233132.

# 15.5 - Generalized Linear Models

15.5 - Generalized Linear Models

All of the regression models we have considered (including multiple linear, logistic, and Poisson) actually belong to a family of models called generalized linear models. (In fact, a more "generalized" framework for regression models is called  general regression models, which includes any parametric regression model.)  Generalized linear models provides a generalization of ordinary least squares regression that relates the random term (the response Y) to the systematic term (the linear predictor $$\textbf{X}\beta$$) via a link function (denoted by $$g(\cdot)$$). Specifically, we have the relation

$$\begin{equation*} \mbox{E}(Y)=\mu=g^{-1}(\textbf{X}\beta), \end{equation*}$$

so $$g(\mu)=\textbf{X}\beta$$. Some common link functions are:

$$\begin{equation*} g(\mu)=\mu=\textbf{X}\beta, \end{equation*}$$

which is used in traditional linear regression.

\begin{align*} &g(\mu)=\log\biggl(\frac{\mu}{1-\mu}\biggr)=\textbf{X}\beta\\ &\Rightarrow\mu=\frac{e^{\textbf{X}\beta}}{1+e^{\textbf{X}\beta}}, \end{align*}

which is used in logistic regression.

\begin{align*} &g(\mu)=\log(\mu)=\textbf{X}\beta\\ &\Rightarrow\mu=e^{\textbf{X}\beta}, \end{align*}

which is used in Poisson regression.

\begin{align*} &g(\mu)=\Phi^{-1}(\mu)=\textbf{X}\beta\\ &\Rightarrow\mu=\Phi(\textbf{X}\beta), \end{align*}

where $$\Phi(\cdot)$$ is the cumulative distribution function of the standard normal distribution. This link function is also sometimes called the normit link. This also can be used in logistic regression.

\begin{align*} &g(\mu)=\log(-\log(1-\mu))=\textbf{X}\beta\\ &\Rightarrow\mu=1-\exp\{-e^{\textbf{X}\beta}\}, \end{align*}

which can also be used in logistic regression. This link function is also sometimes called the gompit link.

\begin{align*} &g(\mu)=\mu^{\lambda}=\textbf{X}\beta\\ &\Rightarrow\mu=(\textbf{X}\beta)^{1/\lambda}, \end{align*}

where $$\lambda\neq 0$$. This is used in other regressions which we do not explore (such as gamma regression and inverse Gaussian regression).

Also, the variance is typically a function of the mean and is often written as

$$\begin{equation*} \mbox{Var}(Y)=V(\mu)=V(g^{-1}(\textbf{X}\beta)). \end{equation*}$$

The random variable Y is assumed to belong to an exponential family distribution where the density can be expressed in the form

$$\begin{equation*} q(y;\theta,\phi)=\exp\biggl\{\dfrac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\biggr\}, \end{equation*}$$

where $$a(\cdot)$$, $$b(\cdot)$$, and $$c(\cdot)$$ are specified functions, $$\theta$$ is a parameter related to the mean of the distribution, and $$\phi$$ is called the dispersion parameter. Many probability distributions belong to the exponential family. For example, the normal distribution is used for traditional linear regression, the binomial distribution is used for logistic regression, and the Poisson distribution is used for Poisson regression. Other exponential family distributions lead to gamma regression, inverse Gaussian (normal) regression, and negative binomial regression, just to name a few.

The unknown parameters, $$\beta$$, are typically estimated with maximum likelihood techniques (in particular, using iteratively reweighted least squares), Bayesian methods, or quasi-likelihood methods. The quasi-likelihood is a function which possesses similar properties to the log-likelihood function and is most often used with count or binary data. Specifically, for a realization y of the random variable Y, it is defined as

$$\begin{equation*} Q(\mu;y)=\int_{y}^{\mu}\dfrac{y-t}{\sigma^{2}V(t)}dt, \end{equation*}$$

where $$\sigma^{2}$$ is a scale parameter. There are also tests using likelihood ratio statistics for model development to determine if any predictors may be dropped from the model.

# 15.6 - Nonlinear Regression

15.6 - Nonlinear Regression

All of the models we have discussed thus far have been linear in the parameters (i.e., linear in the beta's). For example, polynomial regression was used to model curvature in our data by using higher-ordered values of the predictors. However, the final regression model was just a linear combination of higher-ordered predictors.

Now we are interested in studying the nonlinear regression model:

$$\begin{equation*} Y=f(\textbf{X},\beta)+\epsilon, \end{equation*}$$

where X is a vector of p predictors, $$\beta$$ is a vector of k parameters, $$f(\cdot)$$ is some known regression function, and $$\epsilon$$ is an error term whose distribution may or may not be normal. Notice that we no longer necessarily have the dimension of the parameter vector simply one greater than the number of predictors. Some examples of nonlinear regression models are:

\begin{align*} y_{i}&=\frac{e^{\beta_{0}+\beta_{1}x_{i}}}{1+e^{\beta_{0}+\beta_{1}x_{i}}}+\epsilon_{i} \\ y_{i}&=\frac{\beta_{0}+\beta_{1}x_{i}}{1+\beta_{2}e^{\beta_{3}x_{i}}}+\epsilon_{i} \\ y_{i}&=\beta_{0}+(0.4-\beta_{0})e^{-\beta_{1}(x_{i}-5)}+\epsilon_{i}. \end{align*}

However, there are some nonlinear models which are actually called intrinsically linear because they can be made linear in the parameters by a simple transformation. For example:

$$\begin{equation*} Y=\frac{\beta_{0}X}{\beta_{1}+X} \end{equation*}$$

can be rewritten as

\begin{align*} \frac{1}{Y}&=\frac{1}{\beta_{0}}+\frac{\beta_{1}}{\beta_{0}}\frac{1}{X}\\ &=\theta_{0}+\theta_{1}\frac{1}{X}, \end{align*}

which is linear in the transformed parameters $$\theta_{0}$$ and $$\theta_{1}$$. In such cases, transforming a model to its linear form often provides better inference procedures and confidence intervals, but one must be cognizant of the effects that the transformation has on the distribution of the errors.

### Nonlinear Least Squares

Returning to cases in which it is not possible to transform the model to a linear form, consider the setting

$$\begin{equation*} Y_{i}=f(\textbf{X}_{i},\beta)+\epsilon_{i}, \end{equation*}$$

where the $$\epsilon_{i}$$ are iid normal with mean 0 and constant variance $$\sigma^{2}$$. For this setting, we can rely on some of the least squares theory we have developed over the course. For other nonnormal error terms, different techniques need to be employed.

First, let

$$\begin{equation*} Q=\sum_{i=1}^{n}(y_{i}-f(\textbf{X}_{i},\beta))^{2}. \end{equation*}$$

In order to find

$$\begin{equation*} \hat{\beta}=\arg\min_{\beta}Q, \end{equation*}$$

we first find each of the partial derivatives of Q with respect to $$\beta_{j}$$. Then, we set each of the partial derivatives equal to 0 and the parameters $$\beta_{k}$$ are each replaced by $$\hat{\beta}_{k}$$. The functions to be solved are nonlinear in the parameter estimates $$\hat{\beta}_{k}$$ and are often difficult to solve, even in the simplest cases. Hence, iterative numerical methods are often employed. Even more difficulty arises in that multiple solutions may be possible!

Algorithms for nonlinear least squares estimation include:

• Newton's method, a classical method based on a gradient approach but which can be computationally challenging and heavily dependent on good starting values.
• The Gauss-Newton algorithm, a modification of Newton's method that gives a good approximation of the solution that Newton's method should have arrived at but which is not guaranteed to converge.
• The Levenberg-Marquardt method, which can take care of computational difficulties arising with the other methods but can require a tedious search for the optimal value of a tuning parameter.

# 15.7 - Exponential Regression Example

15.7 - Exponential Regression Example

One simple nonlinear model is the exponential regression model

$$\begin{equation*} y_{i}=\beta_{0}+\beta_{1}\exp(\beta_{2}x_{i,1}+\ldots+\beta_{p+1}x_{i,1})+\epsilon_{i}, \end{equation*}$$

where the $$\epsilon_{i}$$ are iid normal with mean 0 and constant variance $$\sigma^{2}$$. Notice that if $$\beta_{0}=0$$, then the above is intrinsically linear by taking the natural logarithm of both sides.

Exponential regression is probably one of the simplest nonlinear regression models. An example where an exponential regression is often utilized is when relating the concentration of a substance (the response) to elapsed time (the predictor).

To illustrate, consider the example on long-term recovery after discharge from hospital from page 514 of Applied Linear Regression Models (4th ed) by Kutner, Nachtsheim, and Neter. The response variable, Y, is the prognostic index for long-term recovery and the predictor variable, X, is the number of days of hospitalization. The proposed model is the two-parameter exponential model:

$$\begin{equation*} Y_{i}=\theta_{0}\exp(\theta_{1}X_i)+\epsilon_{i}, \end{equation*}$$

where the $$\epsilon_i$$ are independent normal with constant variance.

We'll use Minitab's nonlinear regression routine to apply the Gauss-Newton algorithm to estimate $$\theta_0$$ and $$\theta_1$$. Before we do this, however, we have to find initial values for $$\theta_0$$ and $$\theta_1$$. One way to do this is to note that we can linearize the response function by taking the natural logarithm:

$$\begin{equation*} \log(\theta_{0}\exp(\theta_{1}X_i)) = \log(\theta_{0}) + \theta_{1}X_i. \end{equation*}$$

Thus we can fit a simple linear regression model with response, $$\log(Y)$$, and predictor, $$X$$, and the intercept ($$4.0372$$) gives us an estimate of $$\log(\theta_{0})$$ while the slope ($$-0.03797$$) gives us an estimate of $$\theta_{1}$$. (We then calculate $$\exp(4.0372)=56.7$$ to estimate $$\theta_0$$.)

#### Minitab: Nonlinear Regression Model

Now we can fit the nonlinear regression model:

1. Select Stat > Regression > Nonlinear Regression, select prog for the response, and click "Use Catalog" under "Expectation Function."
2. Select the "Exponential" function with 1 predictor and 2 parameters in the Catalog dialog box and click OK to go to the "Choose Predictors" dialog.
3. Select days to be the "Actual predictor" and click OK to go back to the Catalog dialog box, where you should see "Theta1 * exp( Theta2 * days )" in the "Expectation Function" box.
4. Click "Parameters" and type "56.7" next to "Theta1" and "-0.038" next to "Theta2" and click OK to go back to the Nonlinear Regression dialog box.
5. Click "Options" to confirm that Mintab will use the Gauss-Newton algorithm (the other choice is Levenberg-Marquardt) and click OK to go back to the Nonlinear Regression dialog box.
6. Click "Graphs" to confirm that Mintab will produce a plot of the fitted curve with data and click OK to go back to the Nonlinear Regression dialog box.
7. Click OK to obtain the following output:
##### Nonlinear Regression: Prog = Theta1 * exp(Theta2 * days)
Method
Algorithm Gauss-Newton
Max iterations 200
Tolerance 0.00001
Starting Values for Parameters
Parameter Value
Theta1 56.7
Theta2 -0.038
##### Equation

prog = 58.6066 * exp(-0.0395865 * days)

Parameter Estimates
Parameter Estimate SE Estimate
Theta1 58.6066 1.47216
Theta2 0.0396 0.00171

prog = Theta1 * exp(Theta2 * days)

 Iterations 5 Final SSE 49.4593 DFE 13 MSE 3.80456 S 1.95053

# 15.8 - Population Growth Example

15.8 - Population Growth Example

## Census Data

A simple model for population growth towards an asymptote is the logistic model

$$\begin{equation*} y_{i}=\dfrac{\beta_{1}}{1+\exp(\beta_{2}+\beta_{3}x_{i})}+\epsilon_{i}, \end{equation*}$$

where $$y_{i}$$ is the population size at time $$x_{i}$$, $$\beta_{1}$$ is the asymptote towards which the population grows, $$\beta_{2}$$ reflects the size of the population at time x = 0 (relative to its asymptotic size), and $$\beta_{3}$$ controls the growth rate of the population.

We fit this model to Census population data (US Census data) for the United States (in millions) ranging from 1790 through 1990 (see below).

 year population 1790 3.929 1800 5.308 1810 7.240 1820 9.638 1830 12.866 1840 17.069 1850 23.192 1860 31.443 1870 39.818 1880 50.156 1890 62.948 1900 75.995 1910 91.972 1920 105.711 1930 122.775 1940 131.669 1950 150.697 1960 179.323 1970 203.302 1980 226.542 1990 248.710

The data are graphed (see below) and the line represents the fit of the logistic population growth model.

To fit the logistic model to the U. S. Census data, we need starting values for the parameters. It is often important in nonlinear least squares estimation to choose reasonable starting values, which generally requires some insight into the structure of the model. We know that $$\beta_{1}$$ represents asymptotic population. The data in the plot above show that in 1990 the U. S. population stood at about 250 million and did not appear to be close to an asymptote; so as not to extrapolate too far beyond the data, let us set the starting value of $$\beta_{1}$$ to 350. It is convenient to scale time so that $$x_{1}=0$$ in 1790, and so that the unit of time is 10 years.

Then substituting $$\beta_{1}=350$$ and $$x=0$$ into the model, using the value $$y_{1}=3.929$$ from the data, and assuming that the error is 0, we have

$$\begin{equation*} 3.929=\dfrac{350}{1+\exp(\beta_{2}+\beta_{3}(0))}. \end{equation*}$$

Solving for $$\beta_{2}$$ gives us a plausible start value for this parameter:

\begin{align*} \exp(\beta_{2})&=\dfrac{350}{3.929}-1\\ \beta_{2}&=\log\biggl(\frac{350}{3.929}-1\biggr)\approx 4.5. \end{align*}

Finally, returning to the data, at time x = 1 (i.e., at the second Census performed in 1800), the population was $$y_{2}=5.308$$. Using this value, along with the previously determined start values for $$\beta_{1}$$ and $$\beta_{2}$$, and again setting the error to 0, we have

$$\begin{equation*} 5.308=\dfrac{350}{1+\exp(4.5+\beta_{3}(1))}. \end{equation*}$$

Solving for $$\beta_{3}$$ we get

\begin{align*} \exp(4.5+\beta_{3})&=\frac{350}{5.308}-1\\ \beta_{3}&=\log\biggl(\dfrac{350}{5.308}-1\biggr)-4.5\approx -0.3. \end{align*}

So now we have starting values for the nonlinear least squares algorithm that we use. Below is the output from fitting the model in Minitab using Gauss-Netwon:

• Select Stat > Regression > Nonlinear Regression
• Select "population" for the Response
• Type the following into the "Edit directly" box under Expectation Function: beta1/(1+exp(beta2+beta3*(year-1790)/10))
• Click Parameters and type in the values specified above (350, 4.5, and –0.3)

As you can see, the starting values resulted in convergence with values not too far from our guess.

##### Equation

population = 389.166/(1+exp(3.99035 - 0.22662 * (year - 1790) / 10))

##### Parameter Estimates
Paraemeter Estimate SE Estimate
beta1 389.166 30.8120
beta2 3.990 0.0703
beta3 -0.227 0.0109

population = beta1/(1 + exp(beta2 + beta3 * (year - 1790) / 10))

##### Lack of Fit

There are no replicates.

Minitab cannot do the lack of fit test based on pure errror.

##### Summary
 DFE Final SSE Iterations 8 356.400 18 19.8000 4.44972

Here is a plot of the residuals versus the year.

As you can see, the logistic functional form that we chose did catch the gross characteristics of this data, but some of the nuances appear to not be as well characterized. Since there are indications of some cyclical behavior, a model incorporating correlated errors or, perhaps, trigonometric functions could be investigated.

# Software Help 15

Software Help 15

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson15.zip

• us_census.txt
• leukemia_remission.txt
• poisson_simulated.txt
• toxicity.txt

# Minitab Help 15: Logistic, Poisson & Nonlinear Regression

Minitab Help 15: Logistic, Poisson & Nonlinear Regression

## Minitab®

### Leukemia remission (logistic regression)

• Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put REMISS in the "Response" box, and put CELL, SMEAR, INFIL, LI, BLAST, and TEMP in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
• Repeat but with just LI as a single predictor.
• Fit a logistic regression model of REMISS vs LI.
• Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of REMISS vs LI with a fitted line based on the logistic regression model.
• Before clicking "OK" in the Regression Dialog, click "Options" and type "10" into the box labeled "Number of groups for Hosmer-Lemeshow test." This will result in a new table in the output titled "Goodness-of-Fit Tests" with results for Deviance (not valid for this example), Pearson (also not valid for this example), and Hosmer-Lemeshow goodness-of-fit tests.
• Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Order" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

### Disease outbreak (logistic regression)

• Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put Disease in the "Response" box, and put Age, Middle, Lower, and Sector in the "Continuous predictors" box. Before clicking "OK," click "Model," shift-select the four predictors in the top-left box, click "Add" next to "Interactions through order 2," but remove the "Middle*Lower" interaction from the "Terms in the model" box since it is meaningless.
• Repeat but with the five interactions removed.
• Repeat but with the five interactions included and before clicking "OK," click "Options" and select "Sequential (Type I)" for "Deviances for tests."

### Toxicity and insects (logistic regression using event/trial data format)

• Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, select "Response in event/trial format," put Deaths in the "Number of events" box, put SampSize in the "Number of trials" box, and put "Dose" in the "Continuous predictors" box. Change "Event name" to "Death" if you like (optional). Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
• Select Calc > Calculator to calculate observed probabilities as Deaths/SampSize.
• Before clicking "OK" in the Regression Dialog, click "Storage" and select "Fits (event probabilities)."
• Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of observed probabilities vs Dose with a fitted line based on the logistic regression model.

### Poisson example (Poisson regression)

• Select Graph > Scatterplot to create a scatterplot of the data.
• Select Stat > Regression > Poisson Regression > Fit Poisson Model, put y in the "Response" box and put x in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
• Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Fits" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

### Hospital recovery (exponential regression)

• Select Calc > Calculator to create a log(prog) variable.
• Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
• Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put prog in the "Response" box and type "Theta1 * exp(Theta2 * days)" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
• A scatterplot of prog vs days with a fitted line based on the nonlinear regression model is produced by default.

### U.S. census population (population growth nonlinear regression)

• Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
• Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put population in the "Response" box and type "beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10))" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
• A scatterplot of population vs year with a fitted line based on the nonlinear regression model is produced by default.
• Before clicking "OK" in the Regression Dialog, click "Graphs" and put year in the "Residuals versus the variables" box.

# R Help 15: Logistic, Poisson & Nonlinear Regression

R Help 15: Logistic, Poisson & Nonlinear Regression

## R

### Leukemia remission (logistic regression)

• Fit a logistic regression model of REMISS vs CELL + SMEAR + INFIL + LI + BLAST + TEMP.
• Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
• Fit a logistic regression model of REMISS vs LI.
• Create a sctterplot of REMISS vs LI and add a fitted line based on the logistic regression model.
• Calculate the odds ratio for LI and a 95% confidence interval.
• Conduct a likelihood ratio (or deviance) test for LI.
• Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals.
• Use the hoslem.test function in the ResourceSelection package to conduct the Hosmer-Lemeshow goodness-of-fit test.
• Calculate a version of $$R^2$$ for logistic regression.
• Create residual plots using Pearson and deviance residuals.
• Calculate hat values (leverages), studentized residuals, and Cook's distances.
        
attach(leukemia)

model.1 <- glm(REMISS ~ CELL + SMEAR + INFIL + LI + BLAST + TEMP, family="binomial")
summary(model.1)
#               Estimate Std. Error z value Pr(>|z|)
# (Intercept)   64.25808   74.96480   0.857    0.391
# CELL          30.83006   52.13520   0.591    0.554
# SMEAR         24.68632   61.52601   0.401    0.688
# INFIL        -24.97447   65.28088  -0.383    0.702
# LI             4.36045    2.65798   1.641    0.101
# BLAST         -0.01153    2.26634  -0.005    0.996
# TEMP        -100.17340   77.75289  -1.288    0.198
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 21.594  on 20  degrees of freedom
# AIC: 35.594

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

model.2 <- glm(REMISS ~ LI, family="binomial")
summary(model.2)
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)   -3.777      1.379  -2.740  0.00615 **
# LI             2.897      1.187   2.441  0.01464 *
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 26.073  on 25  degrees of freedom
# AIC: 30.073

plot(x=LI, y=REMISS,
panel.last = lines(sort(LI), fitted(model.2)[order(LI)]))

exp(coef(model.2)[2]) # odds ratio = 18.12449
exp(confint.default(model.2)[2,]) # 95% CI = (1.770284, 185.561725)

anova(model.2, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL                    26     34.372
# LI    1   8.2988        25     26.073 0.003967 **

sum(residuals(model.2, type="deviance")^2) # 26.07296
model.2$deviance # 26.07296 sum(residuals(model.2, type="pearson")^2) # 23.93298 library(ResourceSelection) hoslem.test(model.2$y, fitted(model.2), g=9)
# Hosmer and Lemeshow goodness of fit (GOF) test
# data:  REMISS, fitted(model.2)
# X-squared = 7.3293, df = 7, p-value = 0.3954

1-model.2$deviance/model.2$null.deviance # "R-squared" = 0.2414424

plot(1:27, residuals(model.2, type="pearson"), type="b")
plot(1:27, residuals(model.2, type="deviance"), type="b")

summary(influence.measures(model.2))
#  dfb.1_ dfb.LI dffit   cov.r cook.d hat
# 8  0.63  -0.83  -0.93_*  0.88  0.58   0.15

hatvalues(model.2)[8] # 0.1498395
residuals(model.2)[8] # -1.944852
rstudent(model.2)[8] # -2.185013
cooks.distance(model.2)[8] # 0.5833219

detach(leukemia)


### Disease outbreak (logistic regression)

• Load the disease outbreak data.
• Create interaction variables.
• Fit "full" logistic regression model of Disease vs four predictors and five interactions.
• Fit "reduced" logistic regression model of Disease vs four predictors.
• Conduct a likelihood ratio (or deviance) test for the five interactions.
• Display the analysis of deviance table with sequential deviances.

attach(disease)

Age.Middle <- Age*Middle
Age.Lower <- Age*Lower
Age.Sector <- Age*Sector
Middle.Sector <- Middle*Sector
Lower.Sector <- Lower*Sector

model.1 <- glm(Disease ~ Age + Middle + Lower + Sector + Age.Middle + Age.Lower +
Age.Sector + Middle.Sector + Lower.Sector, family="binomial")
model.2 <- glm(Disease ~ Age + Middle + Lower + Sector, family="binomial")
anova(model.2, model.1, test="Chisq")
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1        93    101.054
# 2        88     93.996  5   7.0583   0.2163

anova(model.1, test="Chisq")
#               Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL                             97    122.318
# Age            1   7.4050        96    114.913 0.006504 **
# Middle         1   1.8040        95    113.109 0.179230
# Lower          1   1.6064        94    111.502 0.205003
# Sector         1  10.4481        93    101.054 0.001228 **
# Age.Middle     1   4.5697        92     96.484 0.032542 *
# Age.Lower      1   1.0152        91     95.469 0.313666
# Age.Sector     1   1.1202        90     94.349 0.289878
# Middle.Sector  1   0.0001        89     94.349 0.993427
# Lower.Sector   1   0.3531        88     93.996 0.552339

detach(disease)


### Toxicity and insects (logistic regression using event/trial data format)

• Create a Survivals variable and a matrix with Deaths in one column and Survivals in the other column.
• Fit a logistic regression model of Deaths vs Dose.
• Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
• Calculate the odds ratio for Dose and a 95% confidence interval.
• Display the observed and fitted probabilities.
• Create a sctterplot of observed probabilties vs Dose and add a fitted line based on the logistic regression model.

attach(toxicity)

Survivals <- SampSize - Deaths
y <- cbind(Deaths, Survivals)

model.1 <- glm(y ~ Dose, family="binomial")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept) -2.64367    0.15610  -16.93   <2e-16 ***
# Dose         0.67399    0.03911   17.23   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 383.0695  on 5  degrees of freedom
# Residual deviance:   1.4491  on 4  degrees of freedom
# AIC: 39.358

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

exp(coef(model.1)[2]) # odds ratio = 1.962056
exp(confint.default(model.1)[2,]) # 95% CI = (1.817279, 2.118366)

cbind(Dose, SampSize, Deaths, Deaths/SampSize, fitted(model.1))
#   Dose SampSize Deaths
# 1    1      250     28 0.112 0.1224230
# 2    2      250     53 0.212 0.2148914
# 3    3      250     93 0.372 0.3493957
# 4    4      250    126 0.504 0.5130710
# 5    5      250    172 0.688 0.6739903
# 6    6      250    197 0.788 0.8022286

plot(x=Dose, y=Deaths/SampSize,
panel.last = lines(sort(Dose), fitted(model.1)[order(Dose)]))

detach(toxicity)


### Poisson example (Poisson regression)

• Create a scatterplot of the data.
• Fit a Poisson regression model of y vs x.
• Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
• Create a sctterplot of y vs x and add a fitted line based on the Poisson regression model.
• Conduct a likelihood ratio (or deviance) test for x.
• Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals and calculate p-values based on chi-squared goodness-of-fit tests.
• Calculate pseudo $$R^2$$ for Poisson regression.
• Create residual plots using Pearson and deviance residuals.
• Calculate hat values (leverages) and studentized residuals.

attach(poisson)

plot(x=x, y=y)

model.1 <- glm(y ~ x, family="poisson")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  0.30787    0.28943   1.064    0.287
# x            0.07636    0.01730   4.413 1.02e-05 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for poisson family taken to be 1)
#
# Null deviance: 48.310  on 29  degrees of freedom
# Residual deviance: 27.842  on 28  degrees of freedom
# AIC: 124.5

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

plot(x=x, y=y,
panel.last = lines(sort(x), fitted(model.1)[order(x)]))

anova(model.1, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
# NULL                    29     48.310
# x     1   20.468        28     27.842 6.065e-06 ***

sum(residuals(model.1, type="deviance")^2) # 27.84209
model.1$deviance # 27.84209 pchisq(model.1$deviance, 28, lower.tail=F) # p-value = 0.4728389

sum(residuals(model.1, type="pearson")^2) # 26.09324
pchisq(sum(residuals(model.1, type="pearson")^2), 28, lower.tail=F) # p-value = 0.5679192

1-model.1$deviance/model.1$null.deviance # Pseudo R-squared = 0.423676

plot(fitted(model.1), residuals(model.1, type="pearson"))
plot(fitted(model.1), residuals(model.1, type="deviance"))

summary(influence.measures(model.1))
#    dfb.1_ dfb.x dffit cov.r   cook.d hat
# 10 -0.22   0.30  0.37  1.25_*  0.08   0.18
# 21  0.37  -0.48 -0.57  1.30_*  0.15   0.23_*

residuals(model.1)[8] # 1.974329
rstudent(model.1)[8] # 2.028255

detach(poisson)


### Hospital recovery (exponential regression)

• Create log(prog) variable.
• Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
• Fit nonlinear regression model to data using these starting values.
• Create a scatterplot of prog vs days and add a fitted line based on the nonlinear regression model.

attach(recovery)

logprog <- log(prog)
summary(lm(logprog ~ days))
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  4.037159   0.084103   48.00 5.08e-16 ***
# days        -0.037974   0.002284  -16.62 3.86e-10 ***

exp(4.037159) # 56.66513

model.1 <- nls(prog ~ theta1 * exp(theta2 * days),
start=list(theta1=56.7, theta2=-0.038))
summary(model.1)
#         Estimate Std. Error t value Pr(>|t|)
# theta1 58.606532   1.472159   39.81 5.70e-15 ***
# theta2 -0.039586   0.001711  -23.13 6.01e-12 ***
# ---
# Residual standard error: 1.951 on 13 degrees of freedom

plot(x=days, y=prog,
panel.last = lines(sort(days), fitted(model.1)[order(days)]))

detach(recovery)


### U.S. census population (population growth nonlinear regression)

• Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
• Fit nonlinear regression model to data using these starting values.
• Create a scatterplot of population vs year and add a fitted line based on the nonlinear regression model.
• Create a residual plot.

attach(census)

plot(x=year, y=population)

log(350/3.929-1) # 4.478259
log(350/5.308-1) - log(350/3.929-1) # -0.3048229

model.1 <- nls(population ~ beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10)),
start=list(beta1=350, beta2=4.5, beta3=-0.3))
summary(model.1)
#        Estimate Std. Error t value Pr(>|t|)
# beta1 389.16551   30.81196   12.63  2.2e-10 ***
# beta2   3.99035    0.07032   56.74  < 2e-16 ***
# beta3  -0.22662    0.01086  -20.87  4.6e-14 ***
# ---
# Residual standard error: 4.45 on 18 degrees of freedom

plot(x=year, y=population,
panel.last = lines(sort(year), fitted(model.1)[order(year)]))

plot(x=year, y=residuals(model.1),
panel.last = abline(h=0, lty=2))

detach(census)

 [1] Link ↥ Has Tooltip/Popover Toggleable Visibility