Lesson 15: Logistic, Poisson & Nonlinear Regression
Lesson 15: Logistic, Poisson & Nonlinear RegressionOverview
Multiple linear regression can be generalized to handle a response variable that is categorical or a count variable. This lesson covers the basics of such models, specifically logistic and Poisson regression, including model fitting and inference.
Multiple linear regression, logistic regression, and Poisson regression are examples of generalized linear models, which this lesson introduces briefly.
The lesson concludes with some examples of nonlinear regression, specifically exponential regression and population growth models.
Objectives
 Apply logistic regression techniques to datasets with a binary response variable.
 Apply Poisson regression techniques to datasets with a count response variable.
 Understand the basics of fitting and inference for nonlinear regression methods when the regression function acting on the predictors is not linear in the parameters.
Lesson 15 Code Files
Below is a zip file that contains all the data sets used in this lesson:
 us_census.txt
 leukemia_remission.txt
 poisson_simulated.txt
 toxicity.txt
15.1  Logistic Regression
15.1  Logistic RegressionLogistic regression models a relationship between predictor variables and a categorical response variable. For example, we could use logistic regression to model the relationship between various measurements of a manufactured specimen (such as dimensions and chemical composition) to predict if a crack greater than 10 mils will occur (a binary variable: either yes or no). Logistic regression helps us estimate a probability of falling into a certain level of the categorical response given a set of predictors. We can choose from three types of logistic regression, depending on the nature of the categorical response variable:
Binary Logistic Regression:
Used when the response is binary (i.e., it has two possible outcomes). The cracking example given above would utilize binary logistic regression. Other examples of binary responses could include passing or failing a test, responding yes or no on a survey, and having high or low blood pressure.
Nominal Logistic Regression:
Used when there are three or more categories with no natural ordering to the levels. Examples of nominal responses could include departments at a business (e.g., marketing, sales, HR), type of search engine used (e.g., Google, Yahoo!, MSN), and color (black, red, blue, orange).
Ordinal Logistic Regression:
Used when there are three or more categories with a natural ordering to the levels, but the ranking of the levels do not necessarily mean the intervals between them are equal. Examples of ordinal responses could be how students rate the effectiveness of a college course on a scale of 15, levels of flavors for hot wings, and medical condition (e.g., good, stable, serious, critical).
Particular issues with modelling a categorical response variable include nonnormal error terms, nonconstant error variance, and constraints on the response function (i.e., the response is bounded between 0 and 1). We will investigate ways of dealing with these in the binary logistic regression setting here. There is some discussion of the nominal and ordinal logistic regression settings in Section 15.2.
The multiple binary logistic regression model is the following:
\(\begin{align}\label{logmod}
\pi&=\dfrac{\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p1}X_{p1})}{1+\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p1}X_{p1})}\notag \\
&
=\dfrac{\exp(\textbf{X}\beta)}{1+\exp(\textbf{X}\beta)}\\
&
=\dfrac{1}{1+\exp(\textbf{X}\beta)}
\end{align}\)
where here \(\pi\) denotes a probability and not the irrational number 3.14....
 \(\pi\) is the probability that an observation is in a specified category of the binary Y variable, generally called the "success probability."
 Notice that the model describes the probability of an event happening as a function of X variables. For instance, it might provide estimates of the probability that an older person has heart disease.
 With the logistic model, estimates of \(\pi\) from equations like the one above will always be between 0 and 1. The reasons are:
 The numerator \(\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p1}X_{p1})\) must be positive, because it is a power of a positive value (e).
 The denominator of the model is (1 + numerator), so the answer will always be less than 1.
 With one X variable, the theoretical model for \(\pi\) has an elongated "S" shape (or sigmoidal shape) with asymptotes at 0 and 1, although in sample estimates we may not see this "S" shape if the range of the X variable is limited.
For a sample of size n, the likelihood for a binary logistic regression is given by:
\(\begin{align*}
L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\pi_{i}^{y_{i}}(1\pi_{i})^{1y_{i}}\\
&
=\prod_{i=1}^{n}\biggl(\dfrac{\exp(\textbf{X}_{i}\beta)}{1+\exp(\textbf{X}_{i}\beta)}\biggr)^{y_{i}}\biggl(\dfrac{1}{1+\exp(\textbf{X}_{i}\beta)}\biggr)^{1y_{i}}.
\end{align*}\)
This yields the log likelihood:
\(\begin{align*}
\ell(\beta)&=\sum_{i=1}^{n}(y_{i}\log(\pi_{i})+(1y_{i})\log(1\pi_{i}))\\
&
=\sum_{i=1}^{n}(y_{i}\textbf{X}_{i}\beta\log(1+\exp(\textbf{X}_{i}\beta))).
\end{align*}\)
Maximizing the likelihood (or log likelihood) has no closedform solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, \(\hat{\beta}\).
To illustrate, consider data published on n = 27 leukemia patients. The Leukemia Remission data set has a response variable of whether leukemia remission occurred (REMISS), which is given by a 1.
The predictor variables are cellularity of the marrow clot section (CELL), smear differential percentage of blasts (SMEAR), percentage of absolute marrow leukemia cell infiltrate (INFIL), percentage labeling index of the bone marrow leukemia cells (LI), absolute number of blasts in the peripheral blood (BLAST), and the highest temperature prior to start of treatment (TEMP).
The following gives the estimated logistic regression equation and associated significance tests from Minitab:
 Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model.
 Select "REMISS" for the Response (the response event for remission is 1 for this data).
 Select all the predictors as Continuous predictors.
 Click Options and choose Deviance or Pearson residuals for diagnostic plots.
 Click Graphs and select "Residuals versus order."
 Click Results and change "Display of results" to "Expanded tables."
 Click Storage and select "Coefficients."
This results in the following output:
Coefficients
Term  Coef  SE Coef  95% CI  ZValue  PValue  VIF 

Constant  64.3  75.0  ( 82.7, 211.2)  0.86  0.391  
CELL  30.8  52.1  ( 71.4, 133.0)  0.59  0.554  62.46 
SMEAR  24.7  61.5  ( 95.9, 145.3)  0.40  0.688  434.42 
INFILL  25.0  65.3  (  152.9, 103.0)  0.38  0.702  471.10 
LI  4.36  2.66  ( 0.85, 9.57)  1.64  0.101  4.43 
Blast  0.01  2.27  ( 4.45, 4.43)  0.01  0.996  4.18 
TEMP  100.2  77.8  (252.6, 52.2)  1.29  0.198  3.01 
Wald Test
The Wald test is the test of significance for individual regression coefficients in logistic regression (recall that we use ttests in linear regression). For maximum likelihood estimates, the ratio
\(\begin{equation*}
Z=\dfrac{\hat{\beta}_{i}}{\textrm{s.e.}(\hat{\beta}_{i})}
\end{equation*}\)
can be used to test \(H_{0} \colon \beta_{i}=0\). The standard normal curve is used to determine the \(p\)value of the test. Furthermore, confidence intervals can be constructed as
\(\begin{equation*}
\hat{\beta}_{i}\pm z_{1\alpha/2}\textrm{s.e.}(\hat{\beta}_{i}).
\end{equation*}\)
Estimates of the regression coefficients, \(\hat{\beta}\), are given in the Minitab output Coefficients table in the column labeled "Coef." This table also gives coefficient pvalues based on Wald tests. The index of the bone marrow leukemia cells (LI) has the smallest pvalue and so appears to be closest to a significant predictor of remission occurring. After looking at various subsets of the data, we find that a good model is one which only includes the labeling index as a predictor:
Coefficients
Term  Coef  SE Coef  95% CI  ZValue  PValue  VIF 

Constant  3.78  1.38  ( 6.48, 1.08)  2.74  0.006  
LI  2.90  1.19  ( 0.57, 5.22)  2.44  0.015  1.00 
Regression Equation
p(1) = exp(Y')/(1 + exp(Y'))
Y' = 3.78 + 2.90 LI
Since we only have a single predictor in this model we can create a Binary Fitted Line Plot to visualize the sigmoidal shape of the fitted logistic regression curve:
Odds, Log Odds, and Odds Ratio
There are algebraically equivalent ways to write the logistic regression model:
The first is
\(\begin{equation}\label{logmod1}
\dfrac{\pi}{1\pi}=\exp(\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p1}X_{p1}),
\end{equation}\)
which is an equation that describes the odds of being in the current category of interest. By definition, the odds for an event is \(\pi\) / (1  π) such that π is the probability of the event. For example, if you are at the racetrack and there is a 80% chance that a certain horse will win the race, then his odds are 0.80 / (1  0.80) = 4, or 4:1.
The second is
\(\begin{equation}\label{logmod2}
\log\biggl(\dfrac{\pi}{1\pi}\biggr)=\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p1}X_{p1},
\end{equation}\)
which states that the (natural) logarithm of the odds is a linear function of the X variables (and is often called the log odds). This is also referred to as the logit transformation of the probability of success, \(\pi\).
The odds ratio (which we will write as \(\theta\)) between the odds for two sets of predictors (say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\)) is given by
\(\begin{equation*}
\theta=\dfrac{(\pi/(1\pi))_{\textbf{X}=\textbf{X}_{(1)}}}{(\pi/(1\pi))_{\textbf{X}=\textbf{X}_{(2)}}}.
\end{equation*}\)
For binary logistic regression, the odds of success are:
\(\begin{equation*}
\dfrac{\pi}{1\pi}=\exp(\textbf{X}\beta).
\end{equation*}\)
By plugging this into the formula for \(\theta\) above and setting \(\textbf{X}_{(1)}\) equal to \(\textbf{X}_{(2)}\) except in one position (i.e., only one predictor differs by one unit), we can determine the relationship between that predictor and the response. The odds ratio can be any nonnegative number. An odds ratio of 1 serves as the baseline for comparison and indicates there is no association between the response and predictor. If the odds ratio is greater than 1, then the odds of success are higher for higher levels of a continuous predictor (or for the indicated level of a factor). In particular, the odds increase multiplicatively by \(\exp(\beta_{j})\) for every oneunit increase in \(\textbf{X}_{j}\). If the odds ratio is less than 1, then the odds of success are less for higher levels of a continuous predictor (or for the indicated level of a factor). Values farther from 1 represent stronger degrees of association.
For example, when there is just a single predictor, \(X\), the odds of success are:
\(\begin{equation*}
\dfrac{\pi}{1\pi}=\exp(\beta_0+\beta_1X).
\end{equation*}\)
If we increase \(X\) by one unit, the odds ratio is
\(\begin{equation*}
\theta=\dfrac{\exp(\beta_0+\beta_1(X+1))}{\exp(\beta_0+\beta_1X)}=\exp(\beta_1).
\end{equation*}\)
To illustrate, the relevant Minitab output from the leukemia example is:
Odd Ratios for Continuous Predictors
Odds Ratio  95% CI  

LI  18.1245  (1.7703, 185.5617) 
The odds ratio for LI of 18.1245 is calculated as \(\exp(2.89726)\) (you can view more decimal places for the coefficient estimates in Minitab by clicking "Storage" in the Regression Dialog and selecting "Coefficients"). The 95% confidence interval is calculated as \(\exp(2.89726\pm z_{0.975}*1.19)\), where \(z_{0.975}=1.960\) is the \(97.5^{\textrm{th}}\) percentile from the standard normal distribution. The interpretation of the odds ratio is that for every increase of 1 unit in LI, the estimated odds of leukemia remission are multiplied by 18.1245. However, since the LI appears to fall between 0 and 2, it may make more sense to say that for every .1 unit increase in L1, the estimated odds of remission are multiplied by \(\exp(2.89726\times 0.1)=1.336\). Then
 At LI=0.9, the estimated odds of leukemia remission is \(\exp\{3.77714+2.89726*0.9\}=0.310\).
 At LI=0.8, the estimated odds of leukemia remission is \(\exp\{3.77714+2.89726*0.8\}=0.232\).
 The resulting odds ratio is \(\dfrac{0.310}{0.232}=1.336\), which is the ratio of the odds of remission when LI=0.9 compared to the odds when L1=0.8.
Notice that \(1.336\times 0.232=0.310\), which demonstrates the multiplicative effect by \(\exp(0.1\hat{\beta_{1}})\) on the odds.
Likelihood Ratio (or Deviance) Test
The likelihood ratio test is used to test the null hypothesis that any subset of the \(\beta\)'s is equal to 0. The number of \(\beta\)'s in the full model is p, while the number of \(\beta\)'s in the reduced model is r. (Remember the reduced model is the model that results when the \(\beta\)'s in the null hypothesis are set to 0.) Thus, the number of \(\beta\)'s being tested in the null hypothesis is \(pr\). Then the likelihood ratio test statistic is given by:
\(\begin{equation*}
\Lambda^{*}=2(\ell(\hat{\beta}^{(0)})\ell(\hat{\beta})),
\end{equation*}\)
where \(\ell(\hat{\beta})\) is the log likelihood of the fitted (full) model and \(\ell(\hat{\beta}^{(0)})\) is the log likelihood of the (reduced) model specified by the null hypothesis evaluated at the maximum likelihood estimate of that reduced model. This test statistic has a \(\chi^{2}\) distribution with \(pr\) degrees of freedom. Minitab presents results for this test in terms of "deviance," which is defined as \(2\) times loglikelihood. The notation used for the test statistic is typically \(G^2\) = deviance (reduced) – deviance (full).
This test procedure is analogous to the general linear F test procedure for multiple linear regression. However, note that when testing a single coefficient, the Wald test and likelihood ratio test will not, in general, give identical results.
To illustrate, the relevant software output from the leukemia example is:
Deviance Table
Source  DF  Adj Dev  Adj Mean  ChiSquare  PValue 

Regression  1  8.299  8.299  8.30  0.004 
LI  1  8.299  8.299  8.30  0.004 
Error  25  26.073  1.043  
Total  26  34.372 
Since there is only a single predictor for this example, this table simply provides information on the likelihood ratio test for LI (pvalue of 0.004), which is similar but not identical to the earlier Wald test result (pvalue of 0.015). The Deviance Table includes the following:
 The null (reduced) model, in this case, has no predictors, so the fitted probabilities are simply the sample proportion of successes, \(9/27=0.333333\). The loglikelihood for the null model is \(\ell(\hat{\beta}^{(0)})=17.1859\), so the deviance for the null model is \(2\times17.1859=34.372\), which is shown in the "Total" row in the Deviance Table.
 The loglikelihood for the fitted (full) model is \(\ell(\hat{\beta})=13.0365\), so the deviance for the fitted model is \(2\times13.0365=26.073\), which is shown in the "Error" row in the Deviance Table.
 The likelihood ratio test statistic is therefore \(\Lambda^{*}=2(17.1859(13.0365))=8.299\), which is the same as \(G^2=34.37226.073=8.299\).
 The pvalue comes from a \(\chi^{2}\) distribution with \(21=1\) degrees of freedom.
When using the likelihood ratio (or deviance) test for more than one regression coefficient, we can first fit the "full" model to find deviance (full), which is shown in the "Error" row in the resulting full model Deviance Table. Then fit the "reduced" model (corresponding to the model that results if the null hypothesis is true) to find deviance (reduced), which is shown in the "Error" row in the resulting reduced model Deviance Table. For example, the relevant Deviance Tables for the Disease Outbreak example on pages 581582 of Applied Linear Regression Models (4th ed) by Kutner et al are:
Full model
Source  DF  Adj Dev  Adj Mean  ChiSquare  PValue 

Regression  9  28.322  3.14686  28.32  0.001 
Error  88  93.996  1.06813  
Total  97  122.318 
Reduced Model
Source  DF  Adj Dev  Adj Mean  ChiSquare  PValue 

Regression  4  21.263  5.3159  21.26  0.000 
Error  93  101.054  1.06813  
Total  97  122.318 
Here the full model includes four singlefactor predictor terms and five twofactor interaction terms, while the reduced model excludes the interaction terms. The test statistic for testing the interaction terms is \(G^2 = 101.05493.996 = 7.058\), which is compared to a chisquare distribution with \(105=5\) degrees of freedom to find the pvalue > 0.05 (meaning the interaction terms are not significant).
Alternatively, select the corresponding predictor terms last in the full model and request Minitab to output Sequential (Type I) Deviances. Then add the corresponding Sequential Deviances in the resulting Deviance Table to calculate \(G^2\). For example, the relevant Deviance Table for the Disease Outbreak example is:
Source  DF  Seq Dev  Seq Mean  ChiSquare  PValue 

Regression  9  28.322  3.1469  28.32  0.001 
Age  1  7.405  7.4050  7.40  0.007 
Middle  1  1.804  1.8040  1.80  0.179 
Lower  1  1.606  1.6064  1.61  0.205 
Sector  1  10.448  10.4481  10.45  0.001 
Age*Middle  1  4.570  4.5397  4.57  0.033 
Age*Lower  1  1.015  1.0152  1.02  0.314 
Age*Sector  1  1.120  1.1202  1.12  0.290 
Middle*Sector  1  0.000  0.0001  0.00  0.993 
Lower*Sector  1  0.353  0.3531  0.35  0.3552 
Error  88  93.996  1.0681  
Total  97  122.318 
The test statistic for testing the interaction terms is \(G^2 = 4.570+1.015+1.120+0.000+0.353 = 7.058\), the same as in the first calculation.
GoodnessofFit Tests
Overall performance of the fitted model can be measured by several different goodnessoffit tests. Two tests that require replicated data (multiple observations with the same values for all the predictors) are the Pearson chisquare goodnessoffit test and the deviance goodnessoffit test(analagous to the multiple linear regression lackoffit Ftest). Both of these tests have statistics that are approximately chisquare distributed with c  p degrees of freedom, where c is the number of distinct combinations of the predictor variables. When a test is rejected, there is a statistically significant lack of fit. Otherwise, there is no evidence of lack of fit.
By contrast, the HosmerLemeshow goodnessoffit test is useful for unreplicated datasets or for datasets that contain just a few replicated observations. For this test the observations are grouped based on their estimated probabilities. The resulting test statistic is approximately chisquare distributed with c  2 degrees of freedom, where c is the number of groups (generally chosen to be between 5 and 10, depending on the sample size).
To illustrate, the relevant Minitab output from the leukemia example is:
GoodnessofFit Test
Test  DF  Chisquare  PValue 

Deviance  25  26.07  0.404 
Pearson  25  23.93  0.523 
HosmerLemeshow  7  6.87  0.442 
Since there is no replicated data for this example, the deviance and Pearson goodnessoffit tests are invalid, so the first two rows of this table should be ignored. However, the HosmerLemeshow test does not require replicated data so we can interpret its high pvalue as indicating no evidence of lackoffit.
\(R^{2}\)
The calculation of \(R^{2}\) used in linear regression does not extend directly to logistic regression. One version of \(R^{2}\) used in logistic regression is defined as
\(\begin{equation*}
R^{2}=\dfrac{\ell(\hat{\beta_{0}})\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})\ell_{S}(\beta)},
\end{equation*}\)
where \(\ell(\hat{\beta_{0}})\) is the log likelihood of the model when only the intercept is included and \(\ell_{S}(\beta)\) is the log likelihood of the saturated model (i.e., where a model is fit perfectly to the data). This \(R^{2}\) does go from 0 to 1 with 1 being a perfect fit. With unreplicated data, \(\ell_{S}(\beta)=0\), so the formula simplifies to:
\(\begin{equation*}
R^{2}=\dfrac{\ell(\hat{\beta_{0}})\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}=1\dfrac{\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}.
\end{equation*}\)
To illustrate, the relevant Minitab output from the leukemia example is:
Model Summary
Deviance RSq 
Deviance RSq(adj) 
AIC 

24.14%  21.23%  30.07 
Recall from above that \(\ell(\hat{\beta})=13.0365\) and \(\ell(\hat{\beta}^{(0)})=17.1859\), so:
\(\begin{equation*}
R^{2}=1\dfrac{13.0365}{17.1859}=0.2414.
\end{equation*}\)
Note that we can obtain the same result by simply using deviances instead of loglikelihoods since the \(2\) factor cancels out:
\(\begin{equation*}
R^{2}=1\dfrac{26.073}{34.372}=0.2414.
\end{equation*}\)
Raw Residual
The raw residual is the difference between the actual response and the estimated probability from the model. The formula for the raw residual is
\(\begin{equation*}
r_{i}=y_{i}\hat{\pi}_{i}.
\end{equation*}\)
Pearson Residual
The Pearson residual corrects for the unequal variance in the raw residuals by dividing by the standard deviation. The formula for the Pearson residuals is
\(\begin{equation*}
p_{i}=\dfrac{r_{i}}{\sqrt{\hat{\pi}_{i}(1\hat{\pi}_{i})}}.
\end{equation*}\)
Deviance Residuals
Deviance residuals are also popular because the sum of squares of these residuals is the deviance statistic. The formula for the deviance residual is
\(\begin{equation*}
d_{i}=\pm\sqrt{2\biggl(y_{i}\log\biggl(\dfrac{y_{i}}{\hat{\pi}_{i}}\biggr)+(1y_{i})\log\biggl(\dfrac{1y_{i}}{1\hat{\pi}_{i}}\biggr)\biggr)}.
\end{equation*}\)
Here are the plots of the Pearson residuals and deviance residuals for the leukemia example. There are no alarming patterns in these plots to suggest a major problem with the model.
Hat Values
The hat matrix serves a similar purpose as in the case of linear regression – to measure the influence of each observation on the overall fit of the model – but the interpretation is not as clear due to its more complicated form. The hat values (leverages) are given by
\(\begin{equation*}
h_{i,i}=\hat{\pi}_{i}(1\hat{\pi}_{i})\textbf{x}_{i}^{\textrm{T}}(\textbf{X}^{\textrm{T}}\textbf{W}\textbf{X})\textbf{x}_{i},
\end{equation*}\)
where W is an \(n\times n\) diagonal matrix with the values of \(\hat{\pi}_{i}(1\hat{\pi}_{i})\) for \(i=1 ,\ldots,n\) on the diagonal. As before, we should investigate any observations with \(h_{i,i}>3p/n\) or, failing this, any observations with \(h_{i,i}>2p/n\) and very isolated.
Studentized Residuals
We can also report Studentized versions of some of the earlier residuals. The Studentized Pearson residuals are given by
\(\begin{equation*}
sp_{i}=\dfrac{p_{i}}{\sqrt{1h_{i,i}}}
\end{equation*}\)
and the Studentized deviance residuals are given by
\(\begin{equation*}
sd_{i}=\dfrac{d_{i}}{\sqrt{1h_{i, i}}}.
\end{equation*}\)
C and \(\bar{\textrm{C}}\)
C and \(\bar{\textrm{C}}\) are extensions of Cook's distance for logistic regression. C measures the overall change in fitted logits due to deleting the \(i^{\textrm{th}}\) observation for all points including the one deleted, while \(\bar{\textrm{C}}\) excludes the deleted point. They are defined by:
\(\begin{equation*}
\textrm{C}_{i}=\dfrac{p_{i}^{2}h _{i,i}}{p(1h_{i,i})^{2}}
\end{equation*}\)
and
\(\begin{equation*}
\bar{\textrm{C}}_{i}=\dfrac{p_{i}^{2}h_{i,i}}{p(1h_{i,i})}.
\end{equation*}\)
To illustrate, the relevant Minitab output from the leukemia example is:
Fits and Diagnostics for Unusual Observations
Obs  Probability  Fit  SE Fit  95% CI  Resid  Std Resid  Del Resid  HI 

8  0.000  0.849  0.139  (0.403, 0.979)  1.945  2.11  2.19  0.149840 
Fits and Diagnostics for Unusual Observations
Obs  Cook's D  DFITS  

8  0.58  1.08011  R 
R Large residual
The default residuals in this output (set under Minitab's Regression Options) are deviance residuals, so observation 8 has a deviance residual of \(1.945\), a studentized deviance residual of \(2.11\), a leverage (h) of \(0.149840\), and a Cook's distance (C) of 0.58.
DFDEV and DFCHI
DFDEV and DFCHI are statistics that measure the change in deviance and in Pearson's chisquare, respectively, that occurs when an observation is deleted from the data set. Large values of these statistics indicate observations that have not been fitted well. The formulas for these statistics are
\(\begin{equation*}
\textrm{DFDEV}_{i}=d_{i}^{2}+\bar{\textrm{C}}_{i}
\end{equation*}\)
and
\(\begin{equation*}
\textrm{DFCHI}_{i}=\dfrac{\bar{\textrm{C}}_{i}}{h_{i,i}}
\end{equation*}\)
15.2  Polytomous Regression
15.2  Polytomous RegressionIn binary logistic regression, we only had two possible outcomes. For polytomous logistic regression, we will consider the possibility of having k > 2 possible outcomes. (Note: The word polychotomous is sometimes used, but note that this is not actually a word!)
Nominal Logistic Regression
The multiple nominal logistic regression model (sometimes called the multinomial logistic regression model) is given by the following:
\(\begin{equation}\label{nommod}
\pi_{j}=\left\{
\begin{array}{ll}
\dfrac{\exp(\textbf{X}\beta_{j})}{1+\sum_{j=2}^{k}\exp(\textbf{X}\beta_{j})} & \hbox{j=2,...,k} \\
\\
\dfrac{1}{1+\sum_{j=2}^{k}\exp(\textbf{X}\beta_{j})}
& \hbox{j=1} \end{array} \right.
\end{equation}\)
where again \(\pi_{j}\) denotes a probability and not the irrational number. Notice that k  1 of the groups have their own set of \(\beta\) values. Furthermore, since \(\sum_{j=1}^{k}\pi_{j}=1\), we set the \(\beta\) values for group 1 to be 0 (this is what we call the reference group). Notice that when k = 2, we are back to binary logistic regression.
\(\pi_{j}\) is the probability that an observation is in one of k categories. The likelihood for the nominal logistic regression model is given by:
\(\begin{align*}
L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1\pi_{i,j})^{1y_{i,j}},
\end{align*}\)
where the subscript \((i,j)\) means the \(i^{\textrm{th}}\) observation belongs to the \(j^{\textrm{th}}\) group. This yields the log likelihood:
\(\begin{equation*}
\ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}y_{i,j}\pi_{i,j}.
\end{equation*}\)
Maximizing the likelihood (or log likelihood) has no closedform solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, \(\hat{\beta}\).
An odds ratio (\(\theta\)) of 1 serves as the baseline for comparison. If \(\theta=1\), then there is no association between the response and predictor. If \(\theta>1 \), then the odds of success are higher for the indicated level of the factor (or for higher levels of a continuous predictor). If \(\theta<1 \), then the odds of success are less for the indicated level of the factor (or for higher levels of a continuous predictor). Values farther from 1 represent stronger degrees of association. For nominal logistic regression, the odds of success (at two different levels of the predictors, say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\)) are:
\(\begin{equation*}
\theta=\dfrac{(\pi_{j}/\pi_{1})_{\textbf{X}=\textbf{X}_{(1)}}}{(\pi_{j}/\pi_{1})_{\textbf{X}=\textbf{X}_{(2)}}}.
\end{equation*}\)
Many of the procedures discussed in binary logistic regression can be extended to nominal logistic regression with the appropriate modifications.
Ordinal Logistic Regression
For ordinal logistic regression, we again consider k possible outcomes as in nominal logistic regression, except that the order matters. The multiple ordinal logistic regression model is the following:
\(\begin{equation}\label{ordmod}
\sum_{j=1}^{k^{*}}\pi_{j}=\dfrac{\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}{1+\exp(\beta_{0,k^{*}}+\textbf{X}\beta)}
\end{equation}\)
such that \(k^{*}\leq k\), \(\pi_{1}\leq\pi_{2},\leq \ldots,\leq\pi_{k}\), and again \(\pi_{j}\) denotes a probability. Notice that this model is a cumulative sum of probabilities which involves just changing the intercept of the linear regression portion (so \(\beta\) is now (p  1)dimensional and X is \(n\times(p1)\) such that first column of this matrix is not a column of 1's). Also, it still holds that \(\sum_{j=1}^{k}\pi_{j}=1\).
\(\pi_{j}\) is still the probability that an observation is in one of k categories, but we are constrained by the model written in the equation above. The likelihood for the ordinal logistic regression model is given by:
\(\begin{align*}
L(\beta;\textbf{y},\textbf{X})&=\prod_{i=1}^{n}\prod_{j=1}^{k}\pi_{i,j}^{y_{i,j}}(1\pi_{i,j})^{1y_{i,j}},
\end{align*}\)
where the subscript (i, j) means the \(i^{\textrm{th}}\) observation belongs to the \(j^{\textrm{th}}\) group. This yields the log likelihood:
\(\begin{equation*}
\ell(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{k}y_{i,j}\pi_{i,j}.
\end{equation*}\)
Notice that this is identical to the nominal logistic regression likelihood. Thus, maximization again has no closedform solution, so we defer to a procedure like iteratively reweighted least squares.
For ordinal logistic regression, a proportional odds model is used to determine the odds ratio. Again, an odds ratio (\(\theta\)) of 1 serves as the baseline for comparison between the two predictor levels, say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\). Only one parameter and one odds ratio is calculated for each predictor. Suppose we are interested in calculating the odds of \(\textbf{X}_{(1)}\) to \(\textbf{X}_{(2)}\). If \(\theta=1\), then there is no association between the response and these two predictors. If \(\theta>1\), then the odds of success are higher for the predictor \(\textbf{X}_{(1)}\). If \(\theta<1\), then the odds of success are less for the predictor \(\textbf{X}_{(1)}\). Values farther from 1 represent stronger degrees of association. For ordinal logistic regression, the odds ratio utilizes cumulative probabilities and their complements and is given by:
\(\begin{equation*}
\theta=\dfrac{\sum_{j=1}^{k^{*}}\pi_{j}_{\textbf{X}=\textbf{X}_{(1)}}/(1\sum_{j=1}^{k^{*}}\pi_{j})_{\textbf{X}=\textbf{X}_{(1)}}}{\sum_{j=1}^{k^{*}}\pi_{j}_{\textbf{X}=\textbf{X}_{(2)}}/(1\sum_{j=1}^{k^{*}}\pi_{j})_{\textbf{X}=\textbf{X}_{(2)}}}.
\end{equation*}\)
15.3  Further Logistic Regression Examples
15.3  Further Logistic Regression ExamplesExample 151: STAT 200 Dataset
Students in STAT 200 at Penn State were asked if they have ever driven after drinking (dataset unfortunately no longer available). They also were asked, “How many days per month do you drink at least two beers?” In the following discussion, \(\pi\) = the probability a student says “yes” they have driven after drinking. This is modeled using X = days per month of drinking two beers. Results from Minitab were as follows.
Variable  Value  Count 

DrivDrnk  Yes  122 (Event) 
No  127  
Total  249 
95% CI  

Coef  SE Coef  Z  P  Odds Ratio  Lower  Upper  
Constant  1.5514  0.2661  5.83  0.000  
DaysBeer  0.19031  0.02946  6.46  0.000  1.21  1.14  1.28 
Some things to note from the results are:
 We see that in the sample 122/249 students said they have driven after drinking. (Yikes!)
 Parameter estimates, given under Coef are \(\hat{\beta}_0\) = −1.5514, and \(\hat{\beta}_1\) = 0.19031.
 The model for estimating \(\pi\) = the probability of ever having driven after drinking is
\(\hat{\pi}=\dfrac{\exp(1.5514+0.19031X)}{1+\exp(1.5514+0.19031X)}\)
 The variable X = DaysBeer is a statistically significant predictor (Z = 6.46, P = 0.000).
We can also obtain a plot of the estimated probability of ever having driven under the influence (\(\pi\)) versus days per month of drinking at least two beers.
The vertical axis shows the probability of ever having driven after drinking. For example, if X = 4 days per month of drinking beer, then the estimated probability is calculated as:
\(\hat{\pi}=\dfrac{\exp(1.5514+0.19031(4))}{1+\exp(1.5514+0.19031(4))}=\frac{\exp(0.79016)}{1+\exp(0.79016)}=0.312\)
A few of these estimated probabilities are given in the following table:
DaysBeer  4  12  20  28 

\(\hat{\pi}\)  0.312  0.675  0.905  0.97 
In the results given above, we see that the estimate of the odds ratio is 1.21 for DaysBeer. This is given under Odds Ratio in the table of coefficients, standard errors and so on. The sample odds ratio was calculated as \(e^{0.19031}\). The interpretation of the odds ratio is that for each increase of one day of drinking beer per month, the predicted odds of having ever driven after drinking are multiplied by 1.21.
Above we found that at X = 4, the predicted probability of ever driving after drinking is \(\hat{\pi}\) = 0.312. Thus when X = 4, the predicted odds of ever driving after drinking is 0.312/(1 − 0.312) = 0.453. To find the odds when X = 5, one method would be to multiply the odds at X = 4 by the sample odds ratio. The calculation is 1.21 × 0.453 = 0.549. (Another method is to just do the calculation as we did for X = 4.)
Notice also, that the results give a 95% confidence interval estimate of the odd ratio (1.14 to 1.28).
We now include Gender (male or female) as an xvariable (along with DaysBeer). Some Minitab results are given below. Under Gender, the row for male is explaining that the program created an indicator variable with a value of 1 if the student is male and a value of 0 if the student is female.
95% CI  

Coef  SE Coef  Z  P  Odds Ratio  Lower  Upper  
Constant  1.7736  0.2945  6.02  0.000  
DaysBeer  0.18693  0.03004  6.22  0.000  1.21  1.14  1.28 
Gender male 
0.6172  0.2954  2.09  0.037  1.85  1.04  3.31 
Some things to note from the results are:
 The pvalues are less than 0.05 for both DaysBeer and Gender. This is evidence that both xvariables are useful for predicting the probability of ever having driven after drinking.
 For DaysBeer, the odds ratio is still estimated to equal 1.21 to two decimal places (calculated as \(e^{0.18693}\)).
 For Gender, the odds ratio is 1.85 (calculated as \(e^{0.6172}\)). For males, the odds of ever having driven after drinking is 1.85 times the odds for females, assuming DaysBeer is held constant.
Finally, the results for testing with respect to the multiple logistic regression model are as follows:
LogLikelihood = 139.981
Test that all slopes are zero: G = 65.125, DF = 2, PValue = 0.000
Notice that since we have a pvalue of 0.000 for this chisquare test, we therefore reject the null hypothesis that all of the slopes are equal to 0.
Example 152: Toxicity Dataset
An experiment is done to test the effect of a toxic substance on insects. The data originate from the textbook, Applied Linear Statistical Models by Kutner, Nachtsheim, Neter, & Li.
At each of six dose levels, 250 insects are exposed to the substance and the number of insects that die is counted (Toxicity data). We can use Minitab to calculate the observed probabilities as the number of observed deaths out of 250 for each dose level.
A binary logistic regression model is used to describe the connection between the observed probabilities of death as a function of dose level. Since the data is in event/trial format the procedure in Minitab is a little different to before:
 Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model
 Select "Response is in event/trial format"
 Select "Deaths" for Number of events, "SampSize" for Number of trials (and type "Death" for Event name if you like)
 Select Dose as a Continuous predictor
 Click Results and change "Display of results" to "Expanded tables"
 Click Storage and select "Fits (event probabilities)"
The Minitab output is as follows:
Coefficients
Term  Coef  SE Coef  95% CI  Z  P  VIF 

Constant  2.644  0.156  (2.950, 2.338)  16.94  0.000  
Dose  0.6740  0.0391  (0.5973, 0.7506)  17.23  0.000  1.00 
Odds Ratios for Continuous Predictors
Term  Odds Ratio  95% CI 

Dose  1.9621  (1.8173, 2.1184) 
Thus
\(\hat{\pi}=\dfrac{\exp(2.644+0.674X)}{1+\exp(2.644+0.674X)}\)
where X = Dose and \(\hat{\pi}\) is the estimated probability the insect dies (based on the model).
Predicted probabilities of death (based on the logistic model) for the six dose levels are given below (FITS). These probabilities closely agree with the observed values (Observed p) reported.
Data Display
Row  Dose  SampSize  Deaths  Observed P  FITS 

1  1  250  28  0.112  0.122423 
2  2  250  53  0.212  0.214891 
3  3  250  93  0.372  0.349396 
4  4  250  126  0.504  0.513071 
5  5  250  172  0.688  0.673990 
6  6  250  197  0.788  0.802229 
As an example of calculating the estimated probabilities, for Dose 1, we have
\(\hat{\pi}=\dfrac{\exp(2.644+0.674(1))}{1+\exp(2.644+0.674(1))}=0.1224\)
The odds ratio for Dose is 1.9621, the value under Odds Ratio in the output. It was calculated as \(e^{0.674}\). The interpretation of the odds ratio is that for every increase of 1 unit in dose level, the estimated odds of insect death are multiplied by 1.9621.
As an example of odds and odds ratio:
 At Dose = 1, the estimated odds of death is \(\hat{\pi}/(1− \hat{\pi})\) = 0.1224/(1−0.1224) = 0.1395.
 At Dose = 2, the estimated odds of death is \(\hat{\pi}/(1− \hat{\pi})\) = 0.2149/(1−0.2149) = 0.2737.
 The Odds Ratio = \(\dfrac{0.2737}{0.1395}\), which is the ratio of the odds of death when Dose = 2 compared to the odds when Dose = 1.
A property of the binary logistic regression model is that the odds ratio is the same for any increase of one unit in X, regardless of the specific values of X.
15.4  Poisson Regression
15.4  Poisson RegressionThe Poisson distribution for a random variable Y has the following probability mass function for a given value Y = y:
\(\begin{equation*}
\mbox{P}(Y=y\lambda)=\dfrac{e^{\lambda}\lambda^{y}}{y!},
\end{equation*}\)
for \(y=0,1,2,\ldots\). Notice that the Poisson distribution is characterized by the single parameter \(\lambda\), which is the mean rate of occurrence for the event being measured. For the Poisson distribution, it is assumed that large counts (with respect to the value of \(\lambda\)) are rare.
In Poisson regression the dependent variable (Y) is an observed count that follows the Poisson distribution. The rate \(\lambda\) is determined by a set of \(p1\) predictors \(\textbf{X}=(X_{1},\ldots,X_{p1})\). The expression relating these quantities is
\(\begin{equation*}
\lambda=\exp\{\textbf{X}\beta\}.
\end{equation*}\)
Thus, the fundamental Poisson regression model for observation i is given by
\(\begin{equation*}
\mbox{P}(Y_{i}=y_{i}\textbf{X}_{i},\beta)=\dfrac{e^{\exp\{\textbf{X}_{i}\beta\}}\exp\{\textbf{X}_{i}\beta\}^{y_{i}}}{y_{i}!}.
\end{equation*}\)
That is, for a given set of predictors, the categorical outcome follows a Poisson distribution with rate $\exp\{\textbf{X}\beta\}$. For a sample of size n, the likelihood for a Poisson regression is given by:
\(\begin{equation*}
L(\beta;\textbf{y},\textbf{X})=\prod_{i=1}^{n}\dfrac{e^{\exp\{\textbf{X}_{i}\beta\}}\exp\{\textbf{X}_{i}\beta\}^{y_{i}}}{y_{i}!}.
\end{equation*}\)
This yields the log likelihood:
\(\begin{equation*}
\ell(\beta)=\sum_{i=1}^{n}y_{i}\textbf{X}_{i}\beta\sum_{i=1}^{n}\exp\{\textbf{X}_{i}\beta\}\sum_{i=1}^{n}\log(y_{i}!).
\end{equation*}\)
Maximizing the likelihood (or log likelihood) has no closedform solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, \(\hat{\beta}\). Once this value of \(\hat{\beta}\) has been obtained, we may proceed to define various goodnessoffit measures and calculated residuals. For the residuals we present, they serve the same purpose as in linear regression. When plotted versus the response, they will help identify suspect data points.
To illustrate consider this example (Poisson Simulated data), which consists of a simulated data set of size n = 30 such that the response (Y) follows a Poisson distribution with rate $\lambda=\exp\{0.50+0.07X\}$. A plot of the response versus the predictor is given below.
The following gives the analysis of the Poisson regression data in Minitab:
 Select Stat > Regression > Poisson Regression > Fit Poisson Model.
 Select "y" for the Response.
 Select "x" as a Continuous predictor.
 Click Results and change "Display of results" to "Expanded tables."
This results in the following output:
Coefficients
Term  Coef  SE Coef  95% CI  ZValue  PValue  VIF 

Constant  0.308  0.289  (0.259, 0.875)  1.06  0.287  
x  0.0764  0.0173  (0.0424, 0.1103)  4.41  0.000  1.00 
Regression Equation
y = exp(Y') Y' = 0.308 + 0.0764 x 
As you can see, the Wald test pvalue for x of 0.000 indicates that the predictor is highly significant.
Deviance Test
Changes in the deviance can be used to test the null hypothesis that any subset of the \(\beta\)'s is equal to 0. The deviance, \(D(\hat{\beta})\), is \(2\) times the difference between the loglikelihood evaluated at the maximum likelihood estimate and the loglikelihood for a "saturated model" (a theoretical model with a separate parameter for each observation and thus a perfect fit). Suppose we test that r < p of the $\beta$'s are equal to 0. Then the deviance test statistic is given by:
\(\begin{equation*}
G^2=D(\hat{\beta}^{(0)})D(\hat{\beta}),
\end{equation*}\)
where \(D(\hat{\beta})\) is the deviance of the fitted (full) model and \(D(\hat{\beta}^{(0)})\) is the deviance of the model specified by the null hypothesis evaluated at the maximum likelihood estimate of that reduced model. This test statistic has a \(\chi^{2}\) distribution with \(pr\) degrees of freedom. This test procedure is analagous to the general linear F test procedure for multiple linear regression.
To illustrate, the relevant Minitab output from the simulated example is:
Deviance Table
Source  DF  Seq Dev  Contribution  Adj Dev  Adj Mean  ChiSquare  PValue 

Regression  1  20.47  42.37%  20.47  20.4677  20.47  0.000 
x  1  20.47  42.37%  20.47  20.4677  20.47  0.000 
Error  28  27.84  57.63%  27.84  0.9944  
Total  29  48.31  100.00% 
Since there is only a single predictor for this example, this table simply provides information on the deviance test for x (pvalue of 0.000), which matches the earlier Wald test result (pvalue of 0.000). (Note that the Wald test and deviance test will not in general give identical results.) The Deviance Table includes the following:
 The null model in this case has no predictors, so the fitted values are simply the sample mean, \(4.233\). The deviance for the null model is \(D(\hat{\beta}^{(0)})=48.31\), which is shown in the "Total" row in the Deviance Table.
 The deviance for the fitted model is \(D(\hat{\beta})=27.84\), which is shown in the "Error" row in the Deviance Table.
 The deviance test statistic is therefore \(G^2=48.3127.84=20.47\).
 The pvalue comes from a $\chi^{2}$ distribution with \(21=1\) degrees of freedom.
GoodnessofFit
Overall performance of the fitted model can be measured by two different chisquare tests. There is the Pearson statistic
\(\begin{equation*}
X^2=\sum_{i=1}^{n}\dfrac{(y_{i}\exp\{\textbf{X}_{i}\hat{\beta}\})^{2}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}
\end{equation*}\)
and the deviance statistic
\(\begin{equation*}
D=2\sum_{i=1}^{n}\biggl(y_{i}\log\biggl(\dfrac{y_{i}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}\biggr)(y_{i}\exp\{\textbf{X}_{i}\hat{\beta}\})\biggr).
\end{equation*}\)
Both of these statistics are approximately chisquare distributed with n  p degrees of freedom. When a test is rejected, there is a statistically significant lack of fit. Otherwise, there is no evidence of lackoffit.
To illustrate, the relevant Minitab output from the simulated example is:
GoodnessofFit
Test  DF  Estimate  Mean  ChiSquare  PValue 

Deviance  28  27.84209  0.099436  27.84  0.473 
Pearson  28  26.09324  0.93190  26.09  0.568 
The high pvalues indicate no evidence of lackoffit.
Overdispersion means that the actual covariance matrix for the observed data exceeds that for the specified model for \(Y\textbf{X}\). For a Poisson distribution, the mean and the variance are equal. In practice, the data almost never reflects this fact and we have overdispersion in the Poisson regression model if (as is often the case) the variance is greater than the mean. In addition to testing goodnessoffit, the Pearson statistic can also be used as a test of overdispersion. Note that overdispersion can also be measured in the logistic regression models that were discussed earlier.
Pseudo \(R^{2}\)
The value of \(R^{2}\) used in linear regression also does not extend to Poisson regression. One commonly used measure is the pseudo \(R^{2}\), defined as
\(\begin{equation*}
R^{2}=\dfrac{\ell(\hat{\beta_{0}})\ell(\hat{\beta})}{\ell(\hat{\beta_{0}})}=1\dfrac{D(\hat{\beta})}{D(\hat{\beta_{0}})},
\end{equation*}\)
where \(\ell(\hat{\beta_{0}})\) is the log likelihood of the model when only the intercept is included. The pseudo \(R^{2}\) goes from 0 to 1 with 1 being a perfect fit.
To illustrate, the relevant Minitab output from the simulated example is:
Model Summary
Deviance RSq 
Deviance RSq(adj) 
AIC 

42.37%  40.30%  124.50 
Recall from above that \(D(\hat{\beta})=27.84\) and \(D(\hat{\beta}^{(0)})=48.31\), so:
\(\begin{equation*}
R^{2}=1\dfrac{27.84}{48.31}=0.4237.
\end{equation*}\)
Raw Residual
The raw residual is the difference between the actual response and the estimated value from the model. Remember that the variance is equal to the mean for a Poisson random variable. Therefore, we expect that the variances of the residuals are unequal. This can lead to difficulties in the interpretation of the raw residuals, yet it is still used. The formula for the raw residual is
\(\begin{equation*}
r_{i}=y_{i}\exp\{\textbf{X}_{i}\beta\}.
\end{equation*}\)
Pearson Residual
The Pearson residual corrects for the unequal variance in the raw residuals by dividing by the standard deviation. The formula for the Pearson residuals is
\(\begin{equation*}
p_{i}=\dfrac{r_{i}}{\sqrt{\hat{\phi}\exp\{\textbf{X}_{i}\beta\}}},
\end{equation*}\)
where
\(\begin{equation*}
\hat{\phi}=\dfrac{1}{np}\sum_{i=1}^{n}\frac{(y_{i}\exp\{\textbf{X}_{i}\hat{\beta}\})^{2}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}.
\end{equation*}\)
\(\hat{\phi}\) is a dispersion parameter to help control overdispersion.
Deviance Residuals
Deviance residuals are also popular because the sum of squares of these residuals is the deviance statistic. The formula for the deviance residual is
\(\begin{equation*}
d_{i}=\texttt{sgn}(y_{i}\exp\{\textbf{X}_{i}\hat{\beta}\})\sqrt{2\biggl\{y_{i}\log\biggl(\dfrac{y_{i}}{\exp\{\textbf{X}_{i}\hat{\beta}\}}\biggr)(y_{i}\exp\{\textbf{X}_{i}\hat{\beta}\})\biggr\}}.
\end{equation*}\)
The plots below show the Pearson residuals and deviance residuals versus the fitted values for the simulated example.
These plots appear to be good for a Poisson fit. Further diagnostic plots can also be produced and model selection techniques can be employed when faced with multiple predictors.
Hat Values
The hat matrix serves the same purpose as in the case of linear regression  to measure the influence of each observation on the overall fit of the model. The hat values, \(h_{i,i}\), are the diagonal entries of the Hat matrix
\(\begin{equation*}
H=\textbf{W}^{1/2}\textbf{X}(\textbf{X}\textbf{W}\textbf{X})^{1}\textbf{X}\textbf{W}^{1/2},
\end{equation*}\)
where W is an \(n\times n\) diagonal matrix with the values of $\exp\{\textbf{X}_{i}\hat{\beta}\}$ on the diagonal. As before, a hat value (leverage) is large if \(h_{i,i}>2p/n\).
Studentized Residuals
Finally, we can also report Studentized versions of some of the earlier residuals. The Studentized Pearson residuals are given by
\(\begin{equation*}
sp_{i}=\dfrac{p_{i}}{\sqrt{1h_{i,i}}}
\end{equation*}\)
and the Studentized deviance residuals are given by
\(\begin{equation*}
sd_{i}=\dfrac{d_{i}}{\sqrt{1h_{i, i}}}.
\end{equation*}\)
To illustrate, the relevant Minitab output from the simulated example is:
Fits and Diagnostics for Unusual Observations
Obs  y  Fit  SE Fit  95% CI  Resid  Std Resid  Del Resid  HI  Cook's D 

8  10.000  4.983  0.452  (4.171, 5.952)  1.974  2.02  2.03  0.040969  0.11 
21  6.000  8.503  1.408  (6.147, 11.763)  0.907  1.04  1.02  0.233132  0.15 
Fits and Diagnostics for Unusual Observation
Obs  DFITS  

8  0.474408  R  
21  0.540485  X 
R large residual
X Unusual X
The default residuals in this output (set under Minitab's Regression Options) are deviance residuals, so observation 8 has deviance residual of 1.974 and studentized deviance residual of 2.02, while observation 21 has leverage (h) of 0.233132.
15.5  Generalized Linear Models
15.5  Generalized Linear ModelsAll of the regression models we have considered (including multiple linear, logistic, and Poisson) actually belong to a family of models called generalized linear models. (In fact, a more "generalized" framework for regression models is called general regression models, which includes any parametric regression model.) Generalized linear models provides a generalization of ordinary least squares regression that relates the random term (the response Y) to the systematic term (the linear predictor \(\textbf{X}\beta\)) via a link function (denoted by \(g(\cdot)\)). Specifically, we have the relation
\(\begin{equation*}
\mbox{E}(Y)=\mu=g^{1}(\textbf{X}\beta),
\end{equation*}\)
so \(g(\mu)=\textbf{X}\beta\). Some common link functions are:
 Identity Link

The identity link:
\(\begin{equation*}
g(\mu)=\mu=\textbf{X}\beta,
\end{equation*}\)which is used in traditional linear regression.
 Logit Link

The logit link:
\(\begin{align*}
&g(\mu)=\log\biggl(\frac{\mu}{1\mu}\biggr)=\textbf{X}\beta\\
&\Rightarrow\mu=\frac{e^{\textbf{X}\beta}}{1+e^{\textbf{X}\beta}},
\end{align*}\)which is used in logistic regression.
 Log Link

The log link:
\(\begin{align*}
&g(\mu)=\log(\mu)=\textbf{X}\beta\\
&\Rightarrow\mu=e^{\textbf{X}\beta},
\end{align*}\)which is used in Poisson regression.
 Probit Link

The probit link:
\(\begin{align*}
&g(\mu)=\Phi^{1}(\mu)=\textbf{X}\beta\\
&\Rightarrow\mu=\Phi(\textbf{X}\beta),
\end{align*}\)where \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution. This link function is also sometimes called the normit link. This also can be used in logistic regression.
 Complementary LogLog Link

The complementary loglog link:
\(\begin{align*}
&g(\mu)=\log(\log(1\mu))=\textbf{X}\beta\\
&\Rightarrow\mu=1\exp\{e^{\textbf{X}\beta}\},
\end{align*}\)which can also be used in logistic regression. This link function is also sometimes called the gompit link.
 Power Link

The power link:
\(\begin{align*}
&g(\mu)=\mu^{\lambda}=\textbf{X}\beta\\
&\Rightarrow\mu=(\textbf{X}\beta)^{1/\lambda},
\end{align*}\)where \(\lambda\neq 0\). This is used in other regressions which we do not explore (such as gamma regression and inverse Gaussian regression).
Also, the variance is typically a function of the mean and is often written as
\(\begin{equation*}
\mbox{Var}(Y)=V(\mu)=V(g^{1}(\textbf{X}\beta)).
\end{equation*}\)
The random variable Y is assumed to belong to an exponential family distribution where the density can be expressed in the form
\(\begin{equation*}
q(y;\theta,\phi)=\exp\biggl\{\dfrac{y\thetab(\theta)}{a(\phi)}+c(y,\phi)\biggr\},
\end{equation*}\)
where \(a(\cdot)\), \(b(\cdot)\), and \(c(\cdot)\) are specified functions, \(\theta\) is a parameter related to the mean of the distribution, and \(\phi\) is called the dispersion parameter. Many probability distributions belong to the exponential family. For example, the normal distribution is used for traditional linear regression, the binomial distribution is used for logistic regression, and the Poisson distribution is used for Poisson regression. Other exponential family distributions lead to gamma regression, inverse Gaussian (normal) regression, and negative binomial regression, just to name a few.
The unknown parameters, \(\beta\), are typically estimated with maximum likelihood techniques (in particular, using iteratively reweighted least squares), Bayesian methods, or quasilikelihood methods. The quasilikelihood is a function which possesses similar properties to the loglikelihood function and is most often used with count or binary data. Specifically, for a realization y of the random variable Y, it is defined as
\(\begin{equation*}
Q(\mu;y)=\int_{y}^{\mu}\dfrac{yt}{\sigma^{2}V(t)}dt,
\end{equation*}\)
where \(\sigma^{2}\) is a scale parameter. There are also tests using likelihood ratio statistics for model development to determine if any predictors may be dropped from the model.
15.6  Nonlinear Regression
15.6  Nonlinear RegressionAll of the models we have discussed thus far have been linear in the parameters (i.e., linear in the beta's). For example, polynomial regression was used to model curvature in our data by using higherordered values of the predictors. However, the final regression model was just a linear combination of higherordered predictors.
Now we are interested in studying the nonlinear regression model:
\(\begin{equation*}
Y=f(\textbf{X},\beta)+\epsilon,
\end{equation*}\)
where X is a vector of p predictors, \(\beta\) is a vector of k parameters, \(f(\cdot)\) is some known regression function, and \(\epsilon\) is an error term whose distribution may or may not be normal. Notice that we no longer necessarily have the dimension of the parameter vector simply one greater than the number of predictors. Some examples of nonlinear regression models are:
\(\begin{align*}
y_{i}&=\frac{e^{\beta_{0}+\beta_{1}x_{i}}}{1+e^{\beta_{0}+\beta_{1}x_{i}}}+\epsilon_{i} \\
y_{i}&=\frac{\beta_{0}+\beta_{1}x_{i}}{1+\beta_{2}e^{\beta_{3}x_{i}}}+\epsilon_{i} \\
y_{i}&=\beta_{0}+(0.4\beta_{0})e^{\beta_{1}(x_{i}5)}+\epsilon_{i}.
\end{align*}\)
However, there are some nonlinear models which are actually called intrinsically linear because they can be made linear in the parameters by a simple transformation. For example:
\(\begin{equation*}
Y=\frac{\beta_{0}X}{\beta_{1}+X}
\end{equation*}\)
can be rewritten as
\(\begin{align*}
\frac{1}{Y}&=\frac{1}{\beta_{0}}+\frac{\beta_{1}}{\beta_{0}}\frac{1}{X}\\
&=\theta_{0}+\theta_{1}\frac{1}{X},
\end{align*}\)
which is linear in the transformed parameters \(\theta_{0}\) and \(\theta_{1}\). In such cases, transforming a model to its linear form often provides better inference procedures and confidence intervals, but one must be cognizant of the effects that the transformation has on the distribution of the errors.
Nonlinear Least Squares
Returning to cases in which it is not possible to transform the model to a linear form, consider the setting
\(\begin{equation*}
Y_{i}=f(\textbf{X}_{i},\beta)+\epsilon_{i},
\end{equation*}\)
where the \(\epsilon_{i}\) are iid normal with mean 0 and constant variance \(\sigma^{2}\). For this setting, we can rely on some of the least squares theory we have developed over the course. For other nonnormal error terms, different techniques need to be employed.
First, let
\(\begin{equation*}
Q=\sum_{i=1}^{n}(y_{i}f(\textbf{X}_{i},\beta))^{2}.
\end{equation*}\)
In order to find
\(\begin{equation*}
\hat{\beta}=\arg\min_{\beta}Q,
\end{equation*}\)
we first find each of the partial derivatives of Q with respect to \(\beta_{j}\). Then, we set each of the partial derivatives equal to 0 and the parameters \(\beta_{k}\) are each replaced by \(\hat{\beta}_{k}\). The functions to be solved are nonlinear in the parameter estimates \(\hat{\beta}_{k}\) and are often difficult to solve, even in the simplest cases. Hence, iterative numerical methods are often employed. Even more difficulty arises in that multiple solutions may be possible!
Algorithms for nonlinear least squares estimation include:
 Newton's method, a classical method based on a gradient approach but which can be computationally challenging and heavily dependent on good starting values.
 The GaussNewton algorithm, a modification of Newton's method that gives a good approximation of the solution that Newton's method should have arrived at but which is not guaranteed to converge.
 The LevenbergMarquardt method, which can take care of computational difficulties arising with the other methods but can require a tedious search for the optimal value of a tuning parameter.
15.7  Exponential Regression Example
15.7  Exponential Regression ExampleOne simple nonlinear model is the exponential regression model
\(\begin{equation*}
y_{i}=\beta_{0}+\beta_{1}\exp(\beta_{2}x_{i,1}+\ldots+\beta_{p+1}x_{i,1})+\epsilon_{i},
\end{equation*}\)
where the \(\epsilon_{i}\) are iid normal with mean 0 and constant variance \(\sigma^{2}\). Notice that if \(\beta_{0}=0\), then the above is intrinsically linear by taking the natural logarithm of both sides.
Exponential regression is probably one of the simplest nonlinear regression models. An example where an exponential regression is often utilized is when relating the concentration of a substance (the response) to elapsed time (the predictor).
To illustrate, consider the example on longterm recovery after discharge from hospital from page 514 of Applied Linear Regression Models (4th ed) by Kutner, Nachtsheim, and Neter. The response variable, Y, is the prognostic index for longterm recovery and the predictor variable, X, is the number of days of hospitalization. The proposed model is the twoparameter exponential model:
\(\begin{equation*}
Y_{i}=\theta_{0}\exp(\theta_{1}X_i)+\epsilon_{i},
\end{equation*}\)
where the \(\epsilon_i\) are independent normal with constant variance.
We'll use Minitab's nonlinear regression routine to apply the GaussNewton algorithm to estimate \(\theta_0\) and \(\theta_1\). Before we do this, however, we have to find initial values for \(\theta_0\) and \(\theta_1\). One way to do this is to note that we can linearize the response function by taking the natural logarithm:
\(\begin{equation*}
\log(\theta_{0}\exp(\theta_{1}X_i)) = \log(\theta_{0}) + \theta_{1}X_i.
\end{equation*}\)
Thus we can fit a simple linear regression model with response, \(\log(Y)\), and predictor, \(X\), and the intercept (\(4.0372\)) gives us an estimate of \(\log(\theta_{0})\) while the slope (\(0.03797\)) gives us an estimate of \(\theta_{1}\). (We then calculate \(\exp(4.0372)=56.7\) to estimate \(\theta_0\).)
Minitab: Nonlinear Regression Model
Now we can fit the nonlinear regression model:
 Select Stat > Regression > Nonlinear Regression, select prog for the response, and click "Use Catalog" under "Expectation Function."
 Select the "Exponential" function with 1 predictor and 2 parameters in the Catalog dialog box and click OK to go to the "Choose Predictors" dialog.
 Select days to be the "Actual predictor" and click OK to go back to the Catalog dialog box, where you should see "Theta1 * exp( Theta2 * days )" in the "Expectation Function" box.
 Click "Parameters" and type "56.7" next to "Theta1" and "0.038" next to "Theta2" and click OK to go back to the Nonlinear Regression dialog box.
 Click "Options" to confirm that Mintab will use the GaussNewton algorithm (the other choice is LevenbergMarquardt) and click OK to go back to the Nonlinear Regression dialog box.
 Click "Graphs" to confirm that Mintab will produce a plot of the fitted curve with data and click OK to go back to the Nonlinear Regression dialog box.
 Click OK to obtain the following output:
Nonlinear Regression: Prog = Theta1 * exp(Theta2 * days)
Algorithm  GaussNewton 

Max iterations  200 
Tolerance  0.00001 
Parameter  Value 

Theta1  56.7 
Theta2  0.038 
Equation
prog = 58.6066 * exp(0.0395865 * days)
Parameter  Estimate  SE Estimate 

Theta1  58.6066  1.47216 
Theta2  0.0396  0.00171 
prog = Theta1 * exp(Theta2 * days)
Iterations  5 
Final SSE  49.4593 
DFE  13 
MSE  3.80456 
S  1.95053 
15.8  Population Growth Example
15.8  Population Growth ExampleCensus Data
A simple model for population growth towards an asymptote is the logistic model
\(\begin{equation*}
y_{i}=\dfrac{\beta_{1}}{1+\exp(\beta_{2}+\beta_{3}x_{i})}+\epsilon_{i},
\end{equation*}\)
where \(y_{i}\) is the population size at time \(x_{i}\), \(\beta_{1}\) is the asymptote towards which the population grows, \(\beta_{2}\) reflects the size of the population at time x = 0 (relative to its asymptotic size), and \(\beta_{3}\) controls the growth rate of the population.
We fit this model to Census population data (US Census data) for the United States (in millions) ranging from 1790 through 1990 (see below).
year  population 
1790  3.929 
1800  5.308 
1810  7.240 
1820  9.638 
1830  12.866 
1840  17.069 
1850  23.192 
1860  31.443 
1870  39.818 
1880  50.156 
1890  62.948 
1900  75.995 
1910  91.972 
1920  105.711 
1930  122.775 
1940  131.669 
1950  150.697 
1960  179.323 
1970  203.302 
1980  226.542 
1990  248.710 
The data are graphed (see below) and the line represents the fit of the logistic population growth model.
To fit the logistic model to the U. S. Census data, we need starting values for the parameters. It is often important in nonlinear least squares estimation to choose reasonable starting values, which generally requires some insight into the structure of the model. We know that \(\beta_{1}\) represents asymptotic population. The data in the plot above show that in 1990 the U. S. population stood at about 250 million and did not appear to be close to an asymptote; so as not to extrapolate too far beyond the data, let us set the starting value of \(\beta_{1}\) to 350. It is convenient to scale time so that \(x_{1}=0\) in 1790, and so that the unit of time is 10 years.
Then substituting \(\beta_{1}=350\) and \(x=0\) into the model, using the value \(y_{1}=3.929\) from the data, and assuming that the error is 0, we have
\(\begin{equation*}
3.929=\dfrac{350}{1+\exp(\beta_{2}+\beta_{3}(0))}.
\end{equation*}\)
Solving for \(\beta_{2}\) gives us a plausible start value for this parameter:
\(\begin{align*}
\exp(\beta_{2})&=\dfrac{350}{3.929}1\\
\beta_{2}&=\log\biggl(\frac{350}{3.929}1\biggr)\approx 4.5.
\end{align*}\)
Finally, returning to the data, at time x = 1 (i.e., at the second Census performed in 1800), the population was \(y_{2}=5.308\). Using this value, along with the previously determined start values for \(\beta_{1}\) and \(\beta_{2}\), and again setting the error to 0, we have
\(\begin{equation*}
5.308=\dfrac{350}{1+\exp(4.5+\beta_{3}(1))}.
\end{equation*}\)
Solving for \(\beta_{3}\) we get
\(\begin{align*}
\exp(4.5+\beta_{3})&=\frac{350}{5.308}1\\
\beta_{3}&=\log\biggl(\dfrac{350}{5.308}1\biggr)4.5\approx 0.3.
\end{align*}\)
So now we have starting values for the nonlinear least squares algorithm that we use. Below is the output from fitting the model in Minitab using GaussNetwon:
 Select Stat > Regression > Nonlinear Regression
 Select "population" for the Response
 Type the following into the "Edit directly" box under Expectation Function: beta1/(1+exp(beta2+beta3*(year1790)/10))
 Click Parameters and type in the values specified above (350, 4.5, and –0.3)
As you can see, the starting values resulted in convergence with values not too far from our guess.
Equation
population = 389.166/(1+exp(3.99035  0.22662 * (year  1790) / 10))
Parameter Estimates
Paraemeter  Estimate  SE Estimate 

beta1  389.166  30.8120 
beta2  3.990  0.0703 
beta3  0.227  0.0109 
population = beta1/(1 + exp(beta2 + beta3 * (year  1790) / 10))
Lack of Fit
There are no replicates.
Minitab cannot do the lack of fit test based on pure errror.
Summary
Iterations  8 
Final SSE 
356.400 

DFE  18 
MSE  19.8000 
S  4.44972 
Here is a plot of the residuals versus the year.
As you can see, the logistic functional form that we chose did catch the gross characteristics of this data, but some of the nuances appear to not be as well characterized. Since there are indications of some cyclical behavior, a model incorporating correlated errors or, perhaps, trigonometric functions could be investigated.
Software Help 15
Software Help 15
The next two pages cover the Minitab and R commands for the procedures in this lesson.
Below is a zip file that contains all the data sets used in this lesson:
 us_census.txt
 leukemia_remission.txt
 poisson_simulated.txt
 toxicity.txt
Minitab Help 15: Logistic, Poisson & Nonlinear Regression
Minitab Help 15: Logistic, Poisson & Nonlinear RegressionMinitab^{®}
Leukemia remission (logistic regression)
 Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put REMISS in the "Response" box, and put CELL, SMEAR, INFIL, LI, BLAST, and TEMP in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
 Repeat but with just LI as a single predictor.
 Fit a logistic regression model of REMISS vs LI.
 Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of REMISS vs LI with a fitted line based on the logistic regression model.
 Before clicking "OK" in the Regression Dialog, click "Options" and type "10" into the box labeled "Number of groups for HosmerLemeshow test." This will result in a new table in the output titled "GoodnessofFit Tests" with results for Deviance (not valid for this example), Pearson (also not valid for this example), and HosmerLemeshow goodnessoffit tests.
 Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Order" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."
Disease outbreak (logistic regression)
 Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put Disease in the "Response" box, and put Age, Middle, Lower, and Sector in the "Continuous predictors" box. Before clicking "OK," click "Model," shiftselect the four predictors in the topleft box, click "Add" next to "Interactions through order 2," but remove the "Middle*Lower" interaction from the "Terms in the model" box since it is meaningless.
 Repeat but with the five interactions removed.
 Repeat but with the five interactions included and before clicking "OK," click "Options" and select "Sequential (Type I)" for "Deviances for tests."
Toxicity and insects (logistic regression using event/trial data format)
 Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, select "Response in event/trial format," put Deaths in the "Number of events" box, put SampSize in the "Number of trials" box, and put "Dose" in the "Continuous predictors" box. Change "Event name" to "Death" if you like (optional). Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
 Select Calc > Calculator to calculate observed probabilities as Deaths/SampSize.
 Before clicking "OK" in the Regression Dialog, click "Storage" and select "Fits (event probabilities)."
 Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of observed probabilities vs Dose with a fitted line based on the logistic regression model.
Poisson example (Poisson regression)
 Select Graph > Scatterplot to create a scatterplot of the data.
 Select Stat > Regression > Poisson Regression > Fit Poisson Model, put y in the "Response" box and put x in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
 Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Fits" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."
Hospital recovery (exponential regression)
 Select Calc > Calculator to create a log(prog) variable.
 Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
 Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put prog in the "Response" box and type "Theta1 * exp(Theta2 * days)" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and 0.038 as the starting value for Theta2.
 A scatterplot of prog vs days with a fitted line based on the nonlinear regression model is produced by default.
U.S. census population (population growth nonlinear regression)
 Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
 Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put population in the "Response" box and type "beta1 / (1 + exp(beta2 + beta3 * (year  1790) / 10))" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and 0.038 as the starting value for Theta2.
 A scatterplot of population vs year with a fitted line based on the nonlinear regression model is produced by default.
 Before clicking "OK" in the Regression Dialog, click "Graphs" and put year in the "Residuals versus the variables" box.
R Help 15: Logistic, Poisson & Nonlinear Regression
R Help 15: Logistic, Poisson & Nonlinear RegressionR
Leukemia remission (logistic regression)
 Load the leukemia data.
 Fit a logistic regression model of REMISS vs CELL + SMEAR + INFIL + LI + BLAST + TEMP.
 Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the leastsquares estimation surface.
 Fit a logistic regression model of REMISS vs LI.
 Create a sctterplot of REMISS vs LI and add a fitted line based on the logistic regression model.
 Calculate the odds ratio for LI and a 95% confidence interval.
 Conduct a likelihood ratio (or deviance) test for LI.
 Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals.
 Use the
hoslem.test
function in theResourceSelection
package to conduct the HosmerLemeshow goodnessoffit test.  Calculate a version of \(R^2\) for logistic regression.
 Create residual plots using Pearson and deviance residuals.
 Calculate hat values (leverages), studentized residuals, and Cook's distances.
leukemia < read.table("~/pathtodata/leukemia_remission.txt", header=T)
attach(leukemia)
model.1 < glm(REMISS ~ CELL + SMEAR + INFIL + LI + BLAST + TEMP, family="binomial")
summary(model.1)
# Estimate Std. Error z value Pr(>z)
# (Intercept) 64.25808 74.96480 0.857 0.391
# CELL 30.83006 52.13520 0.591 0.554
# SMEAR 24.68632 61.52601 0.401 0.688
# INFIL 24.97447 65.28088 0.383 0.702
# LI 4.36045 2.65798 1.641 0.101
# BLAST 0.01153 2.26634 0.005 0.996
# TEMP 100.17340 77.75289 1.288 0.198
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372 on 26 degrees of freedom
# Residual deviance: 21.594 on 20 degrees of freedom
# AIC: 35.594
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the leastsquares estimation surface
model.2 < glm(REMISS ~ LI, family="binomial")
summary(model.2)
# Estimate Std. Error z value Pr(>z)
# (Intercept) 3.777 1.379 2.740 0.00615 **
# LI 2.897 1.187 2.441 0.01464 *
# 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372 on 26 degrees of freedom
# Residual deviance: 26.073 on 25 degrees of freedom
# AIC: 30.073
plot(x=LI, y=REMISS,
panel.last = lines(sort(LI), fitted(model.2)[order(LI)]))
exp(coef(model.2)[2]) # odds ratio = 18.12449
exp(confint.default(model.2)[2,]) # 95% CI = (1.770284, 185.561725)
anova(model.2, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 26 34.372
# LI 1 8.2988 25 26.073 0.003967 **
sum(residuals(model.2, type="deviance")^2) # 26.07296
model.2$deviance # 26.07296
sum(residuals(model.2, type="pearson")^2) # 23.93298
library(ResourceSelection)
hoslem.test(model.2$y, fitted(model.2), g=9)
# Hosmer and Lemeshow goodness of fit (GOF) test
# data: REMISS, fitted(model.2)
# Xsquared = 7.3293, df = 7, pvalue = 0.3954
1model.2$deviance/model.2$null.deviance # "Rsquared" = 0.2414424
plot(1:27, residuals(model.2, type="pearson"), type="b")
plot(1:27, residuals(model.2, type="deviance"), type="b")
summary(influence.measures(model.2))
# dfb.1_ dfb.LI dffit cov.r cook.d hat
# 8 0.63 0.83 0.93_* 0.88 0.58 0.15
hatvalues(model.2)[8] # 0.1498395
residuals(model.2)[8] # 1.944852
rstudent(model.2)[8] # 2.185013
cooks.distance(model.2)[8] # 0.5833219
detach(leukemia)
Disease outbreak (logistic regression)
 Load the disease outbreak data.
 Create interaction variables.
 Fit "full" logistic regression model of Disease vs four predictors and five interactions.
 Fit "reduced" logistic regression model of Disease vs four predictors.
 Conduct a likelihood ratio (or deviance) test for the five interactions.
 Display the analysis of deviance table with sequential deviances.
disease < read.table("~/pathtodata/DiseaseOutbreak.txt", header=T)
attach(disease)
Age.Middle < Age*Middle
Age.Lower < Age*Lower
Age.Sector < Age*Sector
Middle.Sector < Middle*Sector
Lower.Sector < Lower*Sector
model.1 < glm(Disease ~ Age + Middle + Lower + Sector + Age.Middle + Age.Lower +
Age.Sector + Middle.Sector + Lower.Sector, family="binomial")
model.2 < glm(Disease ~ Age + Middle + Lower + Sector, family="binomial")
anova(model.2, model.1, test="Chisq")
# Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1 93 101.054
# 2 88 93.996 5 7.0583 0.2163
anova(model.1, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 97 122.318
# Age 1 7.4050 96 114.913 0.006504 **
# Middle 1 1.8040 95 113.109 0.179230
# Lower 1 1.6064 94 111.502 0.205003
# Sector 1 10.4481 93 101.054 0.001228 **
# Age.Middle 1 4.5697 92 96.484 0.032542 *
# Age.Lower 1 1.0152 91 95.469 0.313666
# Age.Sector 1 1.1202 90 94.349 0.289878
# Middle.Sector 1 0.0001 89 94.349 0.993427
# Lower.Sector 1 0.3531 88 93.996 0.552339
detach(disease)
Toxicity and insects (logistic regression using event/trial data format)
 Load the toxicity data.
 Create a Survivals variable and a matrix with Deaths in one column and Survivals in the other column.
 Fit a logistic regression model of Deaths vs Dose.
 Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the leastsquares estimation surface.
 Calculate the odds ratio for Dose and a 95% confidence interval.
 Display the observed and fitted probabilities.
 Create a sctterplot of observed probabilties vs Dose and add a fitted line based on the logistic regression model.
toxicity < read.table("~/pathtodata/toxicity.txt", header=T)
attach(toxicity)
Survivals < SampSize  Deaths
y < cbind(Deaths, Survivals)
model.1 < glm(y ~ Dose, family="binomial")
summary(model.1)
# Estimate Std. Error z value Pr(>z)
# (Intercept) 2.64367 0.15610 16.93 <2e16 ***
# Dose 0.67399 0.03911 17.23 <2e16 ***
# 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 383.0695 on 5 degrees of freedom
# Residual deviance: 1.4491 on 4 degrees of freedom
# AIC: 39.358
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the leastsquares estimation surface
exp(coef(model.1)[2]) # odds ratio = 1.962056
exp(confint.default(model.1)[2,]) # 95% CI = (1.817279, 2.118366)
cbind(Dose, SampSize, Deaths, Deaths/SampSize, fitted(model.1))
# Dose SampSize Deaths
# 1 1 250 28 0.112 0.1224230
# 2 2 250 53 0.212 0.2148914
# 3 3 250 93 0.372 0.3493957
# 4 4 250 126 0.504 0.5130710
# 5 5 250 172 0.688 0.6739903
# 6 6 250 197 0.788 0.8022286
plot(x=Dose, y=Deaths/SampSize,
panel.last = lines(sort(Dose), fitted(model.1)[order(Dose)]))
detach(toxicity)
Poisson example (Poisson regression)
 Load the poisson data.
 Create a scatterplot of the data.
 Fit a Poisson regression model of y vs x.
 Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the leastsquares estimation surface.
 Create a sctterplot of y vs x and add a fitted line based on the Poisson regression model.
 Conduct a likelihood ratio (or deviance) test for x.
 Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals and calculate pvalues based on chisquared goodnessoffit tests.
 Calculate pseudo \(R^2\) for Poisson regression.
 Create residual plots using Pearson and deviance residuals.
 Calculate hat values (leverages) and studentized residuals.
poisson < read.table("~/pathtodata/poisson_simulated.txt", header=T)
attach(poisson)
plot(x=x, y=y)
model.1 < glm(y ~ x, family="poisson")
summary(model.1)
# Estimate Std. Error z value Pr(>z)
# (Intercept) 0.30787 0.28943 1.064 0.287
# x 0.07636 0.01730 4.413 1.02e05 ***
# 
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for poisson family taken to be 1)
#
# Null deviance: 48.310 on 29 degrees of freedom
# Residual deviance: 27.842 on 28 degrees of freedom
# AIC: 124.5
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the leastsquares estimation surface
plot(x=x, y=y,
panel.last = lines(sort(x), fitted(model.1)[order(x)]))
anova(model.1, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 29 48.310
# x 1 20.468 28 27.842 6.065e06 ***
sum(residuals(model.1, type="deviance")^2) # 27.84209
model.1$deviance # 27.84209
pchisq(model.1$deviance, 28, lower.tail=F) # pvalue = 0.4728389
sum(residuals(model.1, type="pearson")^2) # 26.09324
pchisq(sum(residuals(model.1, type="pearson")^2), 28, lower.tail=F) # pvalue = 0.5679192
1model.1$deviance/model.1$null.deviance # Pseudo Rsquared = 0.423676
plot(fitted(model.1), residuals(model.1, type="pearson"))
plot(fitted(model.1), residuals(model.1, type="deviance"))
summary(influence.measures(model.1))
# dfb.1_ dfb.x dffit cov.r cook.d hat
# 10 0.22 0.30 0.37 1.25_* 0.08 0.18
# 21 0.37 0.48 0.57 1.30_* 0.15 0.23_*
residuals(model.1)[8] # 1.974329
rstudent(model.1)[8] # 2.028255
detach(poisson)
Hospital recovery (exponential regression)
 Load the recovery data.
 Create log(prog) variable.
 Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
 Fit nonlinear regression model to data using these starting values.
 Create a scatterplot of prog vs days and add a fitted line based on the nonlinear regression model.
recovery < read.table("~/pathtodata/recovery.txt", header=T)
attach(recovery)
logprog < log(prog)
summary(lm(logprog ~ days))
# Estimate Std. Error t value Pr(>t)
# (Intercept) 4.037159 0.084103 48.00 5.08e16 ***
# days 0.037974 0.002284 16.62 3.86e10 ***
exp(4.037159) # 56.66513
model.1 < nls(prog ~ theta1 * exp(theta2 * days),
start=list(theta1=56.7, theta2=0.038))
summary(model.1)
# Estimate Std. Error t value Pr(>t)
# theta1 58.606532 1.472159 39.81 5.70e15 ***
# theta2 0.039586 0.001711 23.13 6.01e12 ***
# 
# Residual standard error: 1.951 on 13 degrees of freedom
plot(x=days, y=prog,
panel.last = lines(sort(days), fitted(model.1)[order(days)]))
detach(recovery)
U.S. census population (population growth nonlinear regression)
 Load the census data.
 Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
 Fit nonlinear regression model to data using these starting values.
 Create a scatterplot of population vs year and add a fitted line based on the nonlinear regression model.
 Create a residual plot.
census < read.table("~/pathtodata/us_census.txt", header=T)
attach(census)
plot(x=year, y=population)
log(350/3.9291) # 4.478259
log(350/5.3081)  log(350/3.9291) # 0.3048229
model.1 < nls(population ~ beta1 / (1 + exp(beta2 + beta3 * (year  1790) / 10)),
start=list(beta1=350, beta2=4.5, beta3=0.3))
summary(model.1)
# Estimate Std. Error t value Pr(>t)
# beta1 389.16551 30.81196 12.63 2.2e10 ***
# beta2 3.99035 0.07032 56.74 < 2e16 ***
# beta3 0.22662 0.01086 20.87 4.6e14 ***
# 
# Residual standard error: 4.45 on 18 degrees of freedom
plot(x=year, y=population,
panel.last = lines(sort(year), fitted(model.1)[order(year)]))
plot(x=year, y=residuals(model.1),
panel.last = abline(h=0, lty=2))
detach(census)