# Software Help 15

Software Help 15

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a zip file that contains all the data sets used in this lesson:

- us_census.txt
- leukemia_remission.txt
- poisson_simulated.txt
- toxicity.txt

# Minitab Help 15: Logistic, Poisson & Nonlinear Regression

Minitab Help 15: Logistic, Poisson & Nonlinear Regression##
Minitab^{®}

### Leukemia remission (logistic regression)

- Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put REMISS in the "Response" box, and put CELL, SMEAR, INFIL, LI, BLAST, and TEMP in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
- Repeat but with just LI as a single predictor.
- Fit a logistic regression model of REMISS vs LI.
- Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of REMISS vs LI with a fitted line based on the logistic regression model.
- Before clicking "OK" in the Regression Dialog, click "Options" and type "10" into the box labeled "Number of groups for Hosmer-Lemeshow test." This will result in a new table in the output titled "Goodness-of-Fit Tests" with results for Deviance (not valid for this example), Pearson (also not valid for this example), and Hosmer-Lemeshow goodness-of-fit tests.
- Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Order" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

### Disease outbreak (logistic regression)

- Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put Disease in the "Response" box, and put Age, Middle, Lower, and Sector in the "Continuous predictors" box. Before clicking "OK," click "Model," shift-select the four predictors in the top-left box, click "Add" next to "Interactions through order 2," but remove the "Middle*Lower" interaction from the "Terms in the model" box since it is meaningless.
- Repeat but with the five interactions removed.
- Repeat but with the five interactions included and before clicking "OK," click "Options" and select "Sequential (Type I)" for "Deviances for tests."

### Toxicity and insects (logistic regression using event/trial data format)

- Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, select "Response in event/trial format," put Deaths in the "Number of events" box, put SampSize in the "Number of trials" box, and put "Dose" in the "Continuous predictors" box. Change "Event name" to "Death" if you like (optional). Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
- Select Calc > Calculator to calculate observed probabilities as Deaths/SampSize.
- Before clicking "OK" in the Regression Dialog, click "Storage" and select "Fits (event probabilities)."
- Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of observed probabilities vs Dose with a fitted line based on the logistic regression model.

### Poisson example (Poisson regression)

- Select Graph > Scatterplot to create a scatterplot of the data.
- Select Stat > Regression > Poisson Regression > Fit Poisson Model, put y in the "Response" box and put x in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
- Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Fits" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

### Hospital recovery (exponential regression)

- Select Calc > Calculator to create a log(prog) variable.
- Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
- Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put prog in the "Response" box and type "Theta1 * exp(Theta2 * days)" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
- A scatterplot of prog vs days with a fitted line based on the nonlinear regression model is produced by default.

### U.S. census population (population growth nonlinear regression)

- Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
- Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put population in the "Response" box and type "beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10))" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
- A scatterplot of population vs year with a fitted line based on the nonlinear regression model is produced by default.
- Before clicking "OK" in the Regression Dialog, click "Graphs" and put year in the "Residuals versus the variables" box.

# R Help 15: Logistic, Poisson & Nonlinear Regression

R Help 15: Logistic, Poisson & Nonlinear Regression## R

### Leukemia remission (logistic regression)

- Load the leukemia data.
- Fit a logistic regression model of REMISS vs CELL + SMEAR + INFIL + LI + BLAST + TEMP.
- Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
- Fit a logistic regression model of REMISS vs LI.
- Create a sctterplot of REMISS vs LI and add a fitted line based on the logistic regression model.
- Calculate the odds ratio for LI and a 95% confidence interval.
- Conduct a likelihood ratio (or deviance) test for LI.
- Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals.
- Use the
`hoslem.test`

function in the`ResourceSelection`

package to conduct the Hosmer-Lemeshow goodness-of-fit test. - Calculate a version of \(R^2\) for logistic regression.
- Create residual plots using Pearson and deviance residuals.
- Calculate hat values (leverages), studentized residuals, and Cook's distances.

` ````
leukemia <- read.table("~/path-to-data/leukemia_remission.txt", header=T)
attach(leukemia)
model.1 <- glm(REMISS ~ CELL + SMEAR + INFIL + LI + BLAST + TEMP, family="binomial")
summary(model.1)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 64.25808 74.96480 0.857 0.391
# CELL 30.83006 52.13520 0.591 0.554
# SMEAR 24.68632 61.52601 0.401 0.688
# INFIL -24.97447 65.28088 -0.383 0.702
# LI 4.36045 2.65798 1.641 0.101
# BLAST -0.01153 2.26634 -0.005 0.996
# TEMP -100.17340 77.75289 -1.288 0.198
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372 on 26 degrees of freedom
# Residual deviance: 21.594 on 20 degrees of freedom
# AIC: 35.594
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface
model.2 <- glm(REMISS ~ LI, family="binomial")
summary(model.2)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -3.777 1.379 -2.740 0.00615 **
# LI 2.897 1.187 2.441 0.01464 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 34.372 on 26 degrees of freedom
# Residual deviance: 26.073 on 25 degrees of freedom
# AIC: 30.073
plot(x=LI, y=REMISS,
panel.last = lines(sort(LI), fitted(model.2)[order(LI)]))
exp(coef(model.2)[2]) # odds ratio = 18.12449
exp(confint.default(model.2)[2,]) # 95% CI = (1.770284, 185.561725)
anova(model.2, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 26 34.372
# LI 1 8.2988 25 26.073 0.003967 **
sum(residuals(model.2, type="deviance")^2) # 26.07296
model.2$deviance # 26.07296
sum(residuals(model.2, type="pearson")^2) # 23.93298
library(ResourceSelection)
hoslem.test(model.2$y, fitted(model.2), g=9)
# Hosmer and Lemeshow goodness of fit (GOF) test
# data: REMISS, fitted(model.2)
# X-squared = 7.3293, df = 7, p-value = 0.3954
1-model.2$deviance/model.2$null.deviance # "R-squared" = 0.2414424
plot(1:27, residuals(model.2, type="pearson"), type="b")
plot(1:27, residuals(model.2, type="deviance"), type="b")
summary(influence.measures(model.2))
# dfb.1_ dfb.LI dffit cov.r cook.d hat
# 8 0.63 -0.83 -0.93_* 0.88 0.58 0.15
hatvalues(model.2)[8] # 0.1498395
residuals(model.2)[8] # -1.944852
rstudent(model.2)[8] # -2.185013
cooks.distance(model.2)[8] # 0.5833219
detach(leukemia)
```

### Disease outbreak (logistic regression)

- Load the disease outbreak data.
- Create interaction variables.
- Fit "full" logistic regression model of Disease vs four predictors and five interactions.
- Fit "reduced" logistic regression model of Disease vs four predictors.
- Conduct a likelihood ratio (or deviance) test for the five interactions.
- Display the analysis of deviance table with sequential deviances.

```
disease <- read.table("~/path-to-data/DiseaseOutbreak.txt", header=T)
attach(disease)
Age.Middle <- Age*Middle
Age.Lower <- Age*Lower
Age.Sector <- Age*Sector
Middle.Sector <- Middle*Sector
Lower.Sector <- Lower*Sector
model.1 <- glm(Disease ~ Age + Middle + Lower + Sector + Age.Middle + Age.Lower +
Age.Sector + Middle.Sector + Lower.Sector, family="binomial")
model.2 <- glm(Disease ~ Age + Middle + Lower + Sector, family="binomial")
anova(model.2, model.1, test="Chisq")
# Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1 93 101.054
# 2 88 93.996 5 7.0583 0.2163
anova(model.1, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 97 122.318
# Age 1 7.4050 96 114.913 0.006504 **
# Middle 1 1.8040 95 113.109 0.179230
# Lower 1 1.6064 94 111.502 0.205003
# Sector 1 10.4481 93 101.054 0.001228 **
# Age.Middle 1 4.5697 92 96.484 0.032542 *
# Age.Lower 1 1.0152 91 95.469 0.313666
# Age.Sector 1 1.1202 90 94.349 0.289878
# Middle.Sector 1 0.0001 89 94.349 0.993427
# Lower.Sector 1 0.3531 88 93.996 0.552339
detach(disease)
```

### Toxicity and insects (logistic regression using event/trial data format)

- Load the toxicity data.
- Create a Survivals variable and a matrix with Deaths in one column and Survivals in the other column.
- Fit a logistic regression model of Deaths vs Dose.
- Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
- Calculate the odds ratio for Dose and a 95% confidence interval.
- Display the observed and fitted probabilities.
- Create a sctterplot of observed probabilties vs Dose and add a fitted line based on the logistic regression model.

```
toxicity <- read.table("~/path-to-data/toxicity.txt", header=T)
attach(toxicity)
Survivals <- SampSize - Deaths
y <- cbind(Deaths, Survivals)
model.1 <- glm(y ~ Dose, family="binomial")
summary(model.1)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -2.64367 0.15610 -16.93 <2e-16 ***
# Dose 0.67399 0.03911 17.23 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 383.0695 on 5 degrees of freedom
# Residual deviance: 1.4491 on 4 degrees of freedom
# AIC: 39.358
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface
exp(coef(model.1)[2]) # odds ratio = 1.962056
exp(confint.default(model.1)[2,]) # 95% CI = (1.817279, 2.118366)
cbind(Dose, SampSize, Deaths, Deaths/SampSize, fitted(model.1))
# Dose SampSize Deaths
# 1 1 250 28 0.112 0.1224230
# 2 2 250 53 0.212 0.2148914
# 3 3 250 93 0.372 0.3493957
# 4 4 250 126 0.504 0.5130710
# 5 5 250 172 0.688 0.6739903
# 6 6 250 197 0.788 0.8022286
plot(x=Dose, y=Deaths/SampSize,
panel.last = lines(sort(Dose), fitted(model.1)[order(Dose)]))
detach(toxicity)
```

### Poisson example (Poisson regression)

- Load the poisson data.
- Create a scatterplot of the data.
- Fit a Poisson regression model of y vs x.
- Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
- Create a sctterplot of y vs x and add a fitted line based on the Poisson regression model.
- Conduct a likelihood ratio (or deviance) test for x.
- Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals and calculate p-values based on chi-squared goodness-of-fit tests.
- Calculate pseudo \(R^2\) for Poisson regression.
- Create residual plots using Pearson and deviance residuals.
- Calculate hat values (leverages) and studentized residuals.

```
poisson <- read.table("~/path-to-data/poisson_simulated.txt", header=T)
attach(poisson)
plot(x=x, y=y)
model.1 <- glm(y ~ x, family="poisson")
summary(model.1)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.30787 0.28943 1.064 0.287
# x 0.07636 0.01730 4.413 1.02e-05 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for poisson family taken to be 1)
#
# Null deviance: 48.310 on 29 degrees of freedom
# Residual deviance: 27.842 on 28 degrees of freedom
# AIC: 124.5
confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface
plot(x=x, y=y,
panel.last = lines(sort(x), fitted(model.1)[order(x)]))
anova(model.1, test="Chisq")
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 29 48.310
# x 1 20.468 28 27.842 6.065e-06 ***
sum(residuals(model.1, type="deviance")^2) # 27.84209
model.1$deviance # 27.84209
pchisq(model.1$deviance, 28, lower.tail=F) # p-value = 0.4728389
sum(residuals(model.1, type="pearson")^2) # 26.09324
pchisq(sum(residuals(model.1, type="pearson")^2), 28, lower.tail=F) # p-value = 0.5679192
1-model.1$deviance/model.1$null.deviance # Pseudo R-squared = 0.423676
plot(fitted(model.1), residuals(model.1, type="pearson"))
plot(fitted(model.1), residuals(model.1, type="deviance"))
summary(influence.measures(model.1))
# dfb.1_ dfb.x dffit cov.r cook.d hat
# 10 -0.22 0.30 0.37 1.25_* 0.08 0.18
# 21 0.37 -0.48 -0.57 1.30_* 0.15 0.23_*
residuals(model.1)[8] # 1.974329
rstudent(model.1)[8] # 2.028255
detach(poisson)
```

### Hospital recovery (exponential regression)

- Load the recovery data.
- Create log(prog) variable.
- Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
- Fit nonlinear regression model to data using these starting values.
- Create a scatterplot of prog vs days and add a fitted line based on the nonlinear regression model.

```
recovery <- read.table("~/path-to-data/recovery.txt", header=T)
attach(recovery)
logprog <- log(prog)
summary(lm(logprog ~ days))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.037159 0.084103 48.00 5.08e-16 ***
# days -0.037974 0.002284 -16.62 3.86e-10 ***
exp(4.037159) # 56.66513
model.1 <- nls(prog ~ theta1 * exp(theta2 * days),
start=list(theta1=56.7, theta2=-0.038))
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# theta1 58.606532 1.472159 39.81 5.70e-15 ***
# theta2 -0.039586 0.001711 -23.13 6.01e-12 ***
# ---
# Residual standard error: 1.951 on 13 degrees of freedom
plot(x=days, y=prog,
panel.last = lines(sort(days), fitted(model.1)[order(days)]))
detach(recovery)
```

### U.S. census population (population growth nonlinear regression)

- Load the census data.
- Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
- Fit nonlinear regression model to data using these starting values.
- Create a scatterplot of population vs year and add a fitted line based on the nonlinear regression model.
- Create a residual plot.

```
census <- read.table("~/path-to-data/us_census.txt", header=T)
attach(census)
plot(x=year, y=population)
log(350/3.929-1) # 4.478259
log(350/5.308-1) - log(350/3.929-1) # -0.3048229
model.1 <- nls(population ~ beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10)),
start=list(beta1=350, beta2=4.5, beta3=-0.3))
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# beta1 389.16551 30.81196 12.63 2.2e-10 ***
# beta2 3.99035 0.07032 56.74 < 2e-16 ***
# beta3 -0.22662 0.01086 -20.87 4.6e-14 ***
# ---
# Residual standard error: 4.45 on 18 degrees of freedom
plot(x=year, y=population,
panel.last = lines(sort(year), fitted(model.1)[order(year)]))
plot(x=year, y=residuals(model.1),
panel.last = abline(h=0, lty=2))
detach(census)
```