Software Help 15

Software Help 15

  

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson15.zip

  • us_census.txt
  • leukemia_remission.txt
  • poisson_simulated.txt
  • toxicity.txt

Minitab Help 15: Logistic, Poisson & Nonlinear Regression

Minitab Help 15: Logistic, Poisson & Nonlinear Regression

Minitab®

Leukemia remission (logistic regression)

  • Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put REMISS in the "Response" box, and put CELL, SMEAR, INFIL, LI, BLAST, and TEMP in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
  • Repeat but with just LI as a single predictor.
  • Fit a logistic regression model of REMISS vs LI.
  • Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of REMISS vs LI with a fitted line based on the logistic regression model.
  • Before clicking "OK" in the Regression Dialog, click "Options" and type "10" into the box labeled "Number of groups for Hosmer-Lemeshow test." This will result in a new table in the output titled "Goodness-of-Fit Tests" with results for Deviance (not valid for this example), Pearson (also not valid for this example), and Hosmer-Lemeshow goodness-of-fit tests.
  • Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Order" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

Disease outbreak (logistic regression)

  • Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, make sure "Response in binary response/frequency format" is selected, put Disease in the "Response" box, and put Age, Middle, Lower, and Sector in the "Continuous predictors" box. Before clicking "OK," click "Model," shift-select the four predictors in the top-left box, click "Add" next to "Interactions through order 2," but remove the "Middle*Lower" interaction from the "Terms in the model" box since it is meaningless.
  • Repeat but with the five interactions removed.
  • Repeat but with the five interactions included and before clicking "OK," click "Options" and select "Sequential (Type I)" for "Deviances for tests."

Toxicity and insects (logistic regression using event/trial data format)

  • Select Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model, select "Response in event/trial format," put Deaths in the "Number of events" box, put SampSize in the "Number of trials" box, and put "Dose" in the "Continuous predictors" box. Change "Event name" to "Death" if you like (optional). Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
  • Select Calc > Calculator to calculate observed probabilities as Deaths/SampSize.
  • Before clicking "OK" in the Regression Dialog, click "Storage" and select "Fits (event probabilities)."
  • Select Stat > Regression > Binary Fitted Line Plot to create a sctterplot of observed probabilities vs Dose with a fitted line based on the logistic regression model.

Poisson example (Poisson regression)

  • Select Graph > Scatterplot to create a scatterplot of the data.
  • Select Stat > Regression > Poisson Regression > Fit Poisson Model, put y in the "Response" box and put x in the "Continuous predictors" box. Before clicking "OK," click "Results" and select "Expanded tables" for "Display of results."
  • Before clicking "OK" in the Regression Dialog, click "Graphs" and select "Residuals versus Fits" to create residual plots using deviance residuals. To change to Pearson residuals, click "Options" in the Regression Dialog and select "Pearson" for "Residuals for diagnostics."

Hospital recovery (exponential regression)

  • Select Calc > Calculator to create a log(prog) variable.
  • Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
  • Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put prog in the "Response" box and type "Theta1 * exp(Theta2 * days)" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
  • A scatterplot of prog vs days with a fitted line based on the nonlinear regression model is produced by default.

U.S. census population (population growth nonlinear regression)

  • Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
  • Select Stat > Regression > Nonlinear Regression to fit a nonlinear regression model to data using these starting values. Put population in the "Response" box and type "beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10))" into the box labeled "Edit directly." Before clicking OK, click "Parameters" and enter 56.7 as the starting value for Theta1 and -0.038 as the starting value for Theta2.
  • A scatterplot of population vs year with a fitted line based on the nonlinear regression model is produced by default.
  • Before clicking "OK" in the Regression Dialog, click "Graphs" and put year in the "Residuals versus the variables" box.

R Help 15: Logistic, Poisson & Nonlinear Regression

R Help 15: Logistic, Poisson & Nonlinear Regression

R

Leukemia remission (logistic regression)

  • Load the leukemia data.
  • Fit a logistic regression model of REMISS vs CELL + SMEAR + INFIL + LI + BLAST + TEMP.
  • Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
  • Fit a logistic regression model of REMISS vs LI.
  • Create a sctterplot of REMISS vs LI and add a fitted line based on the logistic regression model.
  • Calculate the odds ratio for LI and a 95% confidence interval.
  • Conduct a likelihood ratio (or deviance) test for LI.
  • Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals.
  • Use the hoslem.test function in the ResourceSelection package to conduct the Hosmer-Lemeshow goodness-of-fit test.
  • Calculate a version of \(R^2\) for logistic regression.
  • Create residual plots using Pearson and deviance residuals.
  • Calculate hat values (leverages), studentized residuals, and Cook's distances.
          
leukemia <- read.table("~/path-to-data/leukemia_remission.txt", header=T)
attach(leukemia)

model.1 <- glm(REMISS ~ CELL + SMEAR + INFIL + LI + BLAST + TEMP, family="binomial")
summary(model.1)
#               Estimate Std. Error z value Pr(>|z|)
# (Intercept)   64.25808   74.96480   0.857    0.391
# CELL          30.83006   52.13520   0.591    0.554
# SMEAR         24.68632   61.52601   0.401    0.688
# INFIL        -24.97447   65.28088  -0.383    0.702
# LI             4.36045    2.65798   1.641    0.101
# BLAST         -0.01153    2.26634  -0.005    0.996
# TEMP        -100.17340   77.75289  -1.288    0.198
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 21.594  on 20  degrees of freedom
# AIC: 35.594

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

model.2 <- glm(REMISS ~ LI, family="binomial")
summary(model.2)
#             Estimate Std. Error z value Pr(>|z|)   
# (Intercept)   -3.777      1.379  -2.740  0.00615 **
# LI             2.897      1.187   2.441  0.01464 * 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 26.073  on 25  degrees of freedom
# AIC: 30.073

plot(x=LI, y=REMISS,
     panel.last = lines(sort(LI), fitted(model.2)[order(LI)]))

exp(coef(model.2)[2]) # odds ratio = 18.12449
exp(confint.default(model.2)[2,]) # 95% CI = (1.770284, 185.561725)

anova(model.2, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
# NULL                    26     34.372            
# LI    1   8.2988        25     26.073 0.003967 **

sum(residuals(model.2, type="deviance")^2) # 26.07296
model.2$deviance # 26.07296
sum(residuals(model.2, type="pearson")^2) # 23.93298

library(ResourceSelection)
hoslem.test(model.2$y, fitted(model.2), g=9)
# Hosmer and Lemeshow goodness of fit (GOF) test
# data:  REMISS, fitted(model.2)
# X-squared = 7.3293, df = 7, p-value = 0.3954

1-model.2$deviance/model.2$null.deviance # "R-squared" = 0.2414424

plot(1:27, residuals(model.2, type="pearson"), type="b")
plot(1:27, residuals(model.2, type="deviance"), type="b")

summary(influence.measures(model.2))
#  dfb.1_ dfb.LI dffit   cov.r cook.d hat  
# 8  0.63  -0.83  -0.93_*  0.88  0.58   0.15

hatvalues(model.2)[8] # 0.1498395
residuals(model.2)[8] # -1.944852
rstudent(model.2)[8] # -2.185013
cooks.distance(model.2)[8] # 0.5833219

detach(leukemia)

Disease outbreak (logistic regression)

  • Load the disease outbreak data.
  • Create interaction variables.
  • Fit "full" logistic regression model of Disease vs four predictors and five interactions.
  • Fit "reduced" logistic regression model of Disease vs four predictors.
  • Conduct a likelihood ratio (or deviance) test for the five interactions.
  • Display the analysis of deviance table with sequential deviances.

disease <- read.table("~/path-to-data/DiseaseOutbreak.txt", header=T)
attach(disease)

Age.Middle <- Age*Middle
Age.Lower <- Age*Lower
Age.Sector <- Age*Sector
Middle.Sector <- Middle*Sector
Lower.Sector <- Lower*Sector

model.1 <- glm(Disease ~ Age + Middle + Lower + Sector + Age.Middle + Age.Lower +
                 Age.Sector + Middle.Sector + Lower.Sector, family="binomial")
model.2 <- glm(Disease ~ Age + Middle + Lower + Sector, family="binomial")
anova(model.2, model.1, test="Chisq")
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1        93    101.054                     
# 2        88     93.996  5   7.0583   0.2163

anova(model.1, test="Chisq")
#               Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
# NULL                             97    122.318            
# Age            1   7.4050        96    114.913 0.006504 **
# Middle         1   1.8040        95    113.109 0.179230   
# Lower          1   1.6064        94    111.502 0.205003   
# Sector         1  10.4481        93    101.054 0.001228 **
# Age.Middle     1   4.5697        92     96.484 0.032542 * 
# Age.Lower      1   1.0152        91     95.469 0.313666   
# Age.Sector     1   1.1202        90     94.349 0.289878   
# Middle.Sector  1   0.0001        89     94.349 0.993427   
# Lower.Sector   1   0.3531        88     93.996 0.552339   

detach(disease)

Toxicity and insects (logistic regression using event/trial data format)

  • Load the toxicity data.
  • Create a Survivals variable and a matrix with Deaths in one column and Survivals in the other column.
  • Fit a logistic regression model of Deaths vs Dose.
  • Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
  • Calculate the odds ratio for Dose and a 95% confidence interval.
  • Display the observed and fitted probabilities.
  • Create a sctterplot of observed probabilties vs Dose and add a fitted line based on the logistic regression model.

toxicity <- read.table("~/path-to-data/toxicity.txt", header=T)
attach(toxicity)

Survivals <- SampSize - Deaths
y <- cbind(Deaths, Survivals)

model.1 <- glm(y ~ Dose, family="binomial")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -2.64367    0.15610  -16.93   <2e-16 ***
# Dose         0.67399    0.03911   17.23   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 383.0695  on 5  degrees of freedom
# Residual deviance:   1.4491  on 4  degrees of freedom
# AIC: 39.358

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

exp(coef(model.1)[2]) # odds ratio = 1.962056
exp(confint.default(model.1)[2,]) # 95% CI = (1.817279, 2.118366)

cbind(Dose, SampSize, Deaths, Deaths/SampSize, fitted(model.1))
#   Dose SampSize Deaths                
# 1    1      250     28 0.112 0.1224230
# 2    2      250     53 0.212 0.2148914
# 3    3      250     93 0.372 0.3493957
# 4    4      250    126 0.504 0.5130710
# 5    5      250    172 0.688 0.6739903
# 6    6      250    197 0.788 0.8022286

plot(x=Dose, y=Deaths/SampSize,
     panel.last = lines(sort(Dose), fitted(model.1)[order(Dose)]))

detach(toxicity)

Poisson example (Poisson regression)

  • Load the poisson data.
  • Create a scatterplot of the data.
  • Fit a Poisson regression model of y vs x.
  • Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
  • Create a sctterplot of y vs x and add a fitted line based on the Poisson regression model.
  • Conduct a likelihood ratio (or deviance) test for x.
  • Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals and calculate p-values based on chi-squared goodness-of-fit tests.
  • Calculate pseudo \(R^2\) for Poisson regression.
  • Create residual plots using Pearson and deviance residuals.
  • Calculate hat values (leverages) and studentized residuals.

poisson <- read.table("~/path-to-data/poisson_simulated.txt", header=T)
attach(poisson)

plot(x=x, y=y)

model.1 <- glm(y ~ x, family="poisson")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  0.30787    0.28943   1.064    0.287    
# x            0.07636    0.01730   4.413 1.02e-05 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for poisson family taken to be 1)
# 
# Null deviance: 48.310  on 29  degrees of freedom
# Residual deviance: 27.842  on 28  degrees of freedom
# AIC: 124.5

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

plot(x=x, y=y,
     panel.last = lines(sort(x), fitted(model.1)[order(x)]))

anova(model.1, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
# NULL                    29     48.310              
# x     1   20.468        28     27.842 6.065e-06 ***

sum(residuals(model.1, type="deviance")^2) # 27.84209
model.1$deviance # 27.84209
pchisq(model.1$deviance, 28, lower.tail=F) # p-value = 0.4728389

sum(residuals(model.1, type="pearson")^2) # 26.09324
pchisq(sum(residuals(model.1, type="pearson")^2), 28, lower.tail=F) # p-value = 0.5679192

1-model.1$deviance/model.1$null.deviance # Pseudo R-squared = 0.423676

plot(fitted(model.1), residuals(model.1, type="pearson"))
plot(fitted(model.1), residuals(model.1, type="deviance"))

summary(influence.measures(model.1))
#    dfb.1_ dfb.x dffit cov.r   cook.d hat    
# 10 -0.22   0.30  0.37  1.25_*  0.08   0.18  
# 21  0.37  -0.48 -0.57  1.30_*  0.15   0.23_*

residuals(model.1)[8] # 1.974329
rstudent(model.1)[8] # 2.028255

detach(poisson)

Hospital recovery (exponential regression)

  • Load the recovery data.
  • Create log(prog) variable.
  • Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
  • Fit nonlinear regression model to data using these starting values.
  • Create a scatterplot of prog vs days and add a fitted line based on the nonlinear regression model.

recovery <- read.table("~/path-to-data/recovery.txt", header=T)
attach(recovery)

logprog <- log(prog)
summary(lm(logprog ~ days))
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  4.037159   0.084103   48.00 5.08e-16 ***
# days        -0.037974   0.002284  -16.62 3.86e-10 ***

exp(4.037159) # 56.66513

model.1 <- nls(prog ~ theta1 * exp(theta2 * days),
               start=list(theta1=56.7, theta2=-0.038))
summary(model.1)
#         Estimate Std. Error t value Pr(>|t|)    
# theta1 58.606532   1.472159   39.81 5.70e-15 ***
# theta2 -0.039586   0.001711  -23.13 6.01e-12 ***
# ---
# Residual standard error: 1.951 on 13 degrees of freedom

plot(x=days, y=prog,
     panel.last = lines(sort(days), fitted(model.1)[order(days)]))

detach(recovery)

U.S. census population (population growth nonlinear regression)

  • Load the census data.
  • Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
  • Fit nonlinear regression model to data using these starting values.
  • Create a scatterplot of population vs year and add a fitted line based on the nonlinear regression model.
  • Create a residual plot.

census <- read.table("~/path-to-data/us_census.txt", header=T)
attach(census)

plot(x=year, y=population)

log(350/3.929-1) # 4.478259
log(350/5.308-1) - log(350/3.929-1) # -0.3048229

model.1 <- nls(population ~ beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10)),
               start=list(beta1=350, beta2=4.5, beta3=-0.3))
summary(model.1)
#        Estimate Std. Error t value Pr(>|t|)    
# beta1 389.16551   30.81196   12.63  2.2e-10 ***
# beta2   3.99035    0.07032   56.74  < 2e-16 ***
# beta3  -0.22662    0.01086  -20.87  4.6e-14 ***
# ---
# Residual standard error: 4.45 on 18 degrees of freedom

plot(x=year, y=population,
     panel.last = lines(sort(year), fitted(model.1)[order(year)]))

plot(x=year, y=residuals(model.1),
     panel.last = abline(h=0, lty=2))

detach(census)

Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility