R Help 12: Logistic, Poisson & Nonlinear Regression

Leukemia remission (logistic regression)

Load the leukemia data.
Fit a logistic regression model of REMISS vs CELL + SMEAR + INFIL + LI + BLAST + TEMP.
Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
Fit a logistic regression model of REMISS vs LI.
Create a sctterplot of REMISS vs LI and add a fitted line based on the logistic regression model.
Calculate the odds ratio for LI and a 95% confidence interval.
Conduct a likelihood ratio (or deviance) test for LI.
Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals.
Use the hoslem.test function in the ResourceSelection package to conduct the Hosmer-Lemeshow goodness-of-fit test.
Calculate a version of R² for logistic regression.
Create residual plots using Pearson and deviance residuals.
Calculate hat values (leverages), studentized residuals, and Cook's distances.

leukemia <- read.table("~/path-to-data/leukemia_remission.txt", header=T)
attach(leukemia)

model.1 <- glm(REMISS ~ CELL + SMEAR + INFIL + LI + BLAST + TEMP, family="binomial")
summary(model.1)
#               Estimate Std. Error z value Pr(>|z|)
# (Intercept)   64.25808   74.96480   0.857    0.391
# CELL          30.83006   52.13520   0.591    0.554
# SMEAR         24.68632   61.52601   0.401    0.688
# INFIL        -24.97447   65.28088  -0.383    0.702
# LI             4.36045    2.65798   1.641    0.101
# BLAST         -0.01153    2.26634  -0.005    0.996
# TEMP        -100.17340   77.75289  -1.288    0.198
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 21.594  on 20  degrees of freedom
# AIC: 35.594

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

model.2 <- glm(REMISS ~ LI, family="binomial")
summary(model.2)
#             Estimate Std. Error z value Pr(>|z|)   
# (Intercept)   -3.777      1.379  -2.740  0.00615 **
# LI             2.897      1.187   2.441  0.01464 * 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 34.372  on 26  degrees of freedom
# Residual deviance: 26.073  on 25  degrees of freedom
# AIC: 30.073

plot(x=LI, y=REMISS,
     panel.last = lines(sort(LI), fitted(model.2)[order(LI)]))

exp(coef(model.2)[2]) # odds ratio = 18.12449
exp(confint.default(model.2)[2,]) # 95% CI = (1.770284, 185.561725)

anova(model.2, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
# NULL                    26     34.372            
# LI    1   8.2988        25     26.073 0.003967 **

sum(residuals(model.2, type="deviance")^2) # 26.07296
model.2$deviance # 26.07296
sum(residuals(model.2, type="pearson")^2) # 23.93298

library(ResourceSelection)
hoslem.test(model.2$y, fitted(model.2), g=9)
# Hosmer and Lemeshow goodness of fit (GOF) test
# data:  REMISS, fitted(model.2)
# X-squared = 7.3293, df = 7, p-value = 0.3954

1-model.2$deviance/model.2$null.deviance # "R-squared" = 0.2414424

plot(1:27, residuals(model.2, type="pearson"), type="b")
plot(1:27, residuals(model.2, type="deviance"), type="b")

summary(influence.measures(model.2))
#  dfb.1_ dfb.LI dffit   cov.r cook.d hat  
# 8  0.63  -0.83  -0.93_*  0.88  0.58   0.15

hatvalues(model.2)[8] # 0.1498395
residuals(model.2)[8] # -1.944852
rstudent(model.2)[8] # -2.185013
cooks.distance(model.2)[8] # 0.5833219

detach(leukemia)

Disease outbreak (logistic regression)

Load the disease outbreak data.
Create interaction variables.
Fit "full" logistic regression model of Disease vs four predictors and five interactions.
Fit "reduced" logistic regression model of Disease vs four predictors.
Conduct a likelihood ratio (or deviance) test for the five interactions.
Display the analysis of deviance table with sequential deviances.

disease <- read.table("~/path-to-data/DiseaseOutbreak.txt", header=T)
attach(disease)

Age.Middle <- Age*Middle
Age.Lower <- Age*Lower
Age.Sector <- Age*Sector
Middle.Sector <- Middle*Sector
Lower.Sector <- Lower*Sector

model.1 <- glm(Disease ~ Age + Middle + Lower + Sector + Age.Middle + Age.Lower +
                 Age.Sector + Middle.Sector + Lower.Sector, family="binomial")
model.2 <- glm(Disease ~ Age + Middle + Lower + Sector, family="binomial")
anova(model.2, model.1, test="Chisq")
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1        93    101.054                     
# 2        88     93.996  5   7.0583   0.2163

anova(model.1, test="Chisq")
#               Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
# NULL                             97    122.318            
# Age            1   7.4050        96    114.913 0.006504 **
# Middle         1   1.8040        95    113.109 0.179230   
# Lower          1   1.6064        94    111.502 0.205003   
# Sector         1  10.4481        93    101.054 0.001228 **
# Age.Middle     1   4.5697        92     96.484 0.032542 * 
# Age.Lower      1   1.0152        91     95.469 0.313666   
# Age.Sector     1   1.1202        90     94.349 0.289878   
# Middle.Sector  1   0.0001        89     94.349 0.993427   
# Lower.Sector   1   0.3531        88     93.996 0.552339   

detach(disease)

Toxicity and insects (logistic regression using event/trial data format)

Load the toxicity data.
Create a Survivals variable and a matrix with Deaths in one column and Survivals in the other column.
Fit a logistic regression model of Deaths vs Dose.
Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
Calculate the odds ratio for Dose and a 95% confidence interval.
Display the observed and fitted probabilities.
Create a sctterplot of observed probabilties vs Dose and add a fitted line based on the logistic regression model.

toxicity <- read.table("~/path-to-data/toxicity.txt", header=T)
attach(toxicity)

Survivals <- SampSize - Deaths
y <- cbind(Deaths, Survivals)

model.1 <- glm(y ~ Dose, family="binomial")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -2.64367    0.15610  -16.93   <2e-16 ***
# Dose         0.67399    0.03911   17.23   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
# Null deviance: 383.0695  on 5  degrees of freedom
# Residual deviance:   1.4491  on 4  degrees of freedom
# AIC: 39.358

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

exp(coef(model.1)[2]) # odds ratio = 1.962056
exp(confint.default(model.1)[2,]) # 95% CI = (1.817279, 2.118366)

cbind(Dose, SampSize, Deaths, Deaths/SampSize, fitted(model.1))
#   Dose SampSize Deaths                
# 1    1      250     28 0.112 0.1224230
# 2    2      250     53 0.212 0.2148914
# 3    3      250     93 0.372 0.3493957
# 4    4      250    126 0.504 0.5130710
# 5    5      250    172 0.688 0.6739903
# 6    6      250    197 0.788 0.8022286

plot(x=Dose, y=Deaths/SampSize,
     panel.last = lines(sort(Dose), fitted(model.1)[order(Dose)]))

detach(toxicity)

Poisson example (Poisson regression)

Load the poisson data.
Create a scatterplot of the data.
Fit a Poisson regression model of y vs x.
Calculate 95% confidence intervals for the regression parameters based on asymptotic normality and based on profiling the least-squares estimation surface.
Create a sctterplot of y vs x and add a fitted line based on the Poisson regression model.
Conduct a likelihood ratio (or deviance) test for x.
Calculate the sum of squared deviance residuals and the sum of squared Pearson residuals and calculate p-values based on chi-squared goodness-of-fit tests.
Calculate pseudo R² for Poisson regression.
Create residual plots using Pearson and deviance residuals.
Calculate hat values (leverages) and studentized residuals.

poisson <- read.table("~/path-to-data/poisson_simulated.txt", header=T)
attach(poisson)

plot(x=x, y=y)

model.1 <- glm(y ~ x, family="poisson")
summary(model.1)
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  0.30787    0.28943   1.064    0.287    
# x            0.07636    0.01730   4.413 1.02e-05 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for poisson family taken to be 1)
# 
# Null deviance: 48.310  on 29  degrees of freedom
# Residual deviance: 27.842  on 28  degrees of freedom
# AIC: 124.5

confint.default(model.1) # based on asymptotic normality
confint(model.1) # based on profiling the least-squares estimation surface

plot(x=x, y=y,
     panel.last = lines(sort(x), fitted(model.1)[order(x)]))

anova(model.1, test="Chisq")
#      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
# NULL                    29     48.310              
# x     1   20.468        28     27.842 6.065e-06 ***

sum(residuals(model.1, type="deviance")^2) # 27.84209
model.1$deviance # 27.84209
pchisq(model.1$deviance, 28, lower.tail=F) # p-value = 0.4728389

sum(residuals(model.1, type="pearson")^2) # 26.09324
pchisq(sum(residuals(model.1, type="pearson")^2), 28, lower.tail=F) # p-value = 0.5679192

1-model.1$deviance/model.1$null.deviance # Pseudo R-squared = 0.423676

plot(fitted(model.1), residuals(model.1, type="pearson"))
plot(fitted(model.1), residuals(model.1, type="deviance"))

summary(influence.measures(model.1))
#    dfb.1_ dfb.x dffit cov.r   cook.d hat    
# 10 -0.22   0.30  0.37  1.25_*  0.08   0.18  
# 21  0.37  -0.48 -0.57  1.30_*  0.15   0.23_*

residuals(model.1)[8] # 1.974329
rstudent(model.1)[8] # 2.028255

detach(poisson)

Hospital recovery (exponential regression)

Load the recovery data.
Create log(prog) variable.
Obtain starting values for nonlinear model parameters from fitting a simple linear regression model of log(prog) vs days.
Fit nonlinear regression model to data using these starting values.
Create a scatterplot of prog vs days and add a fitted line based on the nonlinear regression model.

recovery <- read.table("~/path-to-data/recovery.txt", header=T)
attach(recovery)

logprog <- log(prog)
summary(lm(logprog ~ days))
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  4.037159   0.084103   48.00 5.08e-16 ***
# days        -0.037974   0.002284  -16.62 3.86e-10 ***

exp(4.037159) # 56.66513

model.1 <- nls(prog ~ theta1 * exp(theta2 * days),
               start=list(theta1=56.7, theta2=-0.038))
summary(model.1)
#         Estimate Std. Error t value Pr(>|t|)    
# theta1 58.606532   1.472159   39.81 5.70e-15 ***
# theta2 -0.039586   0.001711  -23.13 6.01e-12 ***
# ---
# Residual standard error: 1.951 on 13 degrees of freedom

plot(x=days, y=prog,
     panel.last = lines(sort(days), fitted(model.1)[order(days)]))

detach(recovery)

U.S. census population (population growth nonlinear regression)

Load the census data.
Obtain starting values for nonlinear model parameters from observing features of a scatterplot of population vs year.
Fit nonlinear regression model to data using these starting values.
Create a scatterplot of population vs year and add a fitted line based on the nonlinear regression model.
Create a residual plot.

census <- read.table("~/path-to-data/us_census.txt", header=T)
attach(census)

plot(x=year, y=population)

log(350/3.929-1) # 4.478259
log(350/5.308-1) - log(350/3.929-1) # -0.3048229

model.1 <- nls(population ~ beta1 / (1 + exp(beta2 + beta3 * (year - 1790) / 10)),
               start=list(beta1=350, beta2=4.5, beta3=-0.3))
summary(model.1)
#        Estimate Std. Error t value Pr(>|t|)    
# beta1 389.16551   30.81196   12.63  2.2e-10 ***
# beta2   3.99035    0.07032   56.74  < 2e-16 ***
# beta3  -0.22662    0.01086  -20.87  4.6e-14 ***
# ---
# Residual standard error: 4.45 on 18 degrees of freedom

plot(x=year, y=population,
     panel.last = lines(sort(year), fitted(model.1)[order(year)]))

plot(x=year, y=residuals(model.1),
     panel.last = abline(h=0, lty=2))

detach(census)

R Help 12: Logistic, Poisson & Nonlinear Regression

Leukemia remission (logistic regression)

Disease outbreak (logistic regression)

Toxicity and insects (logistic regression using event/trial data format)

Poisson example (Poisson regression)

Hospital recovery (exponential regression)

U.S. census population (population growth nonlinear regression)

Navigation

Start Here!

Lessons

Resources