Word recall (log-transforming a predictor)
- Load the wordrecall data.
- Fit a simple linear regression model of prop on time.
- Display a scatterplot of the data and add the regression line.
- Display a residual plot with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot. The argument "datax" determines which way round to plot the axes (false by default, which plots the data on the vertical axis, or true, which plots the data on the horizontal axis).
- Load the
nortest
package to access the Anderson-Darling normality test.
- Create a log(time) variable and fit a simple linear regression model of prop on log(time).
- Repeat diagnostic plots and normality tests.
- Create a prop^-1.25 variable and fit a simple linear regression model of prop^-1.25 on time.
- Repeat diagnostic plots and normality tests.
- Use prop on log(time) model to find:
- 95% prediction interval for a prop at time 1000.
- 95% confidence interval for the expected change in prop for a 10-fold increase in time.
wordrecall <- read.table("~/path-to-folder/wordrecall.txt", header=T)
attach(wordrecall)
model.1 <- lm(prop ~ time)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 5.259e-01 4.881e-02 10.774 3.49e-07 ***
# time -5.571e-05 1.457e-05 -3.825 0.00282 **
# Multiple R-squared: 0.5709, Adjusted R-squared: 0.5318
plot(x=time, y=prop, ylim=c(-0.1, 0.9),
panel.last = lines(sort(time), fitted(model.1)[order(time)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
library(nortest)
ad.test(residuals(model.1)) # A = 0.262, p-value = 0.6426
lntime <- log(time)
model.2 <- lm(prop ~ lntime)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.846415 0.014195 59.63 3.65e-15 ***
# lntime -0.079227 0.002416 -32.80 2.53e-12 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.02339 on 11 degrees of freedom
# Multiple R-squared: 0.9899, Adjusted R-squared: 0.989
# F-statistic: 1076 on 1 and 11 DF, p-value: 2.525e-12
plot(x=lntime, y=prop,
panel.last = lines(sort(lntime), fitted(model.2)[order(lntime)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3216, p-value = 0.4869
prop1.25 <- prop^-1.25
model.3 <- lm(prop1.25 ~ time)
summary(model.3)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.8693698 0.3869678 4.831 0.000527 ***
# time 0.0019708 0.0001155 17.067 2.91e-09 ***
# Multiple R-squared: 0.9636, Adjusted R-squared: 0.9603
plot(x=time, y=prop1.25,
panel.last = lines(sort(time), fitted(model.3)[order(time)]))
plot(x=fitted(model.3), y=residuals(model.3),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.3), main="", datax=TRUE)
qqline(residuals(model.3), datax=TRUE)
ad.test(residuals(model.3)) # A = 1.191, p-value = 0.002584
predict(model.2, interval="prediction",
newdata=data.frame(lntime=log(1000)))
# fit lwr upr
# 1 0.2991353 0.2449729 0.3532978
confint(model.2)[2,]*log(10) # 95% CI for 10-fold increase in time
# 2.5 % 97.5 %
# -0.1946689 -0.1701845
detach(wordrecall)
Mammal gestation (log-transforming the response)
- Load the mammgest data.
- Fit a simple linear regression model of Gestation on Birthwgt.
- Display a scatterplot of the data and add the regression line.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package.
- Create a log(Gestation) variable and fit a simple linear regression model of log(Gestation) on Birthwgt.
- Repeat diagnostic plots and normality tests.
- Use log(Gestation) on Birthwgt model to find:
- 95% prediction interval for Gestation for a Birthwgt of 50.
- 95% confidence interval for proportional change in median Gestation for a 1-pound increase in Birthwgt.
- 95% confidence interval for proportional change in median Gestation for a 10-pound increase in Birthwgt.
mammgest <- read.table("~/path-to-folder/mammgest.txt", header=T)
attach(mammgest)
model.1 <- lm(Gestation ~ Birthwgt)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 187.0837 26.9426 6.944 6.73e-05 ***
# Birthwgt 3.5914 0.5247 6.844 7.52e-05 ***
# Multiple R-squared: 0.8388, Adjusted R-squared: 0.8209
plot(x=Birthwgt, y=Gestation,
panel.last = lines(sort(Birthwgt), fitted(model.1)[order(Birthwgt)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
ad.test(residuals(model.1)) # A = 0.3116, p-value = 0.503
lnGest <- log(Gestation)
model.2 <- lm(lnGest ~ Birthwgt)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 5.278817 0.088177 59.866 5.1e-13 ***
# Birthwgt 0.010410 0.001717 6.062 0.000188 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2163 on 9 degrees of freedom
# Multiple R-squared: 0.8033, Adjusted R-squared: 0.7814
# F-statistic: 36.75 on 1 and 9 DF, p-value: 0.0001878
plot(x=Birthwgt, y=lnGest,
panel.last = lines(sort(Birthwgt), fitted(model.2)[order(Birthwgt)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3135, p-value = 0.4963
exp(predict(model.2, interval="prediction",
newdata=data.frame(Birthwgt=50)))
# fit lwr upr
# 1 330.0781 197.3013 552.2092
# proportional change in median gestation for 1-unit increase in birthwgt
exp(coefficients(model.2)[2]) # 1.010465
exp(confint(model.2)[2,]) # 95% CI
# 2.5 % 97.5 %
# 1.006547 1.014398
# proportional change in median gestation for 10-unit increase in birthwgt
exp(10*coefficients(model.2)[2]) # 1.109714
exp(10*confint(model.2)[2,]) # 95% CI
# 2.5 % 97.5 %
# 1.067429 1.153674
detach(mammgest)
Shortleaf pine trees (log-transforming both response and predictor)
- Load the shortleaf data.
- Fit a simple linear regression model of Vol on Diam.
- Display a scatterplot of the data and add the regression line.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package.
- Create a log(Diam) variable and fit a simple linear regression model of Vol on log(Diam).
- Repeat diagnostic plots and normality tests.
- Create a log(Vol) variable and fit a simple linear regression model of log(Vol) on log(Diam).
- Repeat diagnostic plots and normality tests.
- Use log(Vol) on log(Diam) model to find:
- 95% confidence interval for median Vol for a Diam of 10.
- 95% confidence interval for proportional change in median Vol for a 2-fold increase in Diam.
shortleaf <- read.table("~/path-to-folder/shortleaf.txt", header=T)
attach(shortleaf)
model.1 <- lm(Vol ~ Diam)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -41.5681 3.4269 -12.13 <2e-16 ***
# Diam 6.8367 0.2877 23.77 <2e-16 ***
# ---
# Multiple R-squared: 0.8926, Adjusted R-squared: 0.891
plot(x=Diam, y=Vol,
panel.last = lines(sort(Diam), fitted(model.1)[order(Diam)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
ad.test(residuals(model.1)) # A = 0.9913, p-value = 0.01215
lnDiam <- log(Diam)
model.2 <- lm(Vol ~ lnDiam)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -116.162 10.830 -10.73 2.88e-16 ***
# lnDiam 64.536 4.562 14.15 < 2e-16 ***
# Multiple R-squared: 0.7464, Adjusted R-squared: 0.7427
plot(x=lnDiam, y=Vol,
panel.last = lines(sort(lnDiam), fitted(model.2)[order(lnDiam)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 2.3845, p-value = 4.273e-06
lnVol <- log(Vol)
model.3 <- lm(lnVol ~ lnDiam)
summary(model.3)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -2.8718 0.1216 -23.63 <2e-16 ***
# lnDiam 2.5644 0.0512 50.09 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1703 on 68 degrees of freedom
# Multiple R-squared: 0.9736, Adjusted R-squared: 0.9732
# F-statistic: 2509 on 1 and 68 DF, p-value: < 2.2e-16
plot(x=lnDiam, y=lnVol,
panel.last = lines(sort(lnDiam), fitted(model.3)[order(lnDiam)]))
plot(x=fitted(model.3), y=residuals(model.3),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.3), main="", datax=TRUE)
qqline(residuals(model.3), datax=TRUE)
ad.test(residuals(model.3)) # A = 0.5309, p-value = 0.1692
exp(predict(model.3, interval="confidence",
newdata=data.frame(lnDiam=log(10))))
# fit lwr upr
# 1 20.75934 19.92952 21.62372
# proportional change in median Vol for 2-fold increase in Diam
2^(coefficients(model.3)[2]) # 5.915155
2^(confint(model.3)[2,]) # 95% CI
# 2.5 % 97.5 %
# 5.510776 6.349207
detach(shortleaf)
Underground air quality (interactions)
- Load the swallows data.
- Load the
car
package to access 3D scatterplots.
- Create interaction variables and fit a multiple linear regression model of Vent on O2 + CO2 + Type + TypeO2 + TypeCO2 + CO2O2.
- Use
anova
function to display the ANOVA table with sequential (type I) sums of squares.
- Calculate partial F-statistic and p-value.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Fit a multiple linear regression model of Vent on O2 + CO2 + Type.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package.
swallows <- read.table("~/path-to-folder/allswallows.txt", header=T)
attach(swallows)
library(car)
scatter3d(Vent ~ O2 + CO2, subset=Type==1) # adult
scatter3d(Vent ~ O2 + CO2, subset=Type==0) # nestling
scatter3d(Vent ~ O2 + CO2, subset=Type==0, revolutions=3, speed=0.5, grid=F)
TypeO2 <- Type*O2
TypeCO2 <- Type*CO2
CO2O2 <- CO2*O2
model.1 <- lm(Vent ~ O2 + CO2 + Type + TypeO2 + TypeCO2 + CO2O2)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -18.399 160.007 -0.115 0.9086
# O2 1.189 9.854 0.121 0.9041
# CO2 54.281 25.987 2.089 0.0378 *
# Type 111.658 157.742 0.708 0.4797
# TypeO2 -7.008 9.560 -0.733 0.4642
# TypeCO2 2.311 7.126 0.324 0.7460
# CO2O2 -1.449 1.593 -0.909 0.3642
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 165.6 on 233 degrees of freedom
# Multiple R-squared: 0.272, Adjusted R-squared: 0.2533
# F-statistic: 14.51 on 6 and 233 DF, p-value: 4.642e-14
anova(model.1) # Sequential (type I) SS
# Df Sum Sq Mean Sq F value Pr(>F)
# O2 1 93651 93651 3.4156 0.06585 .
# CO2 1 2247696 2247696 81.9762 < 2e-16 ***
# Type 1 5910 5910 0.2156 0.64288
# TypeO2 1 14735 14735 0.5374 0.46425
# TypeCO2 1 2884 2884 0.1052 0.74598
# CO2O2 1 22664 22664 0.8266 0.36421
# Residuals 233 6388603 27419
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
((14735+2884+22664)/3)/27419 # F-stat = 0.4897212
pf(0.49, 3, 233, lower.tail=F) # p-value = 0.6895548
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
model.2 <- lm(Vent ~ O2 + CO2 + Type)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 136.767 79.334 1.724 0.086 .
# O2 -8.834 4.765 -1.854 0.065 .
# CO2 32.258 3.551 9.084 <2e-16 ***
# Type 9.925 21.308 0.466 0.642
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 165 on 236 degrees of freedom
# Multiple R-squared: 0.2675, Adjusted R-squared: 0.2581
# F-statistic: 28.72 on 3 and 236 DF, p-value: 7.219e-16
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3175, p-value = 0.5358
detach(swallows)
Bluegill fish (polynomial regression)
- Load the bluegills data.
- Display a scatterplot of the data.
- Create an age-squared variable and fit a multiple linear regression model of length on age + agesq.
- Add a quadratic regression line to the scatterplot.
- Find a 95% prediction interval for length for an age of 5.
bluegills <- read.table("~/path-to-folder/bluegills.txt", header=T)
attach(bluegills)
plot(x=age, y=length)
agesq <- age^2
model <- lm(length ~ age + agesq)
summary(model)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.622 11.016 1.237 0.22
# age 54.049 6.489 8.330 2.81e-12 ***
# agesq -4.719 0.944 -4.999 3.67e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 10.91 on 75 degrees of freedom
# Multiple R-squared: 0.8011, Adjusted R-squared: 0.7958
# F-statistic: 151.1 on 2 and 75 DF, p-value: < 2.2e-16
newX <- seq(min(age), max(age), length=100)
newXsq <- newX**2
plot(x=age, y=length,
panel.last = lines(newX,
predict(model,
newdata=data.frame(age=newX, agesq=newXsq))))
predict(model, interval="prediction",
newdata=data.frame(age=5, agesq=25))
# fit lwr upr
# 1 165.9023 143.487 188.3177
detach(bluegills)
Experiment yield (polynomial regression)
- Load the yield data.
- Fit a simple linear regression model of Yield on Temp.
- Display a scatterplot of the data and add the simple linear regression line.
- Create a Temp-squared variable and fit a multiple linear regression model of Yield on Temp + Tempsq.
- Display a scatterplot of the data and add a quadratic regression line to the scatterplot.
yield <- read.table("~/path-to-folder/yield.txt", header=T)
attach(yield)
model.1 <- lm(Yield ~ Temp)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 2.306306 0.469075 4.917 0.000282 ***
# Temp 0.006757 0.005873 1.151 0.270641
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3913 on 13 degrees of freedom
# Multiple R-squared: 0.09242, Adjusted R-squared: 0.0226
# F-statistic: 1.324 on 1 and 13 DF, p-value: 0.2706
plot(x=Temp, y=Yield,
panel.last = lines(sort(Temp), fitted(model.1)[order(Temp)]))
Tempsq <- Temp^2
model.2 <- lm(Yield ~ Temp + Tempsq)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 7.9604811 1.2589183 6.323 3.81e-05 ***
# Temp -0.1537113 0.0349408 -4.399 0.000867 ***
# Tempsq 0.0010756 0.0002329 4.618 0.000592 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2444 on 12 degrees of freedom
# Multiple R-squared: 0.6732, Adjusted R-squared: 0.6187
# F-statistic: 12.36 on 2 and 12 DF, p-value: 0.001218
newX <- seq(min(Temp), max(Temp), length=100)
newXsq <- newX**2
plot(x=Temp, y=Yield,
panel.last = lines(newX,
predict(model.2,
newdata=data.frame(Temp=newX, Tempsq=newXsq))))
detach(yield)
Chemical odor (polynomial regression)
- Load the odor data.
- Create squared variables and fit a multiple linear regression model of Odor on Temp + Ratio + Height + Tempsq + Ratiosq + Heightsq.
- Fit a multiple linear regression model of Odor on Temp + Ratio + Height + Tempsq + Ratiosq.
odor <- read.table("~/path-to-folder/odor.txt", header=T)
attach(odor)
Tempsq <- Temp^2
Ratiosq <- Ratio^2
Heightsq <- Height^2
model.1 <- lm(Odor ~ Temp + Ratio + Height + Tempsq + Ratiosq + Heightsq)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -30.667 10.840 -2.829 0.0222 *
# Temp -12.125 6.638 -1.827 0.1052
# Ratio -17.000 6.638 -2.561 0.0336 *
# Height -21.375 6.638 -3.220 0.0122 *
# Tempsq 32.083 9.771 3.284 0.0111 *
# Ratiosq 47.833 9.771 4.896 0.0012 **
# Heightsq 6.083 9.771 0.623 0.5509
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 18.77 on 8 degrees of freedom
# Multiple R-squared: 0.8683, Adjusted R-squared: 0.7695
# F-statistic: 8.789 on 6 and 8 DF, p-value: 0.003616
model.2 <- lm(Odor ~ Temp + Ratio + Height + Tempsq + Ratiosq)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -26.923 8.707 -3.092 0.012884 *
# Temp -12.125 6.408 -1.892 0.091024 .
# Ratio -17.000 6.408 -2.653 0.026350 *
# Height -21.375 6.408 -3.336 0.008720 **
# Tempsq 31.615 9.404 3.362 0.008366 **
# Ratiosq 47.365 9.404 5.036 0.000703 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 18.12 on 9 degrees of freedom
# Multiple R-squared: 0.8619, Adjusted R-squared: 0.7852
# F-statistic: 11.23 on 5 and 9 DF, p-value: 0.001169
detach(odor)