R Help 4: SLR Assumptions, Estimation & Prediction

Alcohol consumption and muscle strength

  • Load the alcoholarm data.
  • Fit a simple linear regression model with y = strength and x = alcohol.
  • Display model results.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
  • Display a residual plot with x = alcohol on the horizontal axis.
alcoholarm <- read.table("~/path-to-folder/alcoholarm.txt", header=T)
attach(alcoholarm)

model <- lm(strength ~ alcohol)
summary(model)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 26.36954 1.20273 21.925 < 2e-16 ***
# alcohol -0.29587 0.05105 -5.796 5.14e-07 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 3.874 on 48 degrees of freedom
# Multiple R-squared: 0.4117, Adjusted R-squared: 0.3994
# F-statistic: 33.59 on 1 and 48 DF, p-value: 5.136e-07

plot(x=alcohol, y=strength,
xlab="Lifetime consumption of alcohol", ylab="Deltoid muscle strength",
panel.last = lines(sort(alcohol), fitted(model)[order(alcohol)]))

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

plot(x=alcohol, y=residuals(model),
xlab="Lifetime consumption of alcohol", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(alcoholarm)

Blood pressure

  • Load the bloodpress data.
  • Fit a simple linear regression model with y = BP and x = Age, display model results, and display a scatterplot of the data with the simple linear regression line.
  • Fit a simple linear regression model with y = BP and x = Weight, display model results, and display a scatterplot of the data with the simple linear regression line.
  • Fit a simple linear regression model with y = BP and x = Duration, display model results, and display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot for the model using x = Age with Weight on the horizontal axis.
  • Fit a multiple linear regression model with y = BP, x1 = Age, and x2 = Weight.
  • Display a residual plot for the model using x1 = Age and x2 = Weight with Duration on the horizontal axis.
bloodpress <- read.table("~/path-to-folder/bloodpress.txt", header=T)
attach(bloodpress)

model.1 <- lm(BP ~ Age)
summary(model.1)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 44.4545 18.7277 2.374 0.02894 *
# Age 1.4310 0.3849 3.718 0.00157 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 4.195 on 18 degrees of freedom
# Multiple R-squared: 0.4344, Adjusted R-squared: 0.403
# F-statistic: 13.82 on 1 and 18 DF, p-value: 0.001574
plot(x=Age, y=BP,
xlab="Age (years)", ylab="Diastolic blood pressure (mm Hg)",
panel.last = lines(sort(Age), fitted(model.1)[order(Age)]))

model.2 <- lm(BP ~ Weight)
summary(model.2)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 2.20531 8.66333 0.255 0.802
# Weight 1.20093 0.09297 12.917 1.53e-10 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.74 on 18 degrees of freedom
# Multiple R-squared: 0.9026, Adjusted R-squared: 0.8972
# F-statistic: 166.9 on 1 and 18 DF, p-value: 1.528e-10
plot(x=Weight, y=BP,
xlab="Weight (pounds)", ylab="Diastolic blood pressure (mm Hg)",
panel.last = lines(sort(Weight), fitted(model.2)[order(Weight)]))

model.3 <- lm(BP ~ Dur)
summary(model.3)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 109.2350 3.8563 28.327 <2e-16 ***
# Dur 0.7411 0.5703 1.299 0.21
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 5.333 on 18 degrees of freedom
# Multiple R-squared: 0.08575, Adjusted R-squared: 0.03496
# F-statistic: 1.688 on 1 and 18 DF, p-value: 0.2102
plot(x=Dur, y=BP,
xlab="Duration of hypertension (years)",
ylab="Diastolic blood pressure (mm Hg)",
panel.last = lines(sort(Dur), fitted(model.3)[order(Dur)]))

plot(x=Weight, y=residuals(model.1),
xlab="Weight (pounds)", ylab="Residuals from model with Age",
panel.last = abline(h=0, lty=2))

model.12 <- lm(BP ~ Age + Weight)

plot(x=Dur, y=residuals(model.12),
xlab="Duration of hypertension (years)",
ylab="Residuals from model with Age and Weight",
panel.last = abline(h=0, lty=2))

detach(bloodpress)

Tread wear

  • Load the treadwear data.
  • Fit a simple linear regression model with y = groove and x = mileage.
  • Display model results.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
treadwear <- read.table("~/path-to-folder/treadwear.txt", header=T)
attach(treadwear)

model <- lm(groove ~ mileage)
summary(model)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 360.6367 11.6886 30.85 9.70e-09 ***
# mileage -7.2806 0.6138 -11.86 6.87e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 19.02 on 7 degrees of freedom
# Multiple R-squared: 0.9526, Adjusted R-squared: 0.9458
# F-statistic: 140.7 on 1 and 7 DF, p-value: 6.871e-06

plot(x=mileage, y=groove,
xlab="Mileage (1000s of miles)", ylab="Depth of groove (mils)",
panel.last = lines(sort(mileage), fitted(model)[order(mileage)]))

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(treadwear)

Plutonium

  • Load the alphapluto data.
  • Fit a simple linear regression model with y = alpha and x = pluto.
  • Display model results.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
alphapluto <- read.table("~/path-to-folder/alphapluto.txt", header=T)
attach(alphapluto)

model <- lm(alpha ~ pluto)
summary(model)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.0070331 0.0035988 1.954 0.0641 .
# pluto 0.0055370 0.0003659 15.133 9.08e-13 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.01257 on 21 degrees of freedom
# Multiple R-squared: 0.916, Adjusted R-squared: 0.912
# F-statistic: 229 on 1 and 21 DF, p-value: 9.077e-13

plot(x=pluto, y=alpha,
xlab="Plutonium activity (pCi/g)", ylab="Alpha count rate (number per second)",
panel.last = lines(sort(pluto), fitted(model)[order(pluto)]))

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(alphapluto)

Alcohol and tobacco

  • Load the alcoholtobacco data.
  • Fit a simple linear regression model with y = Alcohol and x = Tobacco.
  • Display model results.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
  • Refit the model excluding Northern Ireland.
  • Display a scatterplot of the data excluding Northern Ireland with the simple linear regression line for the model excluding Northern Ireland.
  • Display a standardized residual plot for the model fit to all the data with fitted values on the horizontal axis.
  • Calculate the standardized residual for Northern Ireland.
alcoholtobacco <- read.table("~/path-to-folder/alcoholtobacco.txt", header=T)
attach(alcoholtobacco)

model.1 <- lm(Alcohol ~ Tobacco)
summary(model.1)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.3512 1.6067 2.708 0.0241 *
# Tobacco 0.3019 0.4388 0.688 0.5087
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.8196 on 9 degrees of freedom
# Multiple R-squared: 0.04998, Adjusted R-squared: -0.05557
# F-statistic: 0.4735 on 1 and 9 DF, p-value: 0.5087

plot(x=Tobacco, y=Alcohol,
xlab="Ave weekly tobacco expenditure (GBP)",
ylab="Ave weekly alcohol expenditure (GBP)",
panel.last = lines(sort(Tobacco), fitted(model.1)[order(Tobacco)]))

plot(x=fitted(model.1), y=residuals(model.1),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

model.2 <- lm(Alcohol ~ Tobacco, subset=Region!="NorthernIreland")

plot(x=Tobacco[Region!="NorthernIreland"], y=Alcohol[Region!="NorthernIreland"],
xlab="Ave weekly tobacco expenditure (GBP)",
ylab="Ave weekly alcohol expenditure (GBP)",
panel.last = lines(sort(Tobacco), fitted(model.2)[order(Tobacco)]))

plot(x=fitted(model.1), y=rstandard(model.1),
xlab="Fitted values", ylab="Standardized residuals",
panel.last = abline(h=0, lty=2))

rstandard(model.1)[Region=="NorthernIreland"] # -2.575075

detach(alcoholtobacco)

Anscombe data

  • Load the anscombe data.
  • Fit a simple linear regression model with y = y3 and x = x3.
  • Display model results.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
anscombe <- read.table("~/path-to-folder/anscombe.txt", header=T)
attach(anscombe)

model <- lm(y3 ~ x3)
summary(model)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.0025 1.1245 2.670 0.02562 *
# x3 0.4997 0.1179 4.239 0.00218 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.236 on 9 degrees of freedom
# Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
# F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176

plot(x=x3, y=y3,
panel.last = lines(sort(x3), fitted(model)[order(x3)]))

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(anscombe)

Skin cancer mortality

  • Load the skin cancer data.
  • Fit a simple linear regression model with y = Mort and x = Lat.
  • Display a scatterplot of the data with the simple linear regression line.
attach(skincancer)

model <- lm(Mort ~ Lat)

plot(x=Lat, y=Mort,
xlab="Latitude (at center of state)", ylab="Mortality (deaths per 10 million)",
main="Skin Cancer Mortality versus State Latitude",
panel.last = lines(sort(Lat), fitted(model)[order(Lat)]))

detach(skincancer)

Alligators

  • Load the alligator data.
  • Fit a simple linear regression model with y = weight and x = length.
  • Display a scatterplot of the data with the simple linear regression line.
alligator <- read.table("~/path-to-folder/alligator.txt", header=T)
attach(alligator)

model <- lm(weight ~ length)

plot(x=length, y=weight, ylim=c(-50, 650),
panel.last = lines(sort(length), fitted(model)[order(length)]))

detach(alligator)

Alloy corrosion

  • Load the corrosion data.
  • Fit a simple linear regression model with y = wgtloss and x = iron.
  • Display a scatterplot of the data with the simple linear regression line.
corrosion <- read.table("~/path-to-folder/corrosion.txt", header=T)
attach(corrosion)

model <- lm(wgtloss ~ iron)

plot(x=iron, y=wgtloss,
panel.last = lines(sort(iron), fitted(model)[order(iron)]))

detach(corrosion)

Handspan and height

  • Load the handheight data.
  • Fit a simple linear regression model with y = HandSpan and x = Height.
  • Display a residual plot with fitted values on the horizontal axis.
handheight <- read.table("~/path-to-folder/handheight.txt", header=T)
attach(handheight)

model <- lm(HandSpan ~ Height)

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(handheight)

Chemical solution concentration

  • Load the solconc data.
  • Fit a simple linear regression model with y = y (concentration) and x = x (time).
  • Display a residual plot with fitted values on the horizontal axis.
solconc <- read.table("~/path-to-folder/solutions_conc.txt", header=T)
attach(solconc)

model <- lm(y ~ x)

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(solconc)

Real estate sales

  • Load the realestate data.
  • Fit a simple linear regression model with y = SalePrice and x = Sqrfeet.
  • Display a residual plot with fitted values on the horizontal axis.
realestate <- read.table("~/path-to-folder/realestate_sales.txt", header=T)
attach(realestate)

model <- lm(SalePrice ~ SqrFeet)

plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

detach(realestate)

Old Faithful geyser eruptions

  • Load the oldfaithful data.
  • Fit a simple linear regression model with y = waiting and x = eruption.
  • Display a histogram and normal probability plot of the residuals.
oldfaithful <- read.table("~/path-to-folder/oldfaithful.txt", header=T)
attach(oldfaithful)

model <- lm(waiting ~ eruption)

hist(residuals(model), main="", breaks=12)

qqnorm(residuals(model), main="")
qqline(residuals(model))

detach(oldfaithful)

Hospital infection risk

  • Load the infectionrisk data.
  • Select only hospitals in regions 1 or 2.
  • Fit a simple linear regression model with y = InfctRsk and x = Stay.
  • Display a normal probability plot of the residuals.
infectionrisk <- read.table("~/path-to-folder/infectionrisk.txt", header=T)
infectionrisk <- infectionrisk[infectionrisk$Region==1 | infectionrisk$Region==2, ]
attach(infectionrisk)

model <- lm(InfctRsk ~ Stay)

qqnorm(residuals(model), main="")
qqline(residuals(model))

detach(infectionrisk)

Car stopping distances

  • Load the carstopping data.
  • Fit a simple linear regression model with y = StopDist and x = Speed.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
  • Create a new response variable equal to √StopDist.
  • Fit a simple linear regression model with y = √StopDist and x = Speed.
  • Display a scatterplot of the data with the simple linear regression line.
  • Display a residual plot with fitted values on the horizontal axis.
  • Use the model to predict StopDist for Speed = 10, 20, 30, and 40.
carstopping <- read.table("~/path-to-folder/carstopping.txt", header=T)
attach(carstopping)

model <- lm(StopDist ~ Speed)
plot(x=Speed, y=StopDist,
panel.last = lines(sort(Speed), fitted(model)[order(Speed)]))
plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

sqrtdist <- sqrt(StopDist)

model <- lm(sqrtdist ~ Speed)
plot(x=Speed, y=sqrtdist,
panel.last = lines(sort(Speed), fitted(model)[order(Speed)]))
plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))

predict(model, interval="prediction",
newdata=data.frame(Speed=c(10, 20, 30, 40)))^2
# fit lwr upr
# 1 11.86090 3.93973 24.03997
# 2 35.63671 20.42935 55.04771
# 3 72.17067 49.44080 99.18664
# 4 121.46277 90.63292 156.79793

detach(carstopping)

Student heights and weights

  • Load the heightweight data.
  • Fit a simple linear regression model with y = wt and x = ht.
  • Display a scatterplot of the data with the simple linear regression line and a horizontal line at the mean weight.
  • Use the model to predict weight for height = 64.
heightweight <- read.table("~/path-to-folder/student_height_weight.txt", header=T)
attach(heightweight)

model <- lm(wt ~ ht)
plot(x=ht, y=wt,
panel.last = c(lines(sort(ht), fitted(model)[order(ht)]),
abline(h=mean(wt))))
mean(wt) # 158.8
predict(model, newdata=data.frame(ht=64)) # 126.2708

detach(heightweight)

Skin cancer mortality (revisited)

  • Load the skin cancer data.
  • Fit a simple linear regression model with y = Mort and x = Lat.
  • Display a scatterplot of the data with the simple linear regression line.
  • Use the model to calculate 95% confidence intervals for E(Mort) at Lat = 40 and 28.
  • Calculate mean(Lat).
  • Use the model to calculate 95% prediction intervals for Mort at Lat = 40.
  • Display a scatterplot of the data with the simple linear regression line, confidence interval bounds, and prediction interval bounds.
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=T)
attach(skincancer)

model <- lm(Mort ~ Lat)
plot(x=Lat, y=Mort,
xlab="Latitude (at center of state)", ylab="Mortality (deaths per 10 million)",
main="Skin Cancer Mortality versus State Latitude",
panel.last = lines(sort(Lat), fitted(model)[order(Lat)]))

predict(model, interval="confidence", se.fit=T,
newdata=data.frame(Lat=c(40, 28)))
# $fit
# fit lwr upr
# 1 150.0839 144.5617 155.6061
# 2 221.8156 206.8855 236.7456
#
# $se.fit
# 1 2
# 2.745000 7.421459

mean(Lat) # 39.53265

predict(model, interval="prediction",
newdata=data.frame(Lat=40))
# fit lwr upr
# 1 150.0839 111.235 188.9329

plot(x=Lat, y=Mort,
xlab="Latitude (at center of state)", ylab="Mortality (deaths per 10 million)",
ylim=c(60, 260),
panel.last = c(lines(sort(Lat), fitted(model)[order(Lat)]),
lines(sort(Lat),
predict(model,
interval="confidence")[order(Lat), 2], col="green"),
lines(sort(Lat),
predict(model,
interval="confidence")[order(Lat), 3], col="green"),
lines(sort(Lat),
predict(model,
interval="prediction")[order(Lat), 2], col="purple"),
lines(sort(Lat),
predict(model,
interval="prediction")[order(Lat), 3], col="purple")))

detach(skincancer)

Hospital infection risk (revisited)

  • Load the infectionrisk data.
  • Select only hospitals in regions 1 or 2.
  • Display a scatterplot of Stay versus InfctRsk.
  • Select only hospitals with Stay < 16 (i.e., remove the two hospitals with extreme values of Stay).
  • Fit a simple linear regression model with y = InfctRsk and x = Stay.
  • Use the model to calculate 95% confidence intervals for E(InfctRsk) at Stay = 10.
  • Use the model to calculate 95% prediction intervals for InfctRsk at Stay = 10.
  • Display a scatterplot of the data with the simple linear regression line, confidence interval bounds, and prediction interval bounds.
infectionrisk <- read.table("~/path-to-folder/infectionrisk.txt", header=T)
infectionrisk <- infectionrisk[infectionrisk$Region==1 | infectionrisk$Region==2, ]
attach(infectionrisk)
plot(x=Stay, y=InfctRsk)
detach(infectionrisk)
infectionrisk <- infectionrisk[infectionrisk$Stay<16, ]
attach(infectionrisk)
plot(x=Stay, y=InfctRsk)

model <- lm(InfctRsk ~ Stay)

predict(model, interval="confidence",
newdata=data.frame(Stay=10))
# fit lwr upr
# 1 4.528846 4.259205 4.798486

predict(model, interval="prediction",
newdata=data.frame(Stay=10))
# fit lwr upr
# 1 4.528846 2.45891 6.598781

plot(x=Stay, y=InfctRsk,
ylim=c(0, 9),
panel.last = c(lines(sort(Stay), fitted(model)[order(Stay)]),
lines(sort(Stay),
predict(model,
interval="confidence")[order(Stay), 2], col="green"),
lines(sort(Stay),
predict(model,
interval="confidence")[order(Stay), 3], col="green"),
lines(sort(Stay),
predict(model,
interval="prediction")[order(Stay), 2], col="purple"),
lines(sort(Stay),
predict(model,
interval="prediction")[order(Stay), 3], col="purple")))

detach(infectionrisk)