R Help 1: Simple Linear Regression

Instructions

Temperature

  • Create the temperature data and produce a scatterplot with points and lines:
C <- seq(0, 50, by=5)
F <- (9/5)*C+32
plot(C, F, type="b", xlab="Celsius", ylab="Fahrenheit", ylim=c(30,130))

Skin cancer

  • Load the skin cancer data and produce a scatterplot with a simple linear regression line:
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=T)
attach(skincancer)
model <- lm(Mort ~ Lat)
plot(x=Lat, y=Mort,
     xlab="Latitude (at center of state)", ylab="Mortality (deaths per 10 million)",
     main="Skin Cancer Mortality versus State Latitude",
     panel.last = lines(sort(Lat), fitted(model)[order(Lat)]))
detach(skincancer)

Student height and weight

  • Load the student height and weight data.
  • Fit a simple linear regression model.
  • Produce a scatterplot with a simple linear regression line and another line with specified intercept and slope.
  • Calculate sum of squared errors (SSE).
  • Predict weight for height=66 and height=67.
heightweight <- read.table("~/path-to-folder/student_height_weight.txt", header=T)
attach(heightweight)

model <- lm(wt ~ ht)
summary(model)
# Hashtag denotes comments
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept) -266.5344    51.0320  -5.223    8e-04 ***
# ht             6.1376     0.7353   8.347 3.21e-05 ***

plot(x=ht, y=wt, ylim=c(110,210), xlab="height", ylab="weight",
     panel.last = c(lines(sort(ht), fitted(model[order(ht)]),
                    lines(ht, -331.2+7.1*ht, lty=2)))
sum(residuals(model)^2) # SSE = 597.386
predict(model, newdata=data.frame(ht=c(66, 67))) # 138.5460 144.6836
detach(heightweight)

High school GPA and college test scores

  • Generate the high school GPA and college test score (population) data.
  • Produce a scatterplot of the population data with the population regression line.
  • Sample the data (your results will differ since we're randomly sampling here).
  • Produce a scatterplot of the sample data with a simple linear regression line and the the population regression line.
  • Calculate sum of squared errors (SSE), mean square error (MSE), and regression (or residual) standard error (S).
X <- c(rep(1, 100), rep(2, 100), rep(3, 100), rep(4, 100))
Y <- 2 + 4*X + rnorm(400, 0, 1)
plot(X, Y, xlab="High school gpa", ylab="College entrance test score",
     panel.last = lines(X, 2+4*X))
Xs <- c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3))
Ys <- Y[c(rep(0, 3), rep(100, 3), rep(200, 3), rep(300, 3)) + sample.int(100, 12)]
model <- lm(Ys ~ Xs)
plot(Xs, Ys, xlab="High school gpa", ylab="College entrance test score",
     panel.last = c(lines(Xs, 2+4*Xs),
                    lines(sort(Xs), fitted(model[order(Xs)]),lty=2))
sum(residuals(model)^2) # SSE = 8.677833
sum(residuals(model)^2)/10 # MSE = 0.8677833
sqrt(sum(residuals(model)^2)/10) # S = 0.9315489
summary(model) # Residual standard error: 0.9315 on 10 degrees of freedom

Skin cancer

  • Load the skin cancer data.
  • Fit a simple linear regression model with y = Mort and x = Lat and display the coefficient of determination, \(R^2\).
  • Calculate the correlation between Mort and Lat.
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=T)
attach(skincancer)
model <- lm(Mort ~ Lat)
summary(model) # Multiple R-squared:  0.6798
cor(Mort, Lat) # correlation = -0.8245178
detach(skincancer)

Temperature

  • Create the temperature data.
  • Fit a simple linear regression model with y = F and x = C and display the coefficient of determination, \(R^2\).
  • Calculate the correlation between F and C.
C <- seq(0, 50, by=5)
F <- (9/5)*C+32
model <- lm(F ~ C)
summary(model) # Multiple R-squared:      1
cor(F, C) # correlation = 1

Building stories

  • Load the building stories data.
  • Fit a simple linear regression model with y = Height and x = Stories and display the coefficient of determination, \(R^2\).
  • Calculate the correlation between Height and Stories.
bldgstories <- read.table("~/path-to-folder/bldgstories.txt", header=T)
attach(bldgstories)
model <- lm(HGHT ~ STORIES)
summary(model) # Multiple R-squared:  0.9036
cor(HGHT, STORIES) # correlation = 0.9505549
detach(bldgstories)

Driver's age and distance

  • Load the driver's age and distance data.
  • Fit a simple linear regression model with y = Distance and x = Age and display the coefficient of determination, \(R^2\).
  • Calculate the correlation between Distance and Age.
signdist <- read.table("~/path-to-folder/signdist.txt", header=T)
attach(signdist)
model <- lm(Distance ~ Age)
summary(model) # Multiple R-squared:  0.642
cor(Distance, Age) # correlation = -0.8012447
detach(signdist)

Student's height and GPA

  • Load the student's height and GPA data.
  • Fit a simple linear regression model with y = GPA and x = Height and display the coefficient of determination, \(R^2\).
  • Calculate the correlation between GPA and Height.
heightgpa <- read.table("~/path-to-folder/heightgpa.txt", header=T)
attach(heightgpa)
model <- lm(gpa ~ height)
summary(model) # Multiple R-squared:  0.002835
cor(gpa, height) # correlation = -0.05324126
detach(heightgpa)

Teen birth rate and poverty

  • Load the teen birth rate and poverty data.
  • Fit a simple linear regression model with y = Brth15to17 and x = PovPct and display the model results.
  • Produce a scatterplot with a simple linear regression line.
poverty <- read.table("~/path-to-folder/poverty.txt", header=T)
attach(poverty)

model <- lm(Brth15to17 ~ PovPct)
summary(model)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   4.2673     2.5297   1.687    0.098 .  
# PovPct        1.3733     0.1835   7.483 1.19e-09 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 5.551 on 49 degrees of freedom
# Multiple R-squared:  0.5333,  Adjusted R-squared:  0.5238 

plot(PovPct, Brth15to17, xlab="Poverty Rate", ylab="15 to 17 Year Old Birth Rate",
     panel.last = lines(sort(PovPct), fitted(model)[order(PovPct)]))
detach(poverty)

Lung function

  • Load the lung function data.
  • Fit a simple linear regression model with y = FEV and x = age for ages 6-10 only and display the model results.
  • Produce a scatterplot for ages 6-10 only with a simple linear regression line.
  • Fit a simple linear regression model with y = FEV and x = age for the full dataset and display the model results.
  • Produce a scatterplot for the full dataset with a simple linear regression line.
lungfunction <- read.table("~/path-to-folder/fev_dat.txt", header=T)
attach(lungfunction)

model.1 <- lm(FEV ~ age, subset = age>=6 & age<=10)
summary(model.1)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.01165    0.15237   0.076    0.939    
# age          0.26721    0.01801  14.839   <2e-16 ***

plot(age[age>=6 & age<=10], FEV[age>=6 & age<=10], 
     xlab="Age", ylab="Forced Exhalation Volume (FEV)",
     panel.last = lines(sort(age[age>=6 & age<=10]), 
                        fitted(model.1)[order(age[age>=6 & age<=10])]))

model.2 <- lm(FEV ~ age)
summary(model.2)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
# age         0.222041   0.007518  29.533  < 2e-16 ***

plot(age, FEV, xlab="Age", ylab="Forced Exhalation Volume (FEV)",
     panel.last = lines(sort(age), fitted(model.2)[order(age)]))
detach(lungfunction)