Software Help 1
The next two pages cover the Minitab and R commands for the procedures in this lesson.
Below is a zip file that contains all the data sets used in this lesson:
- bldgstories.txt
- carstopping.txt
- drugdea.txt
- fev_dat.txt
- heightgpa.txt
- husbandwife.txt
- infant.txt
- mccoo.txt
- oldfaithful.txt
- poverty.txt
- practical.txt
- signdist.txt
- skincancer.txt
- student_height_weight.txt
Minitab Help 1: Simple Linear Regression
Skin cancer mortality
Student height and weight
- Perform a basic regression analysis.
- Create a fitted line plot.
- Find a confidence interval for the mean weight and a prediction interval for an individual weight at height=66 and height=67.
Skin cancer mortality (revisited)
Building stories
Driver's age and distance
Student's height and GPA
Teen birth rate and poverty
Lung function
- Use Data > Subset Worksheet to select only observations with age between 6 and 10.
- Perform a basic regression analysis.
- Create a fitted line plot.
R Help 1: Simple Linear Regression
Temperature
- Create the temperature data and produce a scatterplot with points and lines:
C <- seq(0, 50, by=5)
F <- (9/5)*C+32
plot(C, F, type="b", xlab="Celsius", ylab="Fahrenheit", ylim=c(30,130))
Skin cancer
- Load the skin cancer data and produce a scatterplot with a simple linear regression line:
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=TRUE)
attach(skincancer)
model <- lm(Mort ~ Lat)
plot(x=Lat, y=Mort,
     xlab="Latitude (at center of state)", ylab="Mortality (deaths per 10 million)",
     main="Skin Cancer Mortality versus State Latitude",
     panel.last = lines(sort(Lat), fitted(model)[order(Lat)]))
detach(skincancer)
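The attach()/detach() pattern above can also be avoided by passing the data frame through the data= argument. The following is a minimal sketch of that alternative, not the lesson's own method; the data frame skin and its values are simulated stand-ins (an assumption), since skincancer.txt is not read here.

```r
# Simulated stand-in data (hypothetical values, not the real skin cancer data).
set.seed(1)
skin <- data.frame(Lat = runif(20, 28, 48))
skin$Mort <- 390 - 6 * skin$Lat + rnorm(20, sd = 15)

# data= tells lm() and plot() where to find Lat and Mort, so no attach() is needed.
model <- lm(Mort ~ Lat, data = skin)

plot(Mort ~ Lat, data = skin,
     xlab = "Latitude", ylab = "Mortality",
     main = "Simulated data for illustration")
abline(model)   # draws the fitted line from the model's intercept and slope
```

For a simple linear regression, abline(model) draws the same line as the sort()/order() construction, extended across the full width of the plot.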
Student height and weight
- Load the student height and weight data.
- Fit a simple linear regression model.
- Produce a scatterplot with a simple linear regression line and another line with specified intercept and slope.
- Calculate sum of squared errors (SSE).
- Predict weight for height=66 and height=67.
heightweight <- read.table("~/path-to-folder/student_height_weight.txt", header=TRUE)
attach(heightweight)
model <- lm(wt ~ ht)
summary(model)
# Lines that start with a hash sign (#) are comments
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) -266.5344    51.0320  -5.223    8e-04 ***
# ht             6.1376     0.7353   8.347 3.21e-05 ***
plot(x=ht, y=wt, ylim=c(110,210), xlab="height", ylab="weight",
     panel.last = c(lines(sort(ht), fitted(model)[order(ht)]),
                    lines(ht, -331.2+7.1*ht, lty=2)))
sum(residuals(model)^2) # SSE = 597.386
predict(model, newdata=data.frame(ht=c(66, 67))) # 138.5460 144.6836
detach(heightweight)
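The Minitab steps for this example also ask for a confidence interval and a prediction interval at height=66 and height=67. One way to get both in R is the interval argument of predict(). This is a sketch under assumptions: the data frame hw is simulated (hypothetical values), and the names conf_int and pred_int are illustrative only.

```r
# Simulated stand-in for the student height and weight data (hypothetical values).
set.seed(42)
hw <- data.frame(ht = runif(10, 62, 74))
hw$wt <- -260 + 6 * hw$ht + rnorm(10, sd = 8)

model <- lm(wt ~ ht, data = hw)
new_heights <- data.frame(ht = c(66, 67))

# Interval for the MEAN weight at each height.
conf_int <- predict(model, newdata = new_heights,
                    interval = "confidence", level = 0.95)

# Interval for an INDIVIDUAL new weight at each height.
pred_int <- predict(model, newdata = new_heights,
                    interval = "prediction", level = 0.95)

conf_int   # columns: fit, lwr, upr
pred_int   # same fit, wider lwr/upr
```

Both matrices share the same fit column. The prediction interval is always the wider of the two because it also accounts for the scatter of individual responses around the mean.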
High school GPA and college test scores
- Generate the high school GPA and college test score (population) data.
- Produce a scatterplot of the population data with the population regression line.
- Sample the data (your results will differ since we're randomly sampling here).
- Produce a scatterplot of the sample data with a simple linear regression line and the population regression line.
- Calculate sum of squared errors (SSE), mean square error (MSE), and regression (or residual) standard error (S).
X <- c(rep(1, 100), rep(2, 100), rep(3, 100), rep(4, 100))
Y <- 2 + 4*X + rnorm(400, 0, 1)
plot(X, Y, xlab="High school gpa", ylab="College entrance test score",
     panel.last = lines(X, 2+4*X))
Xs <- c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3))
Ys <- Y[c(rep(0, 3), rep(100, 3), rep(200, 3), rep(300, 3)) + sample.int(100, 12)]
model <- lm(Ys ~ Xs)
plot(Xs, Ys, xlab="High school gpa", ylab="College entrance test score",
     panel.last = c(lines(Xs, 2+4*Xs),
                    lines(sort(Xs), fitted(model)[order(Xs)], lty=2)))
sum(residuals(model)^2) # SSE = 8.677833 (your value will differ)
sum(residuals(model)^2)/10 # MSE = SSE/(n-2) = 0.8677833, with n = 12
sqrt(sum(residuals(model)^2)/10) # S = sqrt(MSE) = 0.9315489
summary(model) # Residual standard error: 0.9315 on 10 degrees of freedom
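The hand calculations of SSE, MSE, and S can be cross-checked against R's built-in extractors. The sketch below assumes a simplified sampling scheme (12 responses drawn directly from the population model rather than sampled from the 400 generated values), and the names sse, mse, and s are illustrative.

```r
# Simplified stand-in sample: 3 observations at each of X = 1, 2, 3, 4.
set.seed(123)
Xs <- rep(1:4, each = 3)
Ys <- 2 + 4 * Xs + rnorm(12, 0, 1)
model <- lm(Ys ~ Xs)

sse <- sum(residuals(model)^2)     # sum of squared errors
mse <- sse / df.residual(model)    # df.residual(model) is n - 2 = 10 here
s   <- sqrt(mse)                   # residual standard error

# Each hand-computed quantity should match the corresponding extractor.
all.equal(sse, deviance(model))        # deviance() of an lm fit is its SSE
all.equal(s, sigma(model))             # sigma() returns the residual standard error
all.equal(s, summary(model)$sigma)     # the value printed by summary()
```

Using df.residual(model) instead of a hard-coded 10 keeps the calculation correct if the sample size changes.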
Skin cancer
- Load the skin cancer data.
- Fit a simple linear regression model with y = Mort and x = Lat and display the coefficient of determination, \(R^2\).
- Calculate the correlation between Mort and Lat.
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=TRUE)
attach(skincancer)
model <- lm(Mort ~ Lat)
summary(model) # Multiple R-squared: 0.6798
cor(Mort, Lat) # correlation = -0.8245178
detach(skincancer)
Temperature
- Create the temperature data.
- Fit a simple linear regression model with y = F and x = C and display the coefficient of determination, \(R^2\).
- Calculate the correlation between F and C.
C <- seq(0, 50, by=5)
F <- (9/5)*C+32
model <- lm(F ~ C)
summary(model) # Multiple R-squared: 1
cor(F, C) # correlation = 1
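Because the temperature data follow an exact linear relationship, the fitted coefficients should reproduce the conversion formula itself. The sketch below checks this. It uses the longer names celsius and fahrenheit (an illustrative choice) so that R's built-in F, the shorthand for FALSE, is not masked.

```r
celsius <- seq(0, 50, by = 5)
fahrenheit <- (9/5) * celsius + 32

model <- lm(fahrenheit ~ celsius)

coef(model)                 # intercept and slope of the fitted line
cor(fahrenheit, celsius)    # correlation of an exact increasing line

# The estimates agree with the conversion formula up to floating-point error.
all.equal(unname(coef(model)), c(32, 9/5))
```

With a perfect fit like this one, summary(model) may warn that the summary is unreliable because the residuals are essentially zero.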
Building stories
- Load the building stories data.
- Fit a simple linear regression model with y = Height and x = Stories and display the coefficient of determination, \(R^2\).
- Calculate the correlation between Height and Stories.
bldgstories <- read.table("~/path-to-folder/bldgstories.txt", header=TRUE)
attach(bldgstories)
model <- lm(HGHT ~ STORIES)
summary(model) # Multiple R-squared: 0.9036
cor(HGHT, STORIES) # correlation = 0.9505549
detach(bldgstories)
Driver's age and distance
- Load the driver's age and distance data.
- Fit a simple linear regression model with y = Distance and x = Age and display the coefficient of determination, \(R^2\).
- Calculate the correlation between Distance and Age.
signdist <- read.table("~/path-to-folder/signdist.txt", header=TRUE)
attach(signdist)
model <- lm(Distance ~ Age)
summary(model) # Multiple R-squared: 0.642
cor(Distance, Age) # correlation = -0.8012447
detach(signdist)
Student's height and GPA
- Load the student's height and GPA data.
- Fit a simple linear regression model with y = GPA and x = Height and display the coefficient of determination, \(R^2\).
- Calculate the correlation between GPA and Height.
heightgpa <- read.table("~/path-to-folder/heightgpa.txt", header=TRUE)
attach(heightgpa)
model <- lm(gpa ~ height)
summary(model) # Multiple R-squared: 0.002835
cor(gpa, height) # correlation = -0.05324126
detach(heightgpa)
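In each of the examples above, the reported R-squared equals the square of the reported correlation, and the correlation carries the sign of the slope. The sketch below illustrates that relationship on simulated data (hypothetical values; the names x, y, r, and r2 are illustrative).

```r
# Simulated data with a negative trend (hypothetical values).
set.seed(7)
x <- runif(30, 18, 80)
y <- 570 - 3 * x + rnorm(30, sd = 50)

model <- lm(y ~ x)
r2 <- summary(model)$r.squared   # coefficient of determination
r  <- cor(y, x)                  # sample correlation

# In simple linear regression, R-squared is the squared correlation ...
all.equal(r^2, r2)

# ... and r has the same sign as the estimated slope.
slope <- coef(model)[["x"]]
all.equal(r, sign(slope) * sqrt(r2))
```

This identity holds only for simple linear regression with an intercept; with several predictors, R-squared is no longer the square of any single pairwise correlation.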
Teen birth rate and poverty
- Load the teen birth rate and poverty data.
- Fit a simple linear regression model with y = Brth15to17 and x = PovPct and display the model results.
- Produce a scatterplot with a simple linear regression line.
poverty <- read.table("~/path-to-folder/poverty.txt", header=TRUE)
attach(poverty)
model <- lm(Brth15to17 ~ PovPct)
summary(model)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   4.2673     2.5297   1.687    0.098 .
# PovPct        1.3733     0.1835   7.483 1.19e-09 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 5.551 on 49 degrees of freedom
# Multiple R-squared: 0.5333, Adjusted R-squared: 0.5238
plot(PovPct, Brth15to17, xlab="Poverty Rate", ylab="15 to 17 Year Old Birth Rate",
     panel.last = lines(sort(PovPct), fitted(model)[order(PovPct)]))
detach(poverty)
Lung function
- Load the lung function data.
- Fit a simple linear regression model with y = FEV and x = age for ages 6-10 only and display the model results.
- Produce a scatterplot for ages 6-10 only with a simple linear regression line.
- Fit a simple linear regression model with y = FEV and x = age for the full dataset and display the model results.
- Produce a scatterplot for the full dataset with a simple linear regression line.
lungfunction <- read.table("~/path-to-folder/fev_dat.txt", header=TRUE)
attach(lungfunction)
model.1 <- lm(FEV ~ age, subset = age>=6 & age<=10)
summary(model.1)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.01165    0.15237   0.076    0.939
# age          0.26721    0.01801  14.839   <2e-16 ***
plot(age[age>=6 & age<=10], FEV[age>=6 & age<=10],
     xlab="Age", ylab="Forced Exhalation Volume (FEV)",
     panel.last = lines(sort(age[age>=6 & age<=10]),
                        fitted(model.1)[order(age[age>=6 & age<=10])]))
model.2 <- lm(FEV ~ age)
summary(model.2)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
# age         0.222041   0.007518  29.533  < 2e-16 ***
plot(age, FEV, xlab="Age", ylab="Forced Exhalation Volume (FEV)",
     panel.last = lines(sort(age), fitted(model.2)[order(age)]))
detach(lungfunction)
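The age condition in the subset analysis above is typed several times. An alternative is to build the subset once with subset() and reuse it for both the fit and the plot. This is a sketch under assumptions: the data frame lung is simulated (hypothetical values, not fev_dat.txt), and the name young is illustrative.

```r
# Simulated stand-in for the lung function data (hypothetical values).
set.seed(2024)
lung <- data.frame(age = sample(3:19, 200, replace = TRUE))
lung$FEV <- 0.4 + 0.22 * lung$age + rnorm(200, sd = 0.5)

# Keep only ages 6 through 10, stating the condition in one place.
young <- subset(lung, age >= 6 & age <= 10)

model.1 <- lm(FEV ~ age, data = young)

plot(FEV ~ age, data = young,
     xlab = "Age", ylab = "Forced Exhalation Volume (FEV)")
abline(model.1)   # fitted line for the ages 6-10 model
```

Because the model and the plot both use young, the points and the fitted line are guaranteed to come from the same rows.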