Heart attacks in rabbits
- Load the coolhearts data.
- Fit a multiple linear regression model of Infarc on Area, X2 (early cooling), and X3 (late cooling).
- Display model results.
- Create a scatterplot of the data with points marked by group and three lines representing the fitted regression equation for each group.
coolhearts <- read.table("~/path-to-folder/coolhearts.txt", header=T)
attach(coolhearts)
model.1 <- lm(Infarc ~ Area + X2 + X3)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.13454 0.10402 -1.293 0.206459
# Area 0.61265 0.10705 5.723 3.87e-06 ***
# X2 -0.24348 0.06229 -3.909 0.000536 ***
# X3 -0.06566 0.06507 -1.009 0.321602
plot(Area, Infarc, type="n", ylim=c(-0.2, 1),
xlab="Size of area at risk (grams)",
ylab="Size of infarcted area (grams)")
for (i in 1:32) points(Area[i], Infarc[i], pch=Group[i], col=Group[i])
for (i in 1:3) lines(Area[Group==i], fitted(model.1)[Group==i], lty=i, col=i)
legend("topleft", legend=c("no cooling",
"late cooling",
"early cooling"),
col=3:1, pch=3:1, inset=0.02)
Student heights and GPAs
- Load the heightgpa data.
- Fit a simple linear regression model of GPA on height.
- Create a scatterplot of the data with a fitted simple linear regression line and a horizontal line at the mean of GPA.
- Calculate SSE for the full and reduced models.
- Calculate the overall F statistic by hand.
- Find the p-value for the overall F statistic.
- Display the overall F statistic and p-value in the model results.
heightgpa <- read.table("~/path-to-folder/heightgpa.txt", header=T)
attach(heightgpa)
model <- lm(gpa ~ height)
plot(height,gpa,xlab="Height (inches)",ylab="Grade Point Average",
panel.last = c(lines(sort(height), fitted(model)[order(height)]),
abline(h=mean(gpa), lty=2)))
sum(residuals(model)^2) # SSE_F = 9.705507
mean(gpa) # 2.971714
sum((gpa-mean(gpa))^2) # SSE_R = 9.733097
((9.733097-9.705507)/1) / (9.705507/33) # overall F-statistic = 0.09380963
pf(0.09380963, 1, 33, lower.tail=F) # p-value = 0.7613129
summary(model) # F-statistic: 0.09381 on 1 and 33 DF, p-value: 0.7613
detach(heightgpa)
Skin cancer mortality
- Load the skincancer data.
- Fit a simple linear regression model of Mort on Lat.
- Create a scatterplot of the data with a fitted simple linear regression line and a horizontal line at the mean of Mort.
- Calculate SSE for the full and reduced models.
- Calculate the overall F statistic by hand.
- Find the p-value for the overall F statistic.
- Display the overall F statistic and p-value in the model results.
skincancer <- read.table("~/path-to-folder/skincancer.txt", header=T)
attach(skincancer)
model <- lm(Mort ~ Lat)
plot(Lat,Mort,xlab="Latitude (at center of state)",
ylab="Mortality (deaths per 10 million)",
panel.last = c(lines(sort(Lat), fitted(model)[order(Lat)]),
abline(h=mean(Mort), lty=2)))
sum(residuals(model)^2) # SSE_F = 17173.07
mean(Mort) # 152.8776
sum((Mort-mean(Mort))^2) # SSE_R = 53637.27
((53637.27-17173.07)/1) / (17173.07/47) # overall F-statistic = 99.7968
pf(99.7968, 1, 47, lower.tail=F) # p-value = 3.309471e-13
summary(model) # F-statistic: 99.8 on 1 and 47 DF, p-value: 3.309e-13
detach(skincancer)
Alcoholism and muscle strength
- Load the alcoholarm data.
- Fit a simple linear regression model of strength on alcohol.
- Calculate SSE for the full and reduced models.
- Calculate the overall F statistic by hand.
- Display the overall F statistic and p-value in the model results and the ANOVA table.
alcoholarm <- read.table("~/path-to-folder/alcoholarm.txt", header=T)
attach(alcoholarm)
model <- lm(strength ~ alcohol)
sum((strength-mean(strength))^2) # SSE_R = 1224.315
sum(residuals(model)^2) # SSE_F = 720.2749
((1224.315-720.2749)/1) / (720.2749/48) # F = 33.58985
summary(model) # F-statistic: 33.59 on 1 and 48 DF, p-value: 5.136e-07
anova(model)
# Analysis of Variance Table
# Response: strength
# Df Sum Sq Mean Sq F value Pr(>F)
# alcohol 1 504.04 504.04 33.59 5.136e-07 ***
# Residuals 48 720.27 15.01
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
detach(alcoholarm)
Allen cognitive level study
- Load the allentest data.
- Calculate SSTO.
- Fit SLR model of ACL on Vocab and display the ANOVA table (with sequential sums of squares).
- Fit MLR model of ACL on Vocab and SDMT and display ANOVA table (with sequential sums of squares).
- Calculate SSR(Vocab, SDMT) by hand using sequential sums of squares.
- Fit SLR model of ACL on SDMT and display the ANOVA table (with sequential sums of squares).
- Fit MLR model of ACL on SDMT and Vocab and display ANOVA table (with sequential sums of squares).
- Calculate SSR(Vocab, SDMT) by hand using sequential sums of squares.
- Fit MLR model of ACL on SDMT, Vocab, and Abstract and display ANOVA table (with sequential sums of squares).
- Calculate SSR(Vocab, Abstract | SDMT) by hand using sequential sums of squares.
allentest <- read.table("~/path-to-folder/allentest.txt", header=T)
attach(allentest)
sum((ACL-mean(ACL))^2) # SSTO = 43.04957
model.1 <- lm(ACL ~ Vocab)
anova(model.1)
# Analysis of Variance Table
# Response: ACL
# Df Sum Sq Mean Sq F value Pr(>F)
# Vocab 1 2.691 2.69060 4.4667 0.03829 *
# Residuals 67 40.359 0.60237
model.13 <- lm(ACL ~ Vocab + SDMT)
anova(model.13) # Sequential (type I) SS
# Df Sum Sq Mean Sq F value Pr(>F)
# Vocab 1 2.6906 2.6906 5.6786 0.02006 *
# SDMT 1 9.0872 9.0872 19.1789 4.35e-05 ***
# Residuals 66 31.2717 0.4738
# Calculate by hand: SSR(Vocab, SDMT) = 2.6906 + 9.0872 = 11.7778
model.3 <- lm(ACL ~ SDMT)
anova(model.3)
# Df Sum Sq Mean Sq F value Pr(>F)
# SDMT 1 11.68 11.6799 24.946 4.468e-06 ***
# Residuals 67 31.37 0.4682
model.31 <- lm(ACL ~ SDMT + Vocab)
anova(model.31) # Sequential (type I) SS
# Df Sum Sq Mean Sq F value Pr(>F)
# SDMT 1 11.6799 11.6799 24.6508 5.12e-06 ***
# Vocab 1 0.0979 0.0979 0.2067 0.6508
# Residuals 66 31.2717 0.4738
# Calculate by hand: SSR(Vocab, SDMT) = 11.6799 + 0.0979 = 11.7778
model.312 <- lm(ACL ~ SDMT + Vocab + Abstract)
anova(model.312) # Sequential (type I) SS
# Df Sum Sq Mean Sq F value Pr(>F)
# SDMT 1 11.6799 11.6799 24.6902 5.173e-06 ***
# Vocab 1 0.0979 0.0979 0.2070 0.6506
# Abstract 1 0.5230 0.5230 1.1056 0.2969
# Residuals 65 30.7487 0.4731
# Calculate by hand: SSR(Vocab, Abstract | SDMT) = 0.0979 + 0.5230 = 0.6209
detach(allentest)
Heart attacks in rabbits (revisited)
- Load the coolhearts data.
- Fit a multiple linear regression model of Infarc on Area, X2 (early cooling), and X3 (late cooling).
- Test all slope parameters equal to 0.
- Display the overall F statistic and p-value in the model results.
- Test one slope parameter is 0.
- Use the
Anova
function from the car
package to display F-statistic in an ANOVA table using adjusted (type III) sums of squares.
- Or (easier), use a t-test from model results.
- Test a subset of slope parameters is 0.
- Fit the full model (with Area, X2, and X3) and reduced model (Area only), and calculate the general linear F-statistic.
- Or, use the
anova
function with a full model to display the ANOVA table with sequential (type I) sums of squares, and calculate partial F-statistic.
- Or (easier), use the
anova
function with full and reduced models to display F-statistic and p-value directly.
- Calculate partial R-squared for (X2, X3 | Area).
coolhearts <- read.table("~/path-to-folder/coolhearts.txt", header=T)
attach(coolhearts)
model.1 <- lm(Infarc ~ Area + X2 + X3)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.13454 0.10402 -1.293 0.206459
# Area 0.61265 0.10705 5.723 3.87e-06 ***
# X2 -0.24348 0.06229 -3.909 0.000536 ***
# X3 -0.06566 0.06507 -1.009 0.321602
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1395 on 28 degrees of freedom
# Multiple R-squared: 0.6377, Adjusted R-squared: 0.5989
# F-statistic: 16.43 on 3 and 28 DF, p-value: 2.363e-06
anova(model.1) # Sequential (type I) SS
# Analysis of Variance Table
# Response: Infarc
# Df Sum Sq Mean Sq F value Pr(>F)
# Area 1 0.62492 0.62492 32.1115 4.504e-06 ***
# X2 1 0.31453 0.31453 16.1622 0.000398 ***
# X3 1 0.01981 0.01981 1.0181 0.321602
# Residuals 28 0.54491 0.01946
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Anova(model.1, type="III") # Adjusted (type III) SS
# Anova Table (Type III tests)
# Response: Infarc
# Sum Sq Df F value Pr(>F)
# (Intercept) 0.03255 1 1.6728 0.2064588
# Area 0.63742 1 32.7536 3.865e-06 ***
# X2 0.29733 1 15.2781 0.0005365 ***
# X3 0.01981 1 1.0181 0.3216018
# Residuals 0.54491 28
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Hand calculation: F for Area = (0.63742/1) / (0.54491/28) = 32.75359
pf(32.75359, 1, 28, lower.tail=F) # 3.865451e-06
5.723^2 # 32.75273 (t-statistic squared = F-statistic)
model.2 <- lm(Infarc ~ Area)
anova(model.2)
# Df Sum Sq Mean Sq F value Pr(>F)
# Area 1 0.62492 0.62492 21.322 6.844e-05 ***
# Residuals 30 0.87926 0.02931
((0.87926-0.54491)/(30-28)) / (0.54491/28) # General linear F-stat = 8.590226
((0.31453+0.01981)/2) / (0.54491/28) # Partial F-stat = 8.589969
pf(8.59, 2, 28, lower.tail=F) # 0.001233006
anova(model.2, model.1)
# Analysis of Variance Table
# Model 1: Infarc ~ Area
# Model 2: Infarc ~ Area + X2 + X3
# Res.Df RSS Df Sum of Sq F Pr(>F)
# 1 30 0.87926
# 2 28 0.54491 2 0.33435 8.5902 0.001233 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
0.31453+0.01981 # SSR(X2, X3 | Area) = 0.33434
# SSE(Area) = 0.87926
0.33434 / 0.87926 # Partial R-squared (X2, X3 | Area) = 0.3802516
detach(coolhearts)