# Software Help 5

Software Help 5

## Help

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson05.zip

• babybirds.txt
• bodyfat.txt
• hospital_infct.txt
• fev_dat.txt
• iqsize.txt
• pastry.txt
• soapsuds.txt
• stat_females.txt

# Minitab Help 5: Multiple Linear Regression

Minitab Help 5: Multiple Linear Regression

## Soapsuds example (using matrices)

• Perform a linear regression analysis of suds on soap.
• Click "Storage" in the regression dialog and check "Design matrix" to store the design matrix, X.
• To create $$X^T$$: Select Calc > Matrices > Transpose, select "XMAT" to go into the "Transpose from" box, and type "M2" in the "Store result in" box.
• To calculate $$X^{T} X$$: Select Calc > Matrices > Arithmetic, click "Multiply," select "M2" to go in the left-hand box, select "XMAT" to go in the right-hand box, and type "M3" in the "Store result in" box. Display the result by selecting Data > Display Data.
• To calculate $$X^{T} Y$$: Select Calc > Matrices > Arithmetic, click "Multiply," select "M2" to go in the left-hand box, select "suds" to go in the right-hand box, and type "M4" in the "Store result in" box. Display the result by selecting Data > Display Data.
• To calculate $$\left(X^{T}X\right)^{-1} \colon$$ Select Calc > Matrices > Invert, select "M3" to go in the "Invert from" box, and type "M5" in the "Store result in" box. Display the result by selecting Data > Display Data.
• To calculate b = $$\left(X^{T}X\right)^{-1} X^{T} Y \colon$$ Select Calc > Matrices > Arithmetic, click "Multiply," select "M5" to go in the left-hand box, select "M4" to go in the right-hand box, and type "M6" in the "Store result in" box. Display the result by selecting Data > Display Data.

## Pastry sweetness

• Obtain a sample correlation
• Create a basic scatterplot.
• Perform a linear regression analysis of Rating on Moisture and Sweetness.
• Click "Storage" in the regression dialog and check "Fits" to store the fitted (predicted) values.
• To create a scatterplot of the data with points marked by Sweetness and two lines representing the fitted regression equation for each group:
• Select Calc > Calculator, type "FITS_2" in the "Store result in variable" box, and type "IF('Sweetness'=2,'FITS')" in the "Expression" box. Repeat for FITS_4 (Sweetness=4).
• Create a basic scatter plot but select "With Groups" instead of "Simple." Select Rating as the "Y variable," Moisture as the "X variable," and Sweetness as the "Categorical variable for grouping."
• Select Editor > Add > Calculated Line and select "FITS_2" to go into the "Y column" and "Moisture" to go into the "X column." Repeat for FITS_4 (Sweetness=4).
• Perform a linear regression analysis of Rating on Moisture.
• Perform a linear regression analysis of Rating on Sweetness.

## Physiological measurements (using matrices)

• Perform a linear regression analysis of BodyFat on Triceps, Thigh, and Midarm.
• Click "Storage" in the regression dialog and check "Design matrix" to store the design matrix, X.
• To create $$X^T$$: Select Calc > Matrices > Transpose, select "XMAT" to go into the "Transpose from" box, and type "M2" in the "Store result in" box.
• To calculate $$X^{T} X$$: Select Calc > Matrices > Arithmetic, click "Multiply," select "M2" to go in the left-hand box, select "XMAT" to go in the right-hand box, and type "M3" in the "Store result in" box. Display the result by selecting Data > Display Data.
• To calculate $$\left(X^{T}X\right)^{-1} \colon$$ Select Calc > Matrices > Invert, select "M3" to go in the "Invert from" box, and type "M4" in the "Store result in" box. Display the result by selecting Data > Display Data.

# R Help 5: Multiple Linear Regression

R Help 5: Multiple Linear Regression

## IQ and physical characteristics

• Display a scatterplot matrix of the data.
• Fit a multiple linear regression model of PIQ on Brain, Height, and Weight.
• Display model results.
• Use the anova function to display the ANOVA table with sequential (type I) sums of squares.
• Use the Anova function from the car package to display the ANOVA table with adjusted (type III) sums of squares.
iqsize <- read.table("~/path-to-folder/iqsize.txt", header=T)
attach(iqsize)

pairs(cbind(PIQ, Brain, Height, Weight))

model <- lm(PIQ ~ Brain + Height + Weight)
summary(model)
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.114e+02  6.297e+01   1.768 0.085979 .
# Brain        2.060e+00  5.634e-01   3.657 0.000856 ***
# Height      -2.732e+00  1.229e+00  -2.222 0.033034 *
# Weight       5.599e-04  1.971e-01   0.003 0.997750
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 19.79 on 34 degrees of freedom
# Multiple R-squared:  0.2949,  Adjusted R-squared:  0.2327
# F-statistic: 4.741 on 3 and 34 DF,  p-value: 0.007215

anova(model) # Sequential (type I) SS
# Analysis of Variance Table
# Response: PIQ
#           Df  Sum Sq Mean Sq F value  Pr(>F)
# Brain      1  2697.1 2697.09  6.8835 0.01293 *
# Height     1  2875.6 2875.65  7.3392 0.01049 *
# Weight     1     0.0    0.00  0.0000 0.99775
# Residuals 34 13321.8  391.82
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

library(car)
Anova(model, type="III") # Adjusted (type III) SS
# Anova Table (Type III tests)
# Response: PIQ
#              Sum Sq Df F value    Pr(>F)
# (Intercept)  1225.2  1  3.1270 0.0859785 .
# Brain        5239.2  1 13.3716 0.0008556 ***
# Height       1934.7  1  4.9378 0.0330338 *
# Weight          0.0  1  0.0000 0.9977495
# Residuals   13321.8 34
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

detach(iqsize)

## Underground air quality

• Display a scatterplot matrix of the data.
• Use the scatter3d function from the car package to create a 3D scatterplot of the data.
• Fit a multiple linear regression model of Vent on O2 and CO2.
• Display model results.
• Use the Anova function from the car package to display the ANOVA table with adjusted (type III) sums of squares.
babybirds <- read.table("~/path-to-folder/babybirds.txt", header=T)
attach(babybirds)

pairs(cbind(Vent, O2, CO2))

library(car)
scatter3d(Vent ~ O2 + CO2)
scatter3d(Vent ~ O2 + CO2, revolutions=3, speed=0.5, grid=F)

model <- lm(Vent ~ O2 + CO2)
summary(model)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   85.901    106.006   0.810    0.419
# O2            -5.330      6.425  -0.830    0.408
# CO2           31.103      4.789   6.495  2.1e-09 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 157.4 on 117 degrees of freedom
# Multiple R-squared:  0.2682,  Adjusted R-squared:  0.2557
# F-statistic: 21.44 on 2 and 117 DF,  p-value: 1.169e-08

Anova(model, type="III") # Adjusted (type III) SS
# Anova Table (Type III tests)
# Response: Vent
#              Sum Sq  Df F value    Pr(>F)
# (Intercept)   16262   1  0.6566    0.4194
# O2            17045   1  0.6883    0.4084
# CO2         1044773   1 42.1866 2.104e-09 ***
# Residuals   2897566 117
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

detach(babybirds)

## Soapsuds example (using matrices)

• Fit a simple linear regression model of suds on soap and store the model matrix, X.
• Display model results.
• Calculate $$X^{T}X , X^{T}Y , (X^{T} X)^{-1}$$ , and $$b = (X^{T}X)^{-1} X^{T}Y$$ .
• Fit a multiple linear regression model with linearly dependent predictors.
soapsuds <- read.table("~/path-to-folder/soapsuds.txt", header=T)
attach(soapsuds)

model <- lm(suds ~ soap, x=T)
summary(model)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -2.6786     4.2220  -0.634    0.554
# soap          9.5000     0.7553  12.579 5.64e-05 ***

X <- model$x t(X) %*% X # (Intercept) soap # (Intercept) 7.0 38.50 # soap 38.5 218.75 t(X) %*% suds # [,1] # (Intercept) 347 # soap 1975 solve(t(X) %*% X) # (Intercept) soap # (Intercept) 4.4642857 -0.7857143 # soap -0.7857143 0.1428571 solve(t(X) %*% X) %*% (t(X) %*% suds) # [,1] # (Intercept) -2.678571 # soap 9.500000 soap2 <- 2*soap model <- lm(suds ~ soap + soap2) summary(model) # Coefficients: (1 not defined because of singularities) # Estimate Std. Error t value Pr(>|t|) # (Intercept) -2.6786 4.2220 -0.634 0.554 # soap 9.5000 0.7553 12.579 5.64e-05 *** # soap2 NA NA NA NA detach(soapsuds) ## Pastry sweetness • Load the pastry data. • Calculate the correlation between the predictors and create a scatterplot. • Fit a multiple linear regression model of Rating on Moisture and Sweetness and display the model results. • Create a scatterplot of the data with points marked by Sweetness and two lines representing the fitted regression equation for each sweetness level. • Fit a simple linear regression model of Rating on Moisture and display the model results. • Fit a simple linear regression model of Rating on Sweetness and display the model results. pastry <- read.table("~/path-to-folder/pastry.txt", header=T) attach(pastry) cor(Sweetness, Moisture) # 0 plot(Sweetness, Moisture) model.12 <- lm(Rating ~ Moisture + Sweetness) summary(model.12) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 37.6500 2.9961 12.566 1.20e-08 *** # Moisture 4.4250 0.3011 14.695 1.78e-09 *** # Sweetness 4.3750 0.6733 6.498 2.01e-05 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # # Residual standard error: 2.693 on 13 degrees of freedom # Multiple R-squared: 0.9521, Adjusted R-squared: 0.9447 # F-statistic: 129.1 on 2 and 13 DF, p-value: 2.658e-09 plot(Moisture, Rating, type="n") for (i in 1:16) points(Moisture[i], Rating[i], pch=Sweetness[i], col=Sweetness[i]) for (i in c(2,4)) lines(Moisture[Sweetness==i], fitted(model.12)[Sweetness==i], lty=i, col=i) legend("topleft", legend=c("Sweetness = 4", "Sweetness = 2"), col=c(4,2), pch=c(4,2), inset=0.02) model.1 <- lm(Rating ~ Moisture) summary(model.1) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 50.775 4.395 11.554 1.52e-08 *** # Moisture 4.425 0.598 7.399 3.36e-06 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # # Residual standard error: 5.349 on 14 degrees of freedom # Multiple R-squared: 0.7964, Adjusted R-squared: 0.7818 # F-statistic: 54.75 on 1 and 14 DF, p-value: 3.356e-06 model.2 <- lm(Rating ~ Sweetness) summary(model.2) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 68.625 8.610 7.970 1.43e-06 *** # Sweetness 4.375 2.723 1.607 0.13 # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # # Residual standard error: 10.89 on 14 degrees of freedom # Multiple R-squared: 0.1557, Adjusted R-squared: 0.09539 # F-statistic: 2.582 on 1 and 14 DF, p-value: 0.1304 detach(pastry)  ## Female stat students • Load the statfemales data. • Display a scatterplot matrix of the data. • Fit a multiple linear regression model of Height on momheight and dadheight and display the model results. • Create a residual plot. statfemales <- read.table("~/path-to-folder/stat_females.txt", header=T) attach(statfemales) pairs(cbind(Height, momheight, dadheight)) model <- lm(Height ~ momheight + dadheight) summary(model) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 18.54725 3.69278 5.023 1.08e-06 *** # momheight 0.30351 0.05446 5.573 7.61e-08 *** # dadheight 0.38786 0.04721 8.216 2.10e-14 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # # Residual standard error: 2.031 on 211 degrees of freedom # Multiple R-squared: 0.4335, Adjusted R-squared: 0.4281 # F-statistic: 80.73 on 2 and 211 DF, p-value: < 2.2e-16 plot(fitted(model), residuals(model), panel.last = abline(h=0, lty=2)) detach(statfemales)  ## Hospital infections • Load the infectionrisk data. • Fit a multiple linear regression model of InfctRsk on Stay, Age, and Xray and display the model results. infectionrisk <- read.table("~/path-to-folder/infectionrisk.txt", header=T) attach(infectionrisk) model <- lm(InfctRsk ~ Stay + Age + Xray) summary(model) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 1.001162 1.314724 0.761 0.448003 # Stay 0.308181 0.059396 5.189 9.88e-07 *** # Age -0.023005 0.023516 -0.978 0.330098 # Xray 0.019661 0.005759 3.414 0.000899 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # # Residual standard error: 1.085 on 109 degrees of freedom # Multiple R-squared: 0.363, Adjusted R-squared: 0.3455 # F-statistic: 20.7 on 3 and 109 DF, p-value: 1.087e-10 detach(infectionrisk)  ## Physiological measurements (using matrices) • Load the bodyfat data. • Fit a multiple linear regression model of BodyFat on Triceps, Thigh, and Midarm and store the model matrix, X. • Display model results. • Calculate MSE and $$(X^{T} X)^{-1}$$ and multiply them to find the variance-covariance matrix of the regression parameters. • Use the variance-covariance matrix of the regression parameters to derive: • the regression parameter standard errors. • covariances and correlations between regression parameter estimates. bodyfat <- read.table("~/path-to-folder/bodyfat.txt", header=T) attach(bodyfat) model <- lm(Bodyfat ~ Triceps + Thigh + Midarm, x=T) summary(model) # Estimate Std. Error t value Pr(>|t|) # (Intercept) 117.085 99.782 1.173 0.258 # Triceps 4.334 3.016 1.437 0.170 # Thigh -2.857 2.582 -1.106 0.285 # Midarm -2.186 1.595 -1.370 0.190 # # Residual standard error: 2.48 on 16 degrees of freedom # Multiple R-squared: 0.8014, Adjusted R-squared: 0.7641 # F-statistic: 21.52 on 3 and 16 DF, p-value: 7.343e-06 anova(model) # Df Sum Sq Mean Sq F value Pr(>F) # Triceps 1 352.27 352.27 57.2768 1.131e-06 *** # Thigh 1 33.17 33.17 5.3931 0.03373 * # Midarm 1 11.55 11.55 1.8773 0.18956 # Residuals 16 98.40 6.15 # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 MSE <- sum(residuals(model)^2)/model$df.residual # 6.150306
X <- model\$x
XTXinv <- solve(t(X) %*% X)
#             (Intercept)    Triceps       Thigh      Midarm
# (Intercept)  1618.86721 48.8102522 -41.8487041 -25.7987855
# Triceps        48.81025  1.4785133  -1.2648388  -0.7785022
# Thigh         -41.84870 -1.2648388   1.0839791   0.6657581
# Midarm        -25.79879 -0.7785022   0.6657581   0.4139009

sqrt(MSE*diag(XTXinv)) # standard errors of the regression parameters
# (Intercept)     Triceps       Thigh      Midarm
#   99.782403    3.015511    2.582015    1.595499

MSE*XTXinv[2,3] # cov(b1, b2) = -7.779145
XTXinv[2,3]/sqrt(XTXinv[2,2]*XTXinv[3,3]) # cor(b1, b2) = -0.9991072

detach(bodyfat)


## Peruvian blood pressure

• Calculate FracLife variable.
• Fit full multiple linear regression model of Systol on nine predictors.
• Fit reduced multiple linear regression model of Systol on four predictors.
• Calculate SSE for the full and reduced models.
• Calculate the general linear F statistic by hand and find the p-value.
• Use the anova function with full and reduced models to display F-statistic and p-value directly.
peru <- read.table("~/path-to-folder/peru.txt", header=T)
attach(peru)

FracLife <- Years/Age

model.1 <- lm(Systol ~ Age + Years + FracLife + Weight + Height + Chin +
Forearm + Calf + Pulse)
summary(model.1)
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  146.81907   48.97096   2.998 0.005526 **
# Age           -1.12144    0.32741  -3.425 0.001855 **
# Years          2.45538    0.81458   3.014 0.005306 **
# FracLife    -115.29395   30.16900  -3.822 0.000648 ***
# Weight         1.41393    0.43097   3.281 0.002697 **
# Height        -0.03464    0.03686  -0.940 0.355194
# Chin          -0.94369    0.74097  -1.274 0.212923
# Forearm       -1.17085    1.19329  -0.981 0.334612
# Calf          -0.15867    0.53716  -0.295 0.769810
# Pulse          0.11455    0.17043   0.672 0.506818

anova(model.1)
#           Df  Sum Sq Mean Sq F value    Pr(>F)
# Age        1    0.22    0.22  0.0030  0.956852
# Years      1   82.55   82.55  1.1019  0.302514
# FracLife   1 3112.41 3112.41 41.5449 4.728e-07 ***
# Weight     1  706.54  706.54  9.4311  0.004603 **
# Height     1    1.68    1.68  0.0224  0.882117
# Chin       1  297.68  297.68  3.9735  0.055704 .
# Forearm    1  113.91  113.91  1.5205  0.227440
# Calf       1   10.01   10.01  0.1336  0.717420
# Pulse      1   33.84   33.84  0.4518  0.506818
# Residuals 29 2172.58   74.92

model.2 <- lm(Systol ~ Age + Years + FracLife + Weight)
summary(model.2)
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  116.8354    21.9797   5.316 6.69e-06 ***
# Age           -0.9507     0.3164  -3.004 0.004971 **
# Years          2.3393     0.7714   3.032 0.004621 **
# FracLife    -108.0728    28.3302  -3.815 0.000549 ***
# Weight         0.8324     0.2754   3.022 0.004742 **

anova(model.2)
#           Df  Sum Sq Mean Sq F value    Pr(>F)
# Age        1    0.22    0.22  0.0029  0.957480
# Years      1   82.55   82.55  1.0673  0.308840
# FracLife   1 3112.41 3112.41 40.2409 3.094e-07 ***
# Weight     1  706.54  706.54  9.1350  0.004742 **
# Residuals 34 2629.71   77.34

(2629.71-2172.58)/(34-29) / (2172.58/29) # F = 1.220371
pf(1.220371, 5, 29, lower.tail=F) # p-value = 0.3247213

anova(model.2, model.1)
#   Res.Df    RSS Df Sum of Sq      F Pr(>F)
# 1     34 2629.7
# 2     29 2172.6  5    457.12 1.2204 0.3247

detach(peru)


## Measurements of college students

• Fit full multiple linear regression model of Height on LeftArm, LeftFoot, HeadCirc, and nose.
• Create a residual plot.
• Fit reduced multiple linear regression model of Height on LeftArm and LeftFoot.
• Calculate SSE for the full and reduced models.
• Calculate the general linear F statistic by hand and find the p-value.
• Use the anova function with full and reduced models to display F-statistic and p-value directly.
• Calculate partial R-squared for (LeftArm | LeftFoot).
physical <- read.table("~/path-to-folder/Physical.txt", header=T)
attach(physical)

model.1 <- lm(Height ~ LeftArm + LeftFoot + HeadCirc + nose)
summary(model.1)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 18.50265    7.83031   2.363   0.0221 *
# LeftArm      0.80205    0.17074   4.697 2.09e-05 ***
# LeftFoot     0.99730    0.16230   6.145 1.30e-07 ***
# HeadCirc     0.08052    0.14952   0.539   0.5926
# nose        -0.14740    0.49233  -0.299   0.7659

plot(fitted(model.1), residuals(model.1),
panel.last = abline(h=0, lty=2))

anova(model.1)
#           Df Sum Sq Mean Sq  F value    Pr(>F)
# LeftArm    1 590.21  590.21 123.8106 3.917e-15 ***
# LeftFoot   1 224.35  224.35  47.0621 9.931e-09 ***
# HeadCirc   1   1.40    1.40   0.2940    0.5901
# nose       1   0.43    0.43   0.0896    0.7659
# Residuals 50 238.35    4.77

model.2 <- lm(Height ~ LeftArm + LeftFoot)
summary(model.2)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  21.8572     3.5840   6.098 1.35e-07 ***
# LeftArm       0.7958     0.1652   4.816 1.31e-05 ***
# LeftFoot      1.0229     0.1468   6.969 5.54e-09 ***

anova(model.2)
#           Df Sum Sq Mean Sq F value    Pr(>F)
# LeftArm    1 590.21  590.21 127.782 1.275e-15 ***
# LeftFoot   1 224.35  224.35  48.572 5.538e-09 ***
# Residuals 52 240.18    4.62

(240.18-238.35)/(52-50) / (238.35/50) # F = 0.1919446
pf(0.1919446, 2, 50, lower.tail=F) # p-value = 0.8259579

anova(model.2, model.1)
#   Res.Df    RSS Df Sum of Sq      F Pr(>F)
# 1     52 240.18
# 2     50 238.35  2    1.8289 0.1918 0.8261

model.3 <- lm(Height ~ LeftFoot)
anova(model.3)
#           Df Sum Sq Mean Sq F value    Pr(>F)
# LeftFoot   1 707.42  707.42  107.95 2.172e-14 ***
# Residuals 53 347.33    6.55

(347.33-240.18) / 347.33 # Partial R-squared (LeftArm | LeftFoot) = 0.3084962

detach(physical)


 [1] Link ↥ Has Tooltip/Popover Toggleable Visibility