Software Help 9
Software Help 9
The next two pages cover the Minitab and R commands for the procedures in this lesson.
Below is a zip file that contains all the data sets used in this lesson:
- allswallows.txt
- bluegills.txt
- hospital.txt
- mammgest.txt
- odor.txt
- shortleaf.txt
- wordrecall.txt
- yield.txt
Minitab Help 9: Data Transformations
Minitab Help 9: Data TransformationsMinitab®
Word recall (log-transforming a predictor)
- Perform a linear regression analysis of prop on time.
- Create a fitted line plot.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Conduct regression error normality tests and select Anderson-Darling.
- To create a log(time) variable, select Calc > Calculator, specify the name of the new variable (lntime, for example) in the box labeled "Store result in variable," and type "log(time)" in the box labeled "Expression." Select OK and the new variable should appear in your worksheet.
- Perform a linear regression analysis of prop on log(time).
- Repeat diagnostic plots and normality tests.
- Use Calc > Calculator to create a prop^-1.25 variable and Perform a linear regression analysis of prop^-1.25 on time.
- Repeat diagnostic plots and normality tests.
- Use prop on log(time) model to find:
- 95% prediction interval for a prop at time 1000. Use Find a confidence interval and a prediction interval for the response and specify 6.91 for the individual value of log(time).
- 95% confidence interval for the expected change in prop for a 10-fold increase in time. To display confidence intervals for the model parameters (regression coefficients) click "Results" in the Regression Dialog and select "Expanded tables" for "Display of results."
Mammal gestation (log-transforming the response)
- Perform a linear regression analysis of Gestation on Birthwgt.
- Create a fitted line plot.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Conduct regression error normality tests and select Anderson-Darling.
- Use Calc > Calculator to create a log(Gestation) variable and Perform a linear regression analysis of log(Gestation) on Birthwgt.
- Repeat diagnostic plots and normality tests.
- Use log(Gestation) on Birthwgt model to find:
- 95% prediction interval for Gestation for a Birthwgt of 50. Use Find a confidence interval and a prediction interval for the response.
- 95% confidence interval for proportional change in median Gestation for a 1-pound increase in Birthwgt. To display confidence intervals for the model parameters (regression coefficients) click "Results" in the Regression Dialog and select "Expanded tables" for "Display of results."
- 95% confidence interval for proportional change in median Gestation for a 10-pound increase in Birthwgt.
Shortleaf pine trees (log-transforming both response and predictor)
- Perform a linear regression analysis of Vol on Diam.
- Create a fitted line plot.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Conduct regression error normality tests and select Anderson-Darling.
- Use Calc > Calculator to create a log(Diam) variable and Perform a linear regression analysis of Vol on log(Diam).
- Repeat diagnostic plots and normality tests.
- Use Calc > Calculator to create a log(Vol) variable and Perform a linear regression analysis of log(Vol) on log(Diam).
- Repeat diagnostic plots and normality tests.
- Use log(Vol) on log(Diam) model to find:
- 95% confidence interval for median Vol for a Diam of 10. Use Find a confidence interval and a prediction interval for the response and specify 2.303 for the individual value of log(Diam).
- 95% confidence interval for proportional change in median Vol for a 2-fold increase in Diam. To display confidence intervals for the model parameters (regression coefficients) click "Results" in the Regression Dialog and select "Expanded tables" for "Display of results."
Underground air quality (interactions)
- Select Graph > 3D Scatterplot (Simple) to create a 3D scatterplot of the data.
- Create interaction variables and Perform a linear regression analysis of Vent on O2 + CO2 + Type + TypeO2 + TypeCO2 + CO2O2.
- Alternatively, Perform a linear regression analysis of Vent on O2 + CO2 + Type, but before clicking "OK," click "Model," select O2, CO2, and Type in the "Predictors" box, change "Interactions through order" to "2" and click "Add." You should see "Type*O2," "Type*CO2," and "CO2*O2" appear in the box labeled "Terms in the model."
- Click "Options" in the regression dialog to select Sequential (Type I) sums of squares for the Anova table.
- Calculate partial F-statistic by hand and Find an F-based P-value.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Perform a linear regression analysis of Vent on O2 + CO2 + Type.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Conduct regression error normality tests and select Anderson-Darling.
Bluegill fish (polynomial regression)
- Create a basic scatterplot.
- Use Calc > Calculator to create an age-squared variable and Perform a linear regression analysis of length on age + agesq.
- Alternatively, Perform a linear regression analysis of length on age, but before clicking "OK," click "Model," change "Terms through order" to "2" and click "Add." You should see "age*age" appear in the box labeled "Terms in the model."
- Create a fitted line plot and select "Quadratic" for the "Type of regression model."
- Find a confidence interval and a prediction interval for the response.
Experiment yield (polynomial regression)
- Perform a linear regression analysis of Yield on Temp.
- Create a fitted line plot.
- Use Calc > Calculator to create a Temp-squared variable and Perform a linear regression analysis of Yield on Temp + Tempsq.
- Alternatively, Perform a linear regression analysis of Yield on Temp, but before clicking "OK," click "Model," change "Terms through order" to "2" and click "Add." You should see "Temp* Temp" appear in the box labeled "Terms in the model."
- Create a fitted line plot and select "Quadratic" for the "Type of regression model."
Chemical odor (polynomial regression)
- Use Calc > Calculator to create squared variables and Perform a linear regression analysis of Odor on Temp + Ratio + Height + Tempsq + Ratiosq + Heightsq.
- Alternatively, Perform a linear regression analysis of Odor on Temp + Ratio + Height, but before clicking "OK," click "Model," select Temp, Ratio, and Height in the "Predictors" box, change "Terms through order" to "2" and click "Add." You should see "Temp* Temp," "Ratio*Ratio," and "Height*Height" appear in the box labeled "Terms in the model."
- Perform a linear regression analysis of Odor on Temp + Ratio + Height + Tempsq + Ratiosq (remove the "Height*Height" term by clicking "Model," selecting "Height*Height" in the box labeled "Terms in the model," and clicking "X").
R Help 9: Data Transformations
R Help 9: Data TransformationsR Help
Word recall (log-transforming a predictor)
- Load the wordrecall data.
- Fit a simple linear regression model of prop on time.
- Display a scatterplot of the data and add the regression line.
- Display a residual plot with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot. The argument "datax" determines which way round to plot the axes (false by default, which plots the data on the vertical axis, or true, which plots the data on the horizontal axis).
- Load the
nortest
package to access the Anderson-Darling normality test. - Create a log(time) variable and fit a simple linear regression model of prop on log(time).
- Repeat diagnostic plots and normality tests.
- Create a prop^-1.25 variable and fit a simple linear regression model of prop^-1.25 on time.
- Repeat diagnostic plots and normality tests.
- Use prop on log(time) model to find:
- 95% prediction interval for a prop at time 1000.
- 95% confidence interval for the expected change in prop for a 10-fold increase in time.
wordrecall <- read.table("~/path-to-folder/wordrecall.txt", header=T)
attach(wordrecall)
model.1 <- lm(prop ~ time)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 5.259e-01 4.881e-02 10.774 3.49e-07 ***
# time -5.571e-05 1.457e-05 -3.825 0.00282 **
# Multiple R-squared: 0.5709, Adjusted R-squared: 0.5318
plot(x=time, y=prop, ylim=c(-0.1, 0.9),
panel.last = lines(sort(time), fitted(model.1)[order(time)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
library(nortest)
ad.test(residuals(model.1)) # A = 0.262, p-value = 0.6426
lntime <- log(time)
model.2 <- lm(prop ~ lntime)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.846415 0.014195 59.63 3.65e-15 ***
# lntime -0.079227 0.002416 -32.80 2.53e-12 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.02339 on 11 degrees of freedom
# Multiple R-squared: 0.9899, Adjusted R-squared: 0.989
# F-statistic: 1076 on 1 and 11 DF, p-value: 2.525e-12
plot(x=lntime, y=prop,
panel.last = lines(sort(lntime), fitted(model.2)[order(lntime)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3216, p-value = 0.4869
prop1.25 <- prop^-1.25
model.3 <- lm(prop1.25 ~ time)
summary(model.3)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.8693698 0.3869678 4.831 0.000527 ***
# time 0.0019708 0.0001155 17.067 2.91e-09 ***
# Multiple R-squared: 0.9636, Adjusted R-squared: 0.9603
plot(x=time, y=prop1.25,
panel.last = lines(sort(time), fitted(model.3)[order(time)]))
plot(x=fitted(model.3), y=residuals(model.3),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.3), main="", datax=TRUE)
qqline(residuals(model.3), datax=TRUE)
ad.test(residuals(model.3)) # A = 1.191, p-value = 0.002584
predict(model.2, interval="prediction",
newdata=data.frame(lntime=log(1000)))
# fit lwr upr
# 1 0.2991353 0.2449729 0.3532978
confint(model.2)[2,]*log(10) # 95% CI for 10-fold increase in time
# 2.5 % 97.5 %
# -0.1946689 -0.1701845
detach(wordrecall)
Mammal gestation (log-transforming the response)
- Load the mammgest data.
- Fit a simple linear regression model of Gestation on Birthwgt.
- Display a scatterplot of the data and add the regression line.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package. - Create a log(Gestation) variable and fit a simple linear regression model of log(Gestation) on Birthwgt.
- Repeat diagnostic plots and normality tests.
- Use log(Gestation) on Birthwgt model to find:
- 95% prediction interval for Gestation for a Birthwgt of 50.
- 95% confidence interval for proportional change in median Gestation for a 1-pound increase in Birthwgt.
- 95% confidence interval for proportional change in median Gestation for a 10-pound increase in Birthwgt.
mammgest <- read.table("~/path-to-folder/mammgest.txt", header=T)
attach(mammgest)
model.1 <- lm(Gestation ~ Birthwgt)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 187.0837 26.9426 6.944 6.73e-05 ***
# Birthwgt 3.5914 0.5247 6.844 7.52e-05 ***
# Multiple R-squared: 0.8388, Adjusted R-squared: 0.8209
plot(x=Birthwgt, y=Gestation,
panel.last = lines(sort(Birthwgt), fitted(model.1)[order(Birthwgt)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
ad.test(residuals(model.1)) # A = 0.3116, p-value = 0.503
lnGest <- log(Gestation)
model.2 <- lm(lnGest ~ Birthwgt)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 5.278817 0.088177 59.866 5.1e-13 ***
# Birthwgt 0.010410 0.001717 6.062 0.000188 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2163 on 9 degrees of freedom
# Multiple R-squared: 0.8033, Adjusted R-squared: 0.7814
# F-statistic: 36.75 on 1 and 9 DF, p-value: 0.0001878
plot(x=Birthwgt, y=lnGest,
panel.last = lines(sort(Birthwgt), fitted(model.2)[order(Birthwgt)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3135, p-value = 0.4963
exp(predict(model.2, interval="prediction",
newdata=data.frame(Birthwgt=50)))
# fit lwr upr
# 1 330.0781 197.3013 552.2092
# proportional change in median gestation for 1-unit increase in birthwgt
exp(coefficients(model.2)[2]) # 1.010465
exp(confint(model.2)[2,]) # 95% CI
# 2.5 % 97.5 %
# 1.006547 1.014398
# proportional change in median gestation for 10-unit increase in birthwgt
exp(10*coefficients(model.2)[2]) # 1.109714
exp(10*confint(model.2)[2,]) # 95% CI
# 2.5 % 97.5 %
# 1.067429 1.153674
detach(mammgest)
Shortleaf pine trees (log-transforming both response and predictor)
- Load the shortleaf data.
- Fit a simple linear regression model of Vol on Diam.
- Display a scatterplot of the data and add the regression line.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package. - Create a log(Diam) variable and fit a simple linear regression model of Vol on log(Diam).
- Repeat diagnostic plots and normality tests.
- Create a log(Vol) variable and fit a simple linear regression model of log(Vol) on log(Diam).
- Repeat diagnostic plots and normality tests.
- Use log(Vol) on log(Diam) model to find:
- 95% confidence interval for median Vol for a Diam of 10.
- 95% confidence interval for proportional change in median Vol for a 2-fold increase in Diam.
shortleaf <- read.table("~/path-to-folder/shortleaf.txt", header=T)
attach(shortleaf)
model.1 <- lm(Vol ~ Diam)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -41.5681 3.4269 -12.13 <2e-16 ***
# Diam 6.8367 0.2877 23.77 <2e-16 ***
# ---
# Multiple R-squared: 0.8926, Adjusted R-squared: 0.891
plot(x=Diam, y=Vol,
panel.last = lines(sort(Diam), fitted(model.1)[order(Diam)]))
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.1), main="", datax=TRUE)
qqline(residuals(model.1), datax=TRUE)
ad.test(residuals(model.1)) # A = 0.9913, p-value = 0.01215
lnDiam <- log(Diam)
model.2 <- lm(Vol ~ lnDiam)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -116.162 10.830 -10.73 2.88e-16 ***
# lnDiam 64.536 4.562 14.15 < 2e-16 ***
# Multiple R-squared: 0.7464, Adjusted R-squared: 0.7427
plot(x=lnDiam, y=Vol,
panel.last = lines(sort(lnDiam), fitted(model.2)[order(lnDiam)]))
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 2.3845, p-value = 4.273e-06
lnVol <- log(Vol)
model.3 <- lm(lnVol ~ lnDiam)
summary(model.3)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -2.8718 0.1216 -23.63 <2e-16 ***
# lnDiam 2.5644 0.0512 50.09 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1703 on 68 degrees of freedom
# Multiple R-squared: 0.9736, Adjusted R-squared: 0.9732
# F-statistic: 2509 on 1 and 68 DF, p-value: < 2.2e-16
plot(x=lnDiam, y=lnVol,
panel.last = lines(sort(lnDiam), fitted(model.3)[order(lnDiam)]))
plot(x=fitted(model.3), y=residuals(model.3),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.3), main="", datax=TRUE)
qqline(residuals(model.3), datax=TRUE)
ad.test(residuals(model.3)) # A = 0.5309, p-value = 0.1692
exp(predict(model.3, interval="confidence",
newdata=data.frame(lnDiam=log(10))))
# fit lwr upr
# 1 20.75934 19.92952 21.62372
# proportional change in median Vol for 2-fold increase in Diam
2^(coefficients(model.3)[2]) # 5.915155
2^(confint(model.3)[2,]) # 95% CI
# 2.5 % 97.5 %
# 5.510776 6.349207
detach(shortleaf)
Underground air quality (interactions)
- Load the swallows data.
- Load the
car
package to access 3D scatterplots. - Create interaction variables and fit a multiple linear regression model of Vent on O2 + CO2 + Type + TypeO2 + TypeCO2 + CO2O2.
- Use
anova
function to display the ANOVA table with sequential (type I) sums of squares. - Calculate partial F-statistic and p-value.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Fit a multiple linear regression model of Vent on O2 + CO2 + Type.
- Display residual plots with fitted (predicted) values on the horizontal axis.
- Display a normal probability plot of the residuals and add a diagonal line to the plot.
- Apply the Anderson-Darling normality test using the
nortest
package.
swallows <- read.table("~/path-to-folder/allswallows.txt", header=T)
attach(swallows)
library(car)
scatter3d(Vent ~ O2 + CO2, subset=Type==1) # adult
scatter3d(Vent ~ O2 + CO2, subset=Type==0) # nestling
scatter3d(Vent ~ O2 + CO2, subset=Type==0, revolutions=3, speed=0.5, grid=F)
TypeO2 <- Type*O2
TypeCO2 <- Type*CO2
CO2O2 <- CO2*O2
model.1 <- lm(Vent ~ O2 + CO2 + Type + TypeO2 + TypeCO2 + CO2O2)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -18.399 160.007 -0.115 0.9086
# O2 1.189 9.854 0.121 0.9041
# CO2 54.281 25.987 2.089 0.0378 *
# Type 111.658 157.742 0.708 0.4797
# TypeO2 -7.008 9.560 -0.733 0.4642
# TypeCO2 2.311 7.126 0.324 0.7460
# CO2O2 -1.449 1.593 -0.909 0.3642
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 165.6 on 233 degrees of freedom
# Multiple R-squared: 0.272, Adjusted R-squared: 0.2533
# F-statistic: 14.51 on 6 and 233 DF, p-value: 4.642e-14
anova(model.1) # Sequential (type I) SS
# Df Sum Sq Mean Sq F value Pr(>F)
# O2 1 93651 93651 3.4156 0.06585 .
# CO2 1 2247696 2247696 81.9762 < 2e-16 ***
# Type 1 5910 5910 0.2156 0.64288
# TypeO2 1 14735 14735 0.5374 0.46425
# TypeCO2 1 2884 2884 0.1052 0.74598
# CO2O2 1 22664 22664 0.8266 0.36421
# Residuals 233 6388603 27419
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
((14735+2884+22664)/3)/27419 # F-stat = 0.4897212
pf(0.49, 3, 233, lower.tail=F) # p-value = 0.6895548
plot(x=fitted(model.1), y=residuals(model.1),
panel.last = abline(h=0, lty=2))
model.2 <- lm(Vent ~ O2 + CO2 + Type)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 136.767 79.334 1.724 0.086 .
# O2 -8.834 4.765 -1.854 0.065 .
# CO2 32.258 3.551 9.084 <2e-16 ***
# Type 9.925 21.308 0.466 0.642
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 165 on 236 degrees of freedom
# Multiple R-squared: 0.2675, Adjusted R-squared: 0.2581
# F-statistic: 28.72 on 3 and 236 DF, p-value: 7.219e-16
plot(x=fitted(model.2), y=residuals(model.2),
panel.last = abline(h=0, lty=2))
qqnorm(residuals(model.2), main="", datax=TRUE)
qqline(residuals(model.2), datax=TRUE)
ad.test(residuals(model.2)) # A = 0.3175, p-value = 0.5358
detach(swallows)
Bluegill fish (polynomial regression)
- Load the bluegills data.
- Display a scatterplot of the data.
- Create an age-squared variable and fit a multiple linear regression model of length on age + agesq.
- Add a quadratic regression line to the scatterplot.
- Find a 95% prediction interval for length for an age of 5.
bluegills <- read.table("~/path-to-folder/bluegills.txt", header=T)
attach(bluegills)
plot(x=age, y=length)
agesq <- age^2
model <- lm(length ~ age + agesq)
summary(model)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 13.622 11.016 1.237 0.22
# age 54.049 6.489 8.330 2.81e-12 ***
# agesq -4.719 0.944 -4.999 3.67e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 10.91 on 75 degrees of freedom
# Multiple R-squared: 0.8011, Adjusted R-squared: 0.7958
# F-statistic: 151.1 on 2 and 75 DF, p-value: < 2.2e-16
newX <- seq(min(age), max(age), length=100)
newXsq <- newX**2
plot(x=age, y=length,
panel.last = lines(newX,
predict(model,
newdata=data.frame(age=newX, agesq=newXsq))))
predict(model, interval="prediction",
newdata=data.frame(age=5, agesq=25))
# fit lwr upr
# 1 165.9023 143.487 188.3177
detach(bluegills)
Experiment yield (polynomial regression)
- Load the yield data.
- Fit a simple linear regression model of Yield on Temp.
- Display a scatterplot of the data and add the simple linear regression line.
- Create a Temp-squared variable and fit a multiple linear regression model of Yield on Temp + Tempsq.
- Display a scatterplot of the data and add a quadratic regression line to the scatterplot.
yield <- read.table("~/path-to-folder/yield.txt", header=T)
attach(yield)
model.1 <- lm(Yield ~ Temp)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 2.306306 0.469075 4.917 0.000282 ***
# Temp 0.006757 0.005873 1.151 0.270641
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3913 on 13 degrees of freedom
# Multiple R-squared: 0.09242, Adjusted R-squared: 0.0226
# F-statistic: 1.324 on 1 and 13 DF, p-value: 0.2706
plot(x=Temp, y=Yield,
panel.last = lines(sort(Temp), fitted(model.1)[order(Temp)]))
Tempsq <- Temp^2
model.2 <- lm(Yield ~ Temp + Tempsq)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 7.9604811 1.2589183 6.323 3.81e-05 ***
# Temp -0.1537113 0.0349408 -4.399 0.000867 ***
# Tempsq 0.0010756 0.0002329 4.618 0.000592 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2444 on 12 degrees of freedom
# Multiple R-squared: 0.6732, Adjusted R-squared: 0.6187
# F-statistic: 12.36 on 2 and 12 DF, p-value: 0.001218
newX <- seq(min(Temp), max(Temp), length=100)
newXsq <- newX**2
plot(x=Temp, y=Yield,
panel.last = lines(newX,
predict(model.2,
newdata=data.frame(Temp=newX, Tempsq=newXsq))))
detach(yield)
Chemical odor (polynomial regression)
- Load the odor data.
- Create squared variables and fit a multiple linear regression model of Odor on Temp + Ratio + Height + Tempsq + Ratiosq + Heightsq.
- Fit a multiple linear regression model of Odor on Temp + Ratio + Height + Tempsq + Ratiosq.
odor <- read.table("~/path-to-folder/odor.txt", header=T)
attach(odor)
Tempsq <- Temp^2
Ratiosq <- Ratio^2
Heightsq <- Height^2
model.1 <- lm(Odor ~ Temp + Ratio + Height + Tempsq + Ratiosq + Heightsq)
summary(model.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -30.667 10.840 -2.829 0.0222 *
# Temp -12.125 6.638 -1.827 0.1052
# Ratio -17.000 6.638 -2.561 0.0336 *
# Height -21.375 6.638 -3.220 0.0122 *
# Tempsq 32.083 9.771 3.284 0.0111 *
# Ratiosq 47.833 9.771 4.896 0.0012 **
# Heightsq 6.083 9.771 0.623 0.5509
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 18.77 on 8 degrees of freedom
# Multiple R-squared: 0.8683, Adjusted R-squared: 0.7695
# F-statistic: 8.789 on 6 and 8 DF, p-value: 0.003616
model.2 <- lm(Odor ~ Temp + Ratio + Height + Tempsq + Ratiosq)
summary(model.2)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -26.923 8.707 -3.092 0.012884 *
# Temp -12.125 6.408 -1.892 0.091024 .
# Ratio -17.000 6.408 -2.653 0.026350 *
# Height -21.375 6.408 -3.336 0.008720 **
# Tempsq 31.615 9.404 3.362 0.008366 **
# Ratiosq 47.365 9.404 5.036 0.000703 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 18.12 on 9 degrees of freedom
# Multiple R-squared: 0.8619, Adjusted R-squared: 0.7852
# F-statistic: 11.23 on 5 and 9 DF, p-value: 0.001169
detach(odor)