# Software Help 7

Software Help 7

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a text file that contains the data used in this lesson:

# Minitab Help 7: MLR Estimation, Prediction & Model Assumptions

Minitab Help 7: MLR Estimation, Prediction & Model Assumptions##
Minitab^{®}

### IQ and physical characteristics (confidence and prediction intervals)

- Perform a linear regression analysis of PIQ on Brain and Height.
- Find a confidence interval and a prediction interval for the response.

### IQ and physical characteristics (residual plots and normality tests)

- Perform a linear regression analysis of PIQ on Brain and Height.
- Create residual plots and select "Residuals versus fits" (with regular residuals).
- Create residual plots and specify Brain, Height, and Weight in the "Residuals versus the variables" box (with regular residuals).
- Create residual plots and select "Histogram of residuals" (with regular residuals).
- Create residual plots and select "Normal probability plot of residuals" (with regular residuals).
- Conduct regression error normality tests and select:
- Anderson-Darling
- Ryan-Joiner (similar to Shapiro-Wilk)
- Kolmogorov-Smirnov (Lilliefors)

### Toluca refrigerator parts (tests for constant error variance)

- Perform a linear regression analysis of
*WorkHours*on*LotSize*and store the residuals,*RESI*. - Modified Levene (Brown-Forsythe):
- Select Data > Code > To Text, select
*LotSize*for "Code values in the following columns," select "Code ranges of values" for "Method," type "20" for the first lower endpoint, type "70" for the first upper endpoint, type "1" for the first coded value, type "80" for the second lower endpoint, type "120" for the second upper endpoint, type "2" for the second coded value, select "Both endpoints" for "Endpoints to include," and click OK. - Select Stat > Basic Statistics > Store Descriptive Statistics, select
*RESI*for "Variables," select 'Coded LotSize' for "By variables," click "Statistics," select "Median," and click OK. This calculates \(\tilde{e}_1=-19.876\) and \(\tilde{e}_2=-2.684\). - Select Calc >Calculator, type "d" for "Store result in variable," type "abs('RESI'-IF('Coded LotSize'="1",-19.876,-2.684))" for "Expression," and click OK. This calculates
*d*and_{1}*d*._{2} - Select Stat > Basic Statistics > Store Descriptive Statistics, select
*d*for "Variables," select 'Coded LotSize' for "By variables," click "Statistics," select "Mean," and click OK. This calculates \(\bar{d}_1=44.8151\) and \(\bar{d}_2=28.4503\). - Select Calc >Calculator, type "devsq" for "Store result in variable," type "(d-IF('Coded LotSize'="1",44.8151,28.4503))^2" for "Expression," and click OK. This calculates \((d_1-\bar{d}_1)^2\) and \((d_2-\bar{d}_2)^2\).
- Select Stat > Basic Statistics > Store Descriptive Statistics, select
*devsq*for "Variables," select*'*Coded LotSize' for "By variables," click "Statistics," select "Sum," and click OK. This calculates \(\sum{(d_1-\bar{d}_1)^2}=12566.6\) and \(\sum{(d_2-\bar{d}_2)^2}=9610.3\). - We can then calculate \(s_L = \sqrt{(12566.6+9610.3)/23} = 31.05\) and \(L = (44.8151-28.4053)/(31.05\sqrt{(1/13+1/12)}) = 1.316\).
- Select Calc > Probability Distributions > t, type "23" for "Degrees of freedom," select "Input constant," type "1.316," and click OK. This calculates the probability area to the left of 1.316 as 0.89943, which means the p-value for the test is \(2(1-0.89943) = 0.20\), i.e., there is no evidence the errors have nonconstant variance.

- Select Data > Code > To Text, select
- Breusch-Pagan (Cook-Weisberg score):
- Select Calc >Calculator, type "esq" for "Store result in variable," type "'RESI'^2" for "Expression," and click OK. This calculates the squared residuals.
- Fit the model with response
*esq*and predictor*LotSize*. Observe SSR^{*}= 7896142. - We can then calculate \(X^{2*} = (7896142/2) / (54825/25)^2 = 0.821\).
- Select Calc > Probability Distributions > Chi-Square, type "1" for "Degrees of freedom," select "Input constant," type "0.821," and click OK. This calculates the probability area to the left of 0.821 as 0.635112, which means the p-value for the test is \(1-0.635112 = 0.36\), i.e., there is no evidence the errors have nonconstant variance.

# R Help 7: MLR Estimation, Prediction & Model Assumptions

R Help 7: MLR Estimation, Prediction & Model Assumptions## R

### IQ and physical characteristics (confidence and prediction intervals)

- Load the iqsize data.
- Fit a multiple linear regression model of PIQ on Brain and Height.
- Calculate a 95% confidence interval for mean PIQ at Brain=90, Height=70.
- Calculate a 95% confidence interval for mean PIQ at Brain=79, Height=62.
- Calculate a 95% prediction interval for individual PIQ at Brain=90, Height=70.

```
iqsize <- read.table("~/path-to-folder/iqsize.txt", header=T)
attach(iqsize)
predict(model, interval="confidence", se.fit=T,
newdata=data.frame(Brain=90, Height=70))
# $fit
# fit lwr upr
# 1 105.6391 98.23722 113.041
#
# $se.fit
# [1] 3.646064
predict(model, interval="confidence", se.fit=T,
newdata=data.frame(Brain=79, Height=62))
# $fit
# fit lwr upr
# 1 104.8114 91.41796 118.2049
#
# $se.fit
# [1] 6.597407
predict(model, interval="prediction",
newdata=data.frame(Brain=90, Height=70))
# $fit
# fit lwr upr
# 1 105.6391 65.34688 145.9314
detach(iqsize)
```

### IQ and physical characteristics (residual plots and normality tests)

- Load the iqsize data.
- Fit a multiple linear regression model of PIQ on Brain and Height.
- Display residual plot with fitted (predicted) values on the horizontal axis.
- Display residual plot with Brain on the horizontal axis.
- Display residual plot with Height on the horizontal axis.
- Display histogram of the residuals.
- Display normal probability plot of the residuals and add a diagonal line to the plot. The argument "datax" determines which way round to plot the axes (false by default, which plots the data on the vertical axis, or true, which plots the data on the horizontal axis).
- Display residual plot with Weight on the horizontal axis.
- Load the
`nortest`

package to access normality tests:- Anderson-Darling
- Shapiro-Wilk (Ryan-Joiner unavailable in R)
- Lilliefors (Kolmogorov-Smirnov)

```
iqsize <- read.table("~/path-to-folder/iqsize.txt", header=T)
attach(iqsize)
model <- lm(PIQ ~ Brain + Height)
plot(x=fitted(model), y=residuals(model),
xlab="Fitted values", ylab="Residuals",
panel.last = abline(h=0, lty=2))
plot(x=Brain, y=residuals(model),
ylab="Residuals",
panel.last = abline(h=0, lty=2))
plot(x=Height, y=residuals(model),
ylab="Residuals",
panel.last = abline(h=0, lty=2))
hist(residuals(model), main="")
qqnorm(residuals(model), main="", datax=TRUE)
qqline(residuals(model), datax=TRUE)
plot(x=Weight, y=residuals(model),
ylab="Residuals",
panel.last = abline(h=0, lty=2))
library(nortest)
ad.test(residuals(model)) # A = 0.2621, p-value = 0.6857
shapiro.test(residuals(model)) # W = 0.976, p-value = 0.5764
lillie.test(residuals(model)) # D = 0.097, p-value = 0.4897
detach(iqsize)
```

### Toluca refrigerator parts (tests for constant error variance)

- Load the Toluca data.
- Fit a simple linear regression model of WorkHours on LotSize.
- Create lotgroup factor variable splitting the sample into two groups.
- Load the
`car`

package to access tests for constant error variance:- Modified Levene (Brown-Forsythe)
- Cook-Weisberg score (Breusch-Pagan)

```
toluca <- read.table("~/path-to-folder/toluca.txt", header=T)
attach(toluca)
model <- lm(WorkHours ~ LotSize)
lotgroup <- factor(LotSize<=70)
library(car)
leveneTest(residuals(model), group=lotgroup)
# Levene's Test for Homogeneity of Variance (center = median)
# Df F value Pr(>F)
# group 1 1.7331 0.201
# 23
ncvTest(model)
# Non-constant Variance Score Test
# Variance formula: ~ fitted.values
# Chisquare = 0.8209192 Df = 1 p = 0.3649116
detach(toluca)
```