Software Help 7

Software Help 7

  

The next two pages cover the Minitab and R commands for the procedures in this lesson.

Below is a text file that contains the data used in this lesson:

iqsize.txt


Minitab Help 7: MLR Estimation, Prediction & Model Assumptions

Minitab Help 7: MLR Estimation, Prediction & Model Assumptions

Minitab®

IQ and physical characteristics (confidence and prediction intervals)

IQ and physical characteristics (residual plots and normality tests)

Toluca refrigerator parts (tests for constant error variance)

  • Perform a linear regression analysis of WorkHours on LotSize and store the residuals, RESI.
  • Modified Levene (Brown-Forsythe):
    • Select Data > Code > To Text, select LotSize for "Code values in the following columns," select "Code ranges of values" for "Method," type "20" for the first lower endpoint, type "70" for the first upper endpoint, type "1" for the first coded value, type "80" for the second lower endpoint, type "120" for the second upper endpoint, type "2" for the second coded value, select "Both endpoints" for "Endpoints to include," and click OK.
    • Select Stat > Basic Statistics > Store Descriptive Statistics, select RESI for "Variables," select 'Coded LotSize' for "By variables," click "Statistics," select "Median," and click OK. This calculates \(\tilde{e}_1=-19.876\) and \(\tilde{e}_2=-2.684\).
    • Select Calc >Calculator, type "d" for "Store result in variable," type "abs('RESI'-IF('Coded LotSize'="1",-19.876,-2.684))" for "Expression," and click OK. This calculates d1 and d2.
    • Select Stat > Basic Statistics > Store Descriptive Statistics, select d for "Variables," select 'Coded LotSize' for "By variables," click "Statistics," select "Mean," and click OK. This calculates \(\bar{d}_1=44.8151\) and \(\bar{d}_2=28.4503\).
    • Select Calc >Calculator, type "devsq" for "Store result in variable," type "(d-IF('Coded LotSize'="1",44.8151,28.4503))^2" for "Expression," and click OK.  This calculates \((d_1-\bar{d}_1)^2\) and \((d_2-\bar{d}_2)^2\).
    • Select Stat > Basic Statistics > Store Descriptive Statistics, select devsq for "Variables," select 'Coded LotSize' for "By variables," click "Statistics," select "Sum," and click OK. This calculates \(\sum{(d_1-\bar{d}_1)^2}=12566.6\) and \(\sum{(d_2-\bar{d}_2)^2}=9610.3\).
    • We can then calculate \(s_L = \sqrt{(12566.6+9610.3)/23} = 31.05\) and \(L = (44.8151-28.4053)/(31.05\sqrt{(1/13+1/12)}) = 1.316\).
    • Select Calc > Probability Distributions > t, type "23" for "Degrees of freedom," select "Input constant," type "1.316," and click OK. This calculates the probability area to the left of 1.316 as 0.89943, which means the p-value for the test is \(2(1-0.89943) = 0.20\), i.e., there is no evidence the errors have nonconstant variance.
  • Breusch-Pagan (Cook-Weisberg score):
    • Select Calc >Calculator, type "esq" for "Store result in variable," type "'RESI'^2" for "Expression," and click OK. This calculates the squared residuals.
    • Fit the model with response esq and predictor LotSize. Observe SSR* = 7896142.
    • We can then calculate \(X^{2*} = (7896142/2) / (54825/25)^2 =  0.821\).
    • Select Calc > Probability Distributions > Chi-Square, type "1" for "Degrees of freedom," select "Input constant," type "0.821," and click OK. This calculates the probability area to the left of 0.821 as 0.635112, which means the p-value for the test is \(1-0.635112 = 0.36\), i.e., there is no evidence the errors have nonconstant variance.

R Help 7: MLR Estimation, Prediction & Model Assumptions

R Help 7: MLR Estimation, Prediction & Model Assumptions

R

IQ and physical characteristics (confidence and prediction intervals)

  • Load the iqsize data.
  • Fit a multiple linear regression model of PIQ on Brain and Height.
  • Calculate a 95% confidence interval for mean PIQ at Brain=90, Height=70.
  • Calculate a 95% confidence interval for mean PIQ at Brain=79, Height=62.
  • Calculate a 95% prediction interval for individual PIQ at Brain=90, Height=70.
iqsize <- read.table("~/path-to-folder/iqsize.txt", header=T)
attach(iqsize)

predict(model, interval="confidence", se.fit=T,
        newdata=data.frame(Brain=90, Height=70))
# $fit
#        fit      lwr     upr
# 1 105.6391 98.23722 113.041
# 
# $se.fit
# [1] 3.646064

predict(model, interval="confidence", se.fit=T,
        newdata=data.frame(Brain=79, Height=62))
# $fit
#        fit      lwr     upr
# 1 104.8114 91.41796 118.2049
# 
# $se.fit
# [1] 6.597407

predict(model, interval="prediction",
        newdata=data.frame(Brain=90, Height=70))
# $fit
#        fit      lwr     upr
# 1 105.6391 65.34688 145.9314

detach(iqsize)

IQ and physical characteristics (residual plots and normality tests)

  • Load the iqsize data.
  • Fit a multiple linear regression model of PIQ on Brain and Height.
  • Display residual plot with fitted (predicted) values on the horizontal axis.
  • Display residual plot with Brain on the horizontal axis.
  • Display residual plot with Height on the horizontal axis.
  • Display histogram of the residuals.
  • Display normal probability plot of the residuals and add a diagonal line to the plot. The argument "datax" determines which way round to plot the axes (false by default, which plots the data on the vertical axis, or true, which plots the data on the horizontal axis).
  • Display residual plot with Weight on the horizontal axis.
  • Load the nortest package to access normality tests:
    • Anderson-Darling
    • Shapiro-Wilk (Ryan-Joiner unavailable in R)
    • Lilliefors (Kolmogorov-Smirnov)
iqsize <- read.table("~/path-to-folder/iqsize.txt", header=T)
attach(iqsize)

model <- lm(PIQ ~ Brain + Height)

plot(x=fitted(model), y=residuals(model),
     xlab="Fitted values", ylab="Residuals",
     panel.last = abline(h=0, lty=2))

plot(x=Brain, y=residuals(model),
     ylab="Residuals",
     panel.last = abline(h=0, lty=2))

plot(x=Height, y=residuals(model),
     ylab="Residuals",
     panel.last = abline(h=0, lty=2))

hist(residuals(model), main="")

qqnorm(residuals(model), main="", datax=TRUE)
qqline(residuals(model), datax=TRUE)

plot(x=Weight, y=residuals(model),
     ylab="Residuals",
     panel.last = abline(h=0, lty=2))

library(nortest)
ad.test(residuals(model)) # A = 0.2621, p-value = 0.6857
shapiro.test(residuals(model)) # W = 0.976, p-value = 0.5764
lillie.test(residuals(model)) # D = 0.097, p-value = 0.4897

detach(iqsize)

Toluca refrigerator parts (tests for constant error variance)

  • Load the Toluca data.
  • Fit a simple linear regression model of WorkHours on LotSize.
  • Create lotgroup factor variable splitting the sample into two groups.
  • Load the car package to access tests for constant error variance:
    • Modified Levene (Brown-Forsythe)
    • Cook-Weisberg score (Breusch-Pagan)
toluca <- read.table("~/path-to-folder/toluca.txt", header=T)
attach(toluca)

model <- lm(WorkHours ~ LotSize)
lotgroup <- factor(LotSize<=70)

library(car)
leveneTest(residuals(model), group=lotgroup)
# Levene's Test for Homogeneity of Variance (center = median)
#       Df F value Pr(>F)
# group  1  1.7331  0.201
#       23               

ncvTest(model)
# Non-constant Variance Score Test 
# Variance formula: ~ fitted.values 
# Chisquare = 0.8209192    Df = 1     p = 0.3649116 

detach(toluca)

Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility