# Minitab Help 10: Regression Pitfalls

### Galton peas (nonconstant variance and weighted least squares)

- Perform a linear regression analysis to fit an ordinary least squares (OLS) simple linear regression model of Progeny vs Parent (click "Storage" in the regression dialog to store fitted values).
- Select Calc > Calculator to calculate the weights variable = 1/SD^2.
- Perform a linear regression analysis to fit a weighted least squares (WLS) model (click "Options" in the regression dialog to set the weights variable and click "Storage" to store fitted values).
- Create a basic scatterplot of the data and click Editor > Add > Calculated Line to add a regression line for each model using the stored fitted values.
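
For readers who want to check the weighting arithmetic outside Minitab, the steps above can be sketched in Python with NumPy. This is a minimal sketch, not Minitab's implementation: the helper `wls_fit` is a hypothetical name, and the numbers are made-up placeholders rather than the Galton peas data.

```python
import numpy as np

def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns array [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling each row by sqrt(weight) turns WLS into an ordinary
    # least squares problem on the rescaled data.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Made-up illustrative values (assumption), NOT the Galton peas data.
parent = np.array([0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21])
progeny = np.array([0.155, 0.158, 0.160, 0.163, 0.166, 0.169, 0.171])
sd = np.array([0.020, 0.019, 0.019, 0.018, 0.018, 0.017, 0.016])

weights = 1.0 / sd**2                      # weights variable = 1/SD^2

ols_coef = wls_fit(parent, progeny, np.ones_like(parent))  # equal weights = OLS
wls_coef = wls_fit(parent, progeny, weights)

# Stored fitted values, one set per model, as used for the two calculated lines.
ols_fits = ols_coef[0] + ols_coef[1] * parent
wls_fits = wls_coef[0] + wls_coef[1] * parent
print("OLS:", ols_coef, "WLS:", wls_coef)
```

Passing equal weights reproduces the OLS fit, so one helper covers both models.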

### Computer-assisted learning (nonconstant variance and weighted least squares)

- Create a basic scatterplot of the data.
- Perform a linear regression analysis to fit an OLS model (click "Storage" to store the residuals and the fitted values).
- Create a basic scatterplot of the OLS residuals vs num.responses.
- Select Calc > Calculator to calculate the absolute residuals, then create a basic scatterplot of the absolute OLS residuals vs num.responses.
- Perform a linear regression analysis of absolute residuals vs num.responses (click "Storage" to store the fitted values).
- Select Calc > Calculator to calculate the weights variable = 1/(fitted values)^2.
- Perform a linear regression analysis to fit a WLS model (click "Options" to set the weights variable and click "Storage" to store standardized residuals and fitted values).
- Create a basic scatterplot of the data and click Editor > Add > Calculated Line to add a regression line for each model using the stored fitted values.
- Create a basic scatterplot of the WLS standardized residuals vs num.responses.
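
The weight-estimation steps above (OLS fit, regression of absolute residuals on the predictor, weights = 1/(fitted values)^2, WLS refit) can be sketched as follows. This is an illustration under stated assumptions: `fit_line` is a hypothetical helper, and the data are synthetic, built so that the error size grows with the predictor. The approach only makes sense when the fitted values from the absolute-residual regression are all positive.

```python
import numpy as np

def fit_line(x, y, w=None):
    """Fit y = b0 + b1*x by least squares; optional weights w give WLS.
    Returns (coefficients, fitted values)."""
    w = np.ones_like(y) if w is None else w
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef, X @ coef

# Synthetic illustration only (assumption): errors alternate in sign and
# grow in size with x, so the variance is clearly nonconstant.
x = np.arange(1.0, 13.0)
signs = np.where(np.arange(12) % 2 == 0, -1.0, 1.0)
y = 10.0 + 2.0 * x + signs * 0.5 * x

# Step 1: OLS fit; store the residuals and take absolute values.
ols_coef, ols_fits = fit_line(x, y)
abs_resid = np.abs(y - ols_fits)

# Step 2: regress |residuals| on the predictor; the fitted values act as
# a rough estimate of the error standard deviation at each x.
_, sd_hat = fit_line(x, abs_resid)

# Step 3: weights = 1/(fitted values)^2, then refit by WLS.
weights = 1.0 / sd_hat**2
wls_coef, wls_fits = fit_line(x, y, weights)
print("OLS:", ols_coef, "WLS:", wls_coef)
```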

### Market share (nonconstant variance and weighted least squares)

- Perform a linear regression analysis to fit an OLS model (click "Storage" to store the residuals and fitted values).
- Create a basic scatterplot of the OLS residuals vs fitted values but select "With Groups" to mark the points by Discount.
- Select Stat > Basic Statistics > Display Descriptive Statistics to calculate the residual variance for Discount=0 and Discount=1.
- Select Calc > Calculator to calculate the weights variable = 1/variance, using the residual variance for Discount=0 or Discount=1 as appropriate for each observation.
- Perform a linear regression analysis to fit a WLS model (click "Options" to set the weights variable and click "Storage" to store standardized residuals and fitted values).
- Create a basic scatterplot of the WLS standardized residuals vs fitted values.
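
When the variance differs by group, each observation's weight is the reciprocal of the OLS residual variance in its own group. A minimal sketch of that calculation follows; the helper `fit`, the variable names, and the data are all illustrative assumptions (synthetic values, not the market share data).

```python
import numpy as np

def fit(X, y, w=None):
    """Least squares fit of y on the columns of X (intercept column included
    in X); optional weights w give WLS."""
    w = np.ones(len(y)) if w is None else w
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic illustration only (assumption): noise is ten times larger
# when discount == 1 than when discount == 0.
price = np.arange(1.0, 13.0)
discount = np.arange(12) % 2
sign = np.where((np.arange(12) // 2) % 2 == 0, 1.0, -1.0)
noise = sign * np.where(discount == 1, 1.0, 0.1)
share = 3.0 - 0.1 * price + 0.4 * discount + noise

X = np.column_stack([np.ones(12), price, discount])

# OLS residuals, then the sample variance (n - 1 divisor) within each group.
ols_resid = share - X @ fit(X, share)
var0 = np.var(ols_resid[discount == 0], ddof=1)
var1 = np.var(ols_resid[discount == 1], ddof=1)

# Each row gets 1/variance of its own group.
weights = np.where(discount == 1, 1.0 / var1, 1.0 / var0)
wls_coef = fit(X, share, weights)
print("group variances:", var0, var1)
print("WLS coefficients:", wls_coef)
```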

### Home price (nonconstant variance and weighted least squares)

- Select Calc > Calculator to calculate log transformations of the variables.
- Perform a linear regression analysis to fit an OLS model (click "Storage" to store the residuals and fitted values).
- Create a basic scatterplot of the OLS residuals vs fitted values.
- Perform a linear regression analysis of absolute residuals vs fitted values (click "Storage" to store the fitted values).
- Select Calc > Calculator to calculate the weights variable = 1/(fitted values)^2.
- Perform a linear regression analysis to fit a WLS model (click "Options" to set the weights variable and click "Storage" to store standardized residuals and fitted values).
- Create a basic scatterplot of the WLS standardized residuals vs fitted values.

### Google stock (autoregression model)

- Select Stat > Time Series > Time Series Plot, select "price" for the Series, click the Time/Scale button, click "Stamp" under "Time Scale" and select "date" to be a Stamp column.
- Select Stat > Time Series > Partial Autocorrelation to create a plot of partial autocorrelations of price.
- Select Calc > Calculator to calculate a lag-1 price variable.
- Create a basic scatterplot of price vs lag1price.
- Perform a linear regression analysis of price vs lag1price (a first-order autoregression model).

### Earthquakes (autoregression model)

- Select Stat > Time Series > Time Series Plot, select "Quakes" for the Series, click the Time/Scale button, click "Stamp" under "Time Scale" and select "Year" to be a Stamp column.
- Select Stat > Time Series > Partial Autocorrelation to create a plot of partial autocorrelations of Quakes.
- Select Calc > Calculator to calculate lag-1, lag-2, and lag-3 Quakes variables.
- Perform a linear regression analysis of Quakes vs the three lag variables (a third-order autoregression model).
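
An autoregression model is an ordinary regression of the series on lagged copies of itself. The sketch below builds the lag columns and fits a third-order model with NumPy. The function names `lag` and `fit_autoregression` are hypothetical, the series is made up (not the earthquakes data), and the NaN padding is meant to mirror the missing values a lag column has in its first rows.

```python
import numpy as np

def lag(series, k):
    """Shift the series down by k periods; the first k entries are NaN."""
    out = np.full(len(series), np.nan)
    out[k:] = series[:-k]
    return out

def fit_autoregression(series, order):
    """Regress y_t on y_{t-1}, ..., y_{t-order} with an intercept.
    Returns (coefficients, design matrix, response) for the usable rows."""
    lags = np.column_stack([lag(series, k) for k in range(1, order + 1)])
    keep = ~np.isnan(lags).any(axis=1)        # drop rows with missing lags
    X = np.column_stack([np.ones(keep.sum()), lags[keep]])
    y = series[keep]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X, y

# Made-up annual counts (assumption), NOT the earthquakes data.
quakes = np.array([12, 15, 9, 11, 17, 20, 25, 22, 16, 19,
                   24, 21, 18, 20, 23, 17, 14, 16, 19, 22], dtype=float)

coef, X, y = fit_autoregression(quakes, order=3)
print("intercept, lag-1, lag-2, lag-3 coefficients:", coef)
```

Setting `order=1` gives the first-order model used for the Google stock example. Note that each extra lag costs one usable observation.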

### Blaisdell company (regression with autoregressive errors)

- Perform a linear regression analysis of comsales vs indsales (click "Results" to select the Durbin-Watson statistic and click "Storage" to store the residuals).
- Select Stat > Time Series > Autocorrelation and select the residuals; this displays the autocorrelation function and the Ljung-Box Q test statistic.
- Perform the Cochrane-Orcutt procedure:
  1. Select Calc > Calculator to calculate a lag-1 residual variable.
  2. Perform a linear regression analysis **with no intercept** of residuals vs lag-1 residuals (select "Storage" to store the estimated coefficients; the estimated slope, 0.631164, is the estimate of the autocorrelation parameter).
  3. Select Calc > Calculator to calculate a transformed response variable, Y_co = comsales-0.631164*LAG(comsales,1).
  4. Select Calc > Calculator to calculate a transformed predictor variable, X_co = indsales-0.631164*LAG(indsales,1).
  5. Perform a linear regression analysis of Y_co vs X_co.
  6. Transform the resulting intercept parameter and its standard error by dividing by 1 – 0.631164 (the slope parameter and its standard error do not need transforming).

- Forecast comsales for period 21 when indsales are projected to be $175.3 million.
- Perform the Hildreth-Lu procedure:
  1. Select Calc > Calculator to calculate a transformed response variable, Y_h1.1 = comsales-0.1*LAG(comsales,1).
  2. Select Calc > Calculator to calculate a transformed predictor variable, X_h1.1 = indsales-0.1*LAG(indsales,1).
  3. Perform a linear regression analysis of Y_h1.1 vs X_h1.1 and record the SSE.
  4. Repeat steps 1-3 for a series of estimates of the autocorrelation parameter to find when SSE is minimized (0.96 leads to the minimum in this case).
  5. Perform a linear regression analysis of Y_h1.96 vs X_h1.96.
  6. Transform the resulting intercept parameter and its standard error by dividing by 1 – 0.96 (the slope parameter and its standard error do not need transforming).

- Perform the first differences procedure:
  1. Select Calc > Calculator to calculate a transformed response variable, Y_fd = comsales-LAG(comsales,1).
  2. Select Calc > Calculator to calculate a transformed predictor variable, X_fd = indsales-LAG(indsales,1).
  3. Perform a linear regression analysis **with no intercept** of Y_fd vs X_fd.
  4. Calculate the intercept parameter as mean(comsales) – slope estimate × mean(indsales).
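
The three procedures above share one transformation, y_t − ρ·y_{t−1} regressed on x_t − ρ·x_{t−1}, and differ only in how ρ is chosen. The sketch below shows that shared structure with NumPy. All function names are hypothetical, the series are synthetic (not the Blaisdell data), and standard errors are not computed.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def design(x):
    return np.column_stack([np.ones(len(x)), x])

def transformed_fit(x, y, rho):
    """Regress y_t - rho*y_{t-1} on x_t - rho*x_{t-1}.
    Returns (intercept on the original scale, slope, SSE)."""
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    X = design(x_star)
    coef = ols(X, y_star)
    sse = float(np.sum((y_star - X @ coef) ** 2))
    # Back-transform the intercept by dividing by (1 - rho); slope is unchanged.
    return coef[0] / (1.0 - rho), coef[1], sse

def cochrane_orcutt(x, y):
    resid = y - design(x) @ ols(design(x), y)
    # No-intercept regression of residuals on lag-1 residuals.
    rho = float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))
    return rho, transformed_fit(x, y, rho)

def hildreth_lu(x, y, grid):
    # Try each candidate rho and keep the one with the smallest SSE.
    sse = np.array([transformed_fit(x, y, r)[2] for r in grid])
    rho = float(grid[int(np.argmin(sse))])
    return rho, transformed_fit(x, y, rho)

def first_differences(x, y):
    dx, dy = np.diff(x), np.diff(y)
    slope = float(dx @ dy / (dx @ dx))          # no-intercept slope
    return float(y.mean() - slope * x.mean()), slope

# Synthetic series with autocorrelated errors (assumption, for illustration).
t = np.arange(20)
x = 120.0 + 2.5 * t + 3.0 * np.sin(t)
err = np.zeros(20)
shock = 0.1 * np.cos(1.3 * t)
for i in range(1, 20):
    err[i] = 0.7 * err[i - 1] + shock[i]
y = -1.0 + 0.18 * x + err

grid = np.arange(0, 100) / 100.0                # candidate rho: 0.00 ... 0.99
rho_co, (b0_co, b1_co, _) = cochrane_orcutt(x, y)
rho_hl, (b0_hl, b1_hl, _) = hildreth_lu(x, y, grid)
b0_fd, b1_fd = first_differences(x, y)
print("Cochrane-Orcutt:", rho_co, b0_co, b1_co)
print("Hildreth-Lu:    ", rho_hl, b0_hl, b1_hl)
print("First diffs:    ", b0_fd, b1_fd)
```

The grid stops at 0.99 because the intercept back-transformation divides by 1 − ρ, which is undefined at ρ = 1. First differences corresponds to that ρ = 1 case, which is why it uses a no-intercept fit and recovers the intercept from the sample means instead.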

### Metal fabricator and vendor employees (regression with autoregressive errors)

- Perform a linear regression analysis of metal vs vendor (click "Results" to select the Durbin-Watson statistic and click "Storage" to store the residuals).
- Create a fitted line plot.
- Create residual plots and select "Residuals versus order."
- Select Stat > Time Series > Partial Autocorrelation and select the residuals.
- Perform the Cochrane-Orcutt procedure using the above directions for the Blaisdell company example.
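
The Durbin-Watson statistic requested in the first step is the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no lag-1 autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A small sketch, using a hypothetical `durbin_watson` helper and made-up residual sequences chosen so the arithmetic can be checked by hand:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

# Tiny made-up residual sequences (assumption, for illustration only).
persistent = [1.0, 1.0, -1.0, -1.0]    # neighbours tend to share a sign
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours always flip sign

print(durbin_watson(persistent))    # 1.0 (below 2: positive autocorrelation)
print(durbin_watson(alternating))   # 3.0 (above 2: negative autocorrelation)
```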

### Blood pressure (multicollinearity)

- Create a simple matrix of scatterplots of the data.
- Obtain a sample correlation between the variables.

### Uncorrelated predictors (no multicollinearity)

- Create a simple matrix of scatterplots of the data.
- Obtain a sample correlation between the predictors.
- Perform a linear regression analysis of y vs x_1.
- Perform a linear regression analysis of y vs x_2.
- Perform a linear regression analysis of y vs x_1 + x_2.
- Perform a linear regression analysis of y vs x_2 + x_1.
- Select Graph > 3D Scatterplot to create a 3D scatterplot of the data.
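
The point of fitting the models separately and together is that, when the predictors are exactly uncorrelated, each slope estimate is the same whether or not the other predictor is in the model. The sketch below checks this on a small balanced design. The values are made up for illustration (an assumption, not the course data set), and `slopes` is a hypothetical helper.

```python
import numpy as np

def slopes(y, *predictors):
    """Least squares slopes (intercept fitted but not returned)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Made-up balanced design: every x1 level appears equally often with every x2 level.
x1 = np.array([2, 2, 2, 2, 4, 4, 4, 4], dtype=float)
x2 = np.array([5, 5, 7, 7, 5, 5, 7, 7], dtype=float)
y = np.array([41, 39, 47, 44, 50, 48, 55, 53], dtype=float)

r = np.corrcoef(x1, x2)[0, 1]              # sample correlation between predictors
b1_alone = slopes(y, x1)[0]                # y vs x1
b2_alone = slopes(y, x2)[0]                # y vs x2
b1_joint, b2_joint = slopes(y, x1, x2)     # y vs x1 + x2

print("correlation:", r)
print("x1 slope alone / joint:", b1_alone, b1_joint)
print("x2 slope alone / joint:", b2_alone, b2_joint)
```

With correlated predictors, the same comparison generally shows the slopes shifting when the second predictor enters the model.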

### Blood pressure (predictors with almost no multicollinearity)

- Create a simple matrix of scatterplots of the data.
- Perform a linear regression analysis of BP vs Stress.
- Perform a linear regression analysis of BP vs BSA.
- Perform a linear regression analysis of BP vs Stress + BSA.
- Perform a linear regression analysis of BP vs BSA + Stress.
- Select Graph > 3D Scatterplot to create a 3D scatterplot of the data.

### Blood pressure (predictors with high multicollinearity)

- Create a simple matrix of scatterplots of the data.
- Perform a linear regression analysis of BP vs Weight.
- Perform a linear regression analysis of BP vs BSA.
- Perform a linear regression analysis of BP vs Weight + BSA.
- Perform a linear regression analysis of BP vs BSA + Weight.
- Select Graph > 3D Scatterplot to create a 3D scatterplot of the data.
- Find a confidence interval and a prediction interval for the response to predict BP for Weight=92 and BSA=2 for the two simple linear regression models and the multiple linear regression model.

### Poverty and teen birth rate (high multicollinearity)

- Select Data > Subset Worksheet to create a worksheet that excludes the District of Columbia.
- Create a simple matrix of scatterplots of the data.
- Perform a linear regression analysis of PovPct vs Brth15to17.
- Perform a linear regression analysis of PovPct vs Brth18to19.
- Perform a linear regression analysis of PovPct vs Brth15to17 + Brth18to19.

### Blood pressure (high multicollinearity)

- Perform a linear regression analysis of BP vs Age + Weight + BSA + Dur + Pulse + Stress.
- Perform a linear regression analysis of Weight vs Age + BSA + Dur + Pulse + Stress and confirm the VIF value for Weight as 1/(1-R^2) for this model.
- Perform a linear regression analysis of BP vs Age + Weight + Dur + Stress.
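
The VIF check above regresses one predictor on all the others and computes 1/(1-R^2) from that fit. A minimal sketch of the same calculation for every predictor column follows. The `vif` function is a hypothetical helper and the predictor values are made up (not the blood pressure data).

```python
import numpy as np

def vif(predictors):
    """Variance inflation factor for each column of a 2-D predictor array."""
    n, p = predictors.shape
    out = np.empty(p)
    for j in range(p):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        # R^2 from regressing predictor j on the remaining predictors.
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Made-up values (assumption): the first two columns move together closely.
weight = np.array([85, 94, 95, 88, 99, 90, 101, 92], dtype=float)
bsa = np.array([1.75, 2.10, 1.98, 1.90, 2.25, 1.92, 2.20, 2.05])
age = np.array([47, 49, 49, 50, 51, 48, 52, 46], dtype=float)

predictors = np.column_stack([weight, bsa, age])
vifs = vif(predictors)
print(dict(zip(["Weight", "BSA", "Age"], vifs)))
```

With only two predictors, the R^2 in this formula is the squared sample correlation between them, which gives a quick hand check.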

### Allen Cognitive Level study (reducing data-based multicollinearity)

- Create a simple matrix of scatterplots of the sampled allentestn23 data.
- Obtain a sample correlation between Vocab and Abstract.
- Perform a linear regression analysis of ACL vs SDMT + Vocab + Abstract.
- Repeat for the full allentest data.

### Exercise and immunity (reducing structural multicollinearity)

- Create a basic scatterplot of igg vs oxygen.
- Select Calc > Calculator to calculate an oxygen-squared variable named oxygensq.
- Perform a linear regression analysis of igg vs oxygen + oxygensq.
- Create a fitted line plot and select "Quadratic" for the type of regression model.
- Create a basic scatterplot of oxygensq vs oxygen.
- Obtain a sample correlation between oxygensq and oxygen.
- Select Calc > Calculator to calculate a centered oxygen variable named oxcent and an oxcent-squared variable named oxcentsq.
- Perform a linear regression analysis of igg vs oxcent + oxcentsq.
- Create a fitted line plot and select "Quadratic" for the type of regression model.
- Perform a linear regression analysis of igg vs oxcent.
- Create residual plots to create a residual vs fits plot and a normal probability plot for the centered quadratic model.
- Find a confidence interval and a prediction interval for the response to predict igg for oxygen = 70 using the centered quadratic model.
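
Centering works here because a predictor and its square are strongly correlated when all values are far from zero, and much less so once the mean is subtracted. The sketch below compares the two correlations and confirms that the quadratic fit itself is unchanged. The values are made up (not the exercise and immunity data) and `quadratic_fit` is a hypothetical helper. The oxygen values are evenly spaced, so the centered correlation is essentially zero; with real data it is usually small but not zero.

```python
import numpy as np

def quadratic_fit(x, y):
    """Fit y = b0 + b1*x + b2*x^2; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones(len(x)), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Made-up illustrative values (assumption).
oxygen = np.arange(30.0, 75.0, 5.0)      # 30, 35, ..., 70
igg = np.array([880, 1020, 1150, 1260, 1360, 1440, 1510, 1560, 1600],
               dtype=float)

oxcent = oxygen - oxygen.mean()          # centered predictor

corr_raw = np.corrcoef(oxygen, oxygen**2)[0, 1]
corr_centered = np.corrcoef(oxcent, oxcent**2)[0, 1]

coef_raw, fits_raw = quadratic_fit(oxygen, igg)
coef_cent, fits_cent = quadratic_fit(oxcent, igg)

print("corr(oxygen, oxygensq):", corr_raw)
print("corr(oxcent, oxcentsq):", corr_centered)
print("same fitted values:", np.allclose(fits_raw, fits_cent))
```

Centering changes the intercept and the linear coefficient but leaves the quadratic coefficient and the fitted curve unchanged.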