Minitab Help 10: Regression Pitfalls

Galton peas (nonconstant variance and weighted least squares)

  • Perform a linear regression analysis to fit an ordinary least squares (OLS) simple linear regression model of Progeny vs Parent (click "Storage" in the regression dialog to store fitted values).
  • Select Calc > Calculator to calculate the weights variable = 1/SD², and then perform a linear regression analysis to fit a weighted least squares (WLS) model (click "Options" in the regression dialog to set the weights variable and click "Storage" to store fitted values).
  • Create a basic scatterplot of the data and click Editor > Add > Calculated Line to add a regression line for each model using the stored fitted values.
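The OLS and WLS fits in these steps can be sketched in a few lines of Python. This is a minimal sketch, not Minitab's implementation. It assumes the Parent, Progeny, and SD columns are already available as plain Python lists, and the function names (`wls_simple`, `inverse_variance_weights`, `fitted_values`) are hypothetical. Setting every weight to 1 reproduces the OLS fit, and weights of 1/SD² give the WLS fit.

```python
# Minimal sketch: OLS and WLS for a simple linear regression, standard library only.
# Function names are illustrative (hypothetical), not part of Minitab.

def inverse_variance_weights(sd):
    """Weights = 1/SD^2 for each observation (the Calc > Calculator step)."""
    return [1.0 / s ** 2 for s in sd]


def wls_simple(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x; returns (b0, b1).

    Passing equal weights (e.g. all 1) gives the ordinary least squares fit.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def fitted_values(x, b0, b1):
    """Fitted values, analogous to those stored via "Storage" for the calculated line."""
    return [b0 + b1 * xi for xi in x]
```

For example, `wls_simple(parent, progeny, [1] * len(parent))` would give the OLS coefficients and `wls_simple(parent, progeny, inverse_variance_weights(sd))` the WLS coefficients, where `parent`, `progeny`, and `sd` are hypothetical lists holding the data columns.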

Computer-assisted learning (nonconstant variance and weighted least squares)

Market share (nonconstant variance and weighted least squares)

  • Perform a linear regression analysis to fit an OLS model (click "Storage" to store the residuals and fitted values).
  • Create a basic scatterplot of the OLS residuals vs fitted values but select "With Groups" to mark the points by Discount.
  • Select Stat > Basic Statistics > Display Descriptive Statistics to calculate the residual variance for Discount=0 and Discount=1.
  • Select Calc > Calculator to calculate the weights variable = 1/variance for Discount=0 and Discount=1, and then perform a linear regression analysis to fit a WLS model (click "Options" to set the weights variable and click "Storage" to store standardized residuals and fitted values).
  • Create a basic scatterplot of the WLS standardized residuals vs fitted values.
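The weights calculation in these steps amounts to one residual variance per Discount group. A minimal sketch, assuming the stored OLS residuals and the Discount column are Python lists. `group_variance_weights` is a hypothetical helper, and it uses the sample variance (n - 1 denominator), which is assumed here to match the variance reported by Display Descriptive Statistics.

```python
from statistics import variance


def group_variance_weights(residuals, groups):
    """Weights 1/variance, where the variance of the OLS residuals is computed
    separately within each group (here, each level of Discount)."""
    by_group = {}
    for r, g in zip(residuals, groups):
        by_group.setdefault(g, []).append(r)
    # statistics.variance uses the n - 1 denominator (sample variance)
    group_var = {g: variance(rs) for g, rs in by_group.items()}
    # every observation gets the weight of its own group
    return [1.0 / group_var[g] for g in groups]
```

The resulting list plays the role of the weights column set under "Options" for the WLS fit.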

Home price (nonconstant variance and weighted least squares)

Google stock (autoregression model)

  • Select Stat > Time Series > Time Series Plot, select "price" for the Series, click the Time/Scale button, click "Stamp" under "Time Scale" and select "date" to be a Stamp column.
  • Select Stat > Time Series > Partial Autocorrelation to create a plot of partial autocorrelations of price.
  • Select Calc > Calculator to calculate a lag-1 price variable.
  • Create a basic scatterplot of price vs lag1price.
  • Perform a linear regression analysis of price vs lag1price (a first-order autoregression model).
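The partial autocorrelation plot can be understood through one standard way of computing it: sample autocorrelations r_k followed by the Durbin-Levinson recursion. The sketch below assumes that approach; Minitab's exact computation and significance limits may differ, and `acf` and `pacf` are hypothetical names. As a rough rule of thumb, a partial autocorrelation well outside about ±2/√n (n = number of observations) suggests that the lag is worth including in an autoregression.

```python
def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (r_0 = 1)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelations at lags 1..nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi = []  # coefficients phi_{k-1,1..k-1} from the previous step
    out = []
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out
```

At lag 1 the partial autocorrelation equals r_1, and at lag 2 the recursion reduces to (r_2 - r_1²)/(1 - r_1²).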

Earthquakes (autoregression model)

  • Select Stat > Time Series > Time Series Plot, select "Quakes" for the Series, click the Time/Scale button, click "Stamp" under "Time Scale" and select "Year" to be a Stamp column.
  • Select Stat > Time Series > Partial Autocorrelation to create a plot of partial autocorrelations of Quakes.
  • Select Calc > Calculator to calculate lag-1, lag-2, and lag-3 Quakes variables.
  • Perform a linear regression analysis of Quakes vs the three lag variables (a third-order autoregression model).
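Building lag columns and regressing the series on them can be sketched as follows, assuming the series is a Python list in time order. With p = 3 this mirrors the third-order Quakes model, and p = 1 mirrors the lag-1 price model in the Google stock example. The function names are hypothetical, and the normal equations are solved with plain Gaussian elimination for illustration, not with the more careful numerics a statistics package would typically use.

```python
def lagged_design(series, p):
    """Rows [1, y_{t-1}, ..., y_{t-p}] and responses y_t; the first p rows are
    dropped because their lag values are missing."""
    X = [[1.0] + [series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ar_fit(series, p):
    """OLS fit of y_t = b0 + b1*y_{t-1} + ... + bp*y_{t-p}; returns [b0, ..., bp]."""
    X, y = lagged_design(series, p)
    k = p + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)
```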

Blaisdell company (regression with autoregressive errors)

  • Perform a linear regression analysis of comsales vs indsales (click "Results" to select the Durbin-Watson statistic and click "Storage" to store the residuals).
  • Select Stat > Time Series > Autocorrelation and select the residuals; this displays the autocorrelation function and the Ljung-Box Q test statistic.
  • Perform the Cochrane-Orcutt procedure:
    • Select Calc > Calculator to calculate a lag-1 residual variable.
    • Perform a linear regression analysis with no intercept of residuals vs lag-1 residuals (select "Storage" to store the estimated coefficients; the estimated slope, 0.631164, is the estimate of the autocorrelation parameter).
    • Select Calc > Calculator to calculate a transformed response variable, Y_co = comsales-0.631164*LAG(comsales,1).
    • Select Calc > Calculator to calculate a transformed predictor variable, X_co = indsales-0.631164*LAG(indsales,1).
    • Perform a linear regression analysis of Y_co vs X_co.
    • Transform the resulting intercept parameter and its standard error by dividing by 1 – 0.631164 (the slope parameter and its standard error do not need transforming).
  • Forecast comsales for period 21 when indsales are projected to be $175.3 million.
  • Perform the Hildreth-Lu procedure:
    • Select Calc > Calculator to calculate a transformed response variable, Y_h1.1 = comsales-0.1*LAG(comsales,1).
    • Select Calc > Calculator to calculate a transformed predictor variable, X_h1.1 = indsales-0.1*LAG(indsales,1).
    • Perform a linear regression analysis of Y_h1.1 vs X_h1.1 and record the SSE.
    • Repeat the three steps above for a series of candidate values of the autocorrelation parameter to find where SSE is minimized (0.96 leads to the minimum in this case).
    • Perform a linear regression analysis of Y_h1.96 vs X_h1.96.
    • Transform the resulting intercept parameter and its standard error by dividing by 1 – 0.96 (the slope parameter and its standard error do not need transforming).
  • Perform the first differences procedure:
    • Select Calc > Calculator to calculate a transformed response variable, Y_fd = comsales-LAG(comsales,1).
    • Select Calc > Calculator to calculate a transformed predictor variable, X_fd = indsales-LAG(indsales,1).
    • Perform a linear regression analysis with no intercept of Y_fd vs X_fd.
    • Calculate the intercept parameter as mean(comsales) – slope estimate × mean(indsales).
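The Cochrane-Orcutt steps and the period-21 forecast can be sketched as follows. This is a minimal sketch, assuming `x` (indsales) and `y` (comsales) are Python lists in time order. The helper names are hypothetical, only one pass is shown (the procedure can be iterated), and standard errors are not computed. The forecast assumes the usual form for a model with first-order autoregressive errors, F = b0 + b1·x_new + ρ·e_n, where e_n is the last residual from the back-transformed fit.

```python
def ols(x, y, intercept=True):
    """Simple linear regression; returns (b0, b1). With intercept=False, b0 is 0."""
    if not intercept:
        return 0.0, sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lag_transform(v, rho):
    """v_t - rho * v_{t-1} for t = 2..n (the first row is lost, as with LAG)."""
    return [v[t] - rho * v[t - 1] for t in range(1, len(v))]


def cochrane_orcutt(x, y):
    """One pass of the Cochrane-Orcutt procedure; returns (rho, b0, b1)."""
    b0, b1 = ols(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # no-intercept regression of e_t on e_{t-1}: the slope estimates rho
    _, rho = ols(e[:-1], e[1:], intercept=False)
    a0, a1 = ols(lag_transform(x, rho), lag_transform(y, rho))
    # back-transform: divide the intercept by 1 - rho; the slope is used as-is
    return rho, a0 / (1 - rho), a1


def forecast_next(x, y, rho, b0, b1, x_next):
    """Forecast b0 + b1*x_next + rho*e_n, with e_n = y_n - (b0 + b1*x_n)."""
    e_n = y[-1] - (b0 + b1 * x[-1])
    return b0 + b1 * x_next + rho * e_n
```

For instance, `rho, b0, b1 = cochrane_orcutt(indsales, comsales)` followed by `forecast_next(indsales, comsales, rho, b0, b1, 175.3)` would give the period-21 forecast, with `indsales` and `comsales` standing in for the data columns.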

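The Hildreth-Lu grid search and the first differences procedure both use the transformation v_t - ρ·v_{t-1}; first differences is the special case ρ = 1. A minimal sketch under the same assumptions (time-ordered Python lists, hypothetical function names, no standard errors). The grid passed to `hildreth_lu` is up to the user, but it must exclude 1, since the intercept is back-transformed by dividing by 1 - ρ.

```python
def ols_sse(x, y):
    """Simple linear regression; returns (b0, b1, SSE)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, sse


def lag_transform(v, rho):
    """v_t - rho * v_{t-1} for t = 2..n."""
    return [v[t] - rho * v[t - 1] for t in range(1, len(v))]


def hildreth_lu(x, y, grid):
    """Grid search over rho; returns (rho, b0, b1) minimizing the transformed-model SSE."""
    best = None
    for rho in grid:
        a0, a1, sse = ols_sse(lag_transform(x, rho), lag_transform(y, rho))
        if best is None or sse < best[0]:
            # back-transform the intercept; the slope is used as-is
            best = (sse, rho, a0 / (1 - rho), a1)
    return best[1:]


def first_differences(x, y):
    """No-intercept regression of y_t - y_{t-1} on x_t - x_{t-1};
    intercept = mean(y) - slope * mean(x)."""
    dx, dy = lag_transform(x, 1.0), lag_transform(y, 1.0)
    b1 = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return b0, b1
```

For example, `hildreth_lu(indsales, comsales, [i / 100 for i in range(1, 100)])` would search ρ = 0.01, 0.02, ..., 0.99, with `indsales` and `comsales` standing in for the data columns.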
Metal fabricator and vendor employees (regression with autoregressive errors)

  • Perform a linear regression analysis of metal vs vendor (click "Results" to select the Durbin-Watson statistic and click "Storage" to store the residuals).
  • Create a fitted line plot.
  • Create residual plots and select "Residuals versus order."
  • Select Stat > Time Series > Partial Autocorrelation and select the residuals.
  • Perform the Cochrane-Orcutt procedure using the above directions for the Blaisdell company example.
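The Durbin-Watson statistic requested under "Results" is computed from the stored residuals in time order. A minimal sketch; `durbin_watson` is a hypothetical name. Values near 2 are consistent with no first-order autocorrelation, values well below 2 point to positive autocorrelation, and a formal test compares D with tabulated lower and upper bounds.

```python
def durbin_watson(residuals):
    """D = sum over t = 2..n of (e_t - e_{t-1})^2, divided by the sum over t = 1..n of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```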

Blood pressure (multicollinearity)

Uncorrelated predictors (no multicollinearity)

Blood pressure (predictors with almost no multicollinearity)

Blood pressure (predictors with high multicollinearity)

Poverty and teen birth rate (high multicollinearity)

Blood pressure (high multicollinearity)

Allen Cognitive Level study (reducing data-based multicollinearity)

Exercise and immunity (reducing structural multicollinearity)