T.2.3 - Testing and Remedial Measures for Autocorrelation

Here we present some formal tests and remedial measures for dealing with error autocorrelation.

Durbin-Watson Test

We usually assume that the error terms are independent unless there is a specific reason to think that this is not the case. Usually, violation of this assumption occurs because there is a known temporal component to how the observations were drawn. The easiest way to assess whether there is dependency is to produce a scatterplot of the residuals versus the time measurement for each observation (assuming you have the data arranged in time sequence order). If the errors are independent, then the residuals should look randomly scattered about 0. However, if a noticeable pattern emerges (particularly a cyclical one), then dependency is likely an issue.
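A minimal sketch of such a plot in Python (the simulated residuals here are purely hypothetical stand-ins for residuals stored in time order):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for OLS residuals stored in time order.
rng = np.random.default_rng(1)
residuals = rng.normal(size=20)

plt.scatter(np.arange(1, len(residuals) + 1), residuals)
plt.axhline(0, color="gray", linestyle="--")  # independent errors scatter about 0
plt.xlabel("Time order")
plt.ylabel("Residual")
plt.show()
```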

Recall that if we have a first-order autocorrelation with the errors, then the errors are modeled as:

\(\begin{equation*} \epsilon_{t}=\rho\epsilon _{t-1}+\omega_{t}, \end{equation*}\)

where \(|\rho|<1\) and the \(\omega_{t}\sim_{iid}N(0,\sigma^{2})\). If we suspect first-order autocorrelation with the errors, then a formal test does exist regarding the parameter \(\rho\). In particular, the Durbin-Watson test is constructed as:

\(\begin{align*} \nonumber H_{0}&\colon \rho=0 \\ \nonumber H_{A}&\colon \rho\neq 0. \end{align*}\)

So the null hypothesis of \(\rho=0\) means that \(\epsilon_{t}=\omega_{t}\), or that the error term in one period is not correlated with the error term in the previous period, while the alternative hypothesis of \(\rho\neq 0\) means the error term in one period is either positively or negatively correlated with the error term in the previous period. Oftentimes, a researcher will already have an indication of whether the errors are positively or negatively correlated. For example, a regression of oil prices (in dollars per barrel) versus the gas price index will surely have positively correlated errors. When the researcher has an indication of the direction of the correlation, then the Durbin-Watson test also accommodates the one-sided alternatives \(H_{A}\colon\rho< 0\) for negative correlations or \(H_{A}\colon\rho> 0\) for positive correlations (as in the oil example).

The test statistic for the Durbin-Watson test on a data set of size n is given by:

\(\begin{equation*} D=\dfrac{\sum_{t=2}^{n}(e_{t}-e_{t-1})^{2}}{\sum_{t=1}^{n}e_{t}^{2}}, \end{equation*}\)

where \(e_{t}=y_{t}-\hat{y}_{t}\) are the residuals from the ordinary least squares fit. The DW test statistic ranges from 0 to 4: values between 0 and 2 indicate positive autocorrelation, 2 indicates zero autocorrelation, and values between 2 and 4 indicate negative autocorrelation. Exact critical values are difficult to obtain, but tables (for certain significance levels) can be used to make a decision (e.g., see the Durbin-Watson Significance Tables, where N represents the sample size, n, and K represents the number of regression parameters, p). The tables provide a lower and an upper bound, called \(d_{L}\) and \(d_{U}\), respectively. In testing for positive autocorrelation, if \(D<d_{L}\) then reject \(H_{0}\), if \(D>d_{U}\) then fail to reject \(H_{0}\), and if \(d_{L}\leq D\leq d_{U}\) then the test is inconclusive. (In testing for negative autocorrelation, the same bounds are applied to \(4-D\).) While the prospect of an inconclusive test result is less than desirable, some programs use exact and approximate procedures for calculating a p-value. These procedures require certain assumptions about the data which we will not discuss. One "exact" method is based on the beta distribution for obtaining p-values.
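The statistic itself is straightforward to compute from stored residuals. A minimal sketch in Python (the helper name is ours; statsmodels provides an equivalent durbin_watson function in statsmodels.stats.stattools):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares; ranges 0 to 4."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```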

To illustrate, consider the Blaisdell Company example from page 489 of Applied Linear Regression Models (4th ed) by Kutner, Nachtsheim, and Neter. If we fit a simple linear regression model with response comsales (company sales in $ millions) and predictor indsales (industry sales in $ millions) and click the "Results" button in the Regression Dialog and check "Durbin-Watson statistic" we obtain the following output:

Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant -1.455 0.214 -6.79 0.000  
indsales 0.17628 0.00144 122.02 0.000 1.00

Durbin-Watson Statistic = 0.734726

Since the value of the Durbin-Watson Statistic falls below the lower bound at a 0.01 significance level (obtained from a table of Durbin-Watson test bounds), there is strong evidence the error terms are positively correlated.

Ljung-Box Q Test

The Ljung-Box Q test (sometimes called the Portmanteau test) is used to test whether or not observations over time are random and independent. In particular, for a given k, it tests the following:

\(\begin{align*} \nonumber H_{0}&\colon \textrm{the autocorrelations up to lag} \ k \ \textrm{are all 0} \\ \nonumber H_{A}&\colon \textrm{the autocorrelations of one or more lags differ from 0}. \end{align*}\)

The test statistic is calculated as:

\(\begin{equation*} Q_{k}=n(n+2)\sum_{j=1}^{k}\dfrac{{r}^{2}_{j}}{n-j}, \end{equation*}\)

where \(r_{j}\) is the sample autocorrelation at lag \(j\). Under \(H_{0}\), \(Q_{k}\) is approximately \(\chi^{2}_{k}\)-distributed.
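A minimal sketch in Python of the statistic and its chi-square p-value (the helper is ours; statsmodels.stats.diagnostic.acorr_ljungbox offers a library version):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(e, k):
    """Ljung-Box Q statistic up to lag k, with its chi-square(k) p-value."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    d = e - e.mean()                      # center, as in the sample ACF
    denom = np.sum(d ** 2)
    q = n * (n + 2) * sum(
        (np.sum(d[j:] * d[:-j]) / denom) ** 2 / (n - j)
        for j in range(1, k + 1)
    )
    return q, chi2.sf(q, df=k)
```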

To illustrate how the test works for k=1, consider the Blaisdell Company example from above. If we store the residuals from a simple linear regression model with response comsales and predictor indsales and then find the autocorrelation function for the residuals (select Stat > Time Series > Autocorrelation), we obtain the following output:

Autocorrelation Function: RESI1

Lag ACF T LBQ
1 0.624005 2.80 9.08

The Ljung-Box Q test statistic of 9.08 corresponds to a \(\chi^{2}_{1}\) p-value of 0.0026, so there is strong evidence the lag-1 autocorrelation is non-zero.

Remedial Measures

When autocorrelated error terms are found to be present, one of the first remedial measures should be to investigate whether a key predictor variable has been omitted. If adding such a predictor does not reduce or eliminate the autocorrelation of the error terms, then certain transformations on the variables can be performed. We discuss three transformations that are designed for AR(1) errors. Methods for dealing with errors from an AR(k) process do exist in the literature but are much more technical in nature.

Cochrane-Orcutt Procedure

The first of the three transformation methods we discuss is called the Cochrane-Orcutt procedure, which involves an iterative process (after identifying the need for an AR(1) process); a code sketch of one iteration follows the steps:

  1. Estimate \(\rho\) in \(\epsilon_{t}=\rho\epsilon_{t-1}+\omega_{t}\) by regressing the residuals \(e_{t}\) on \(e_{t-1}\) through the origin. Call this estimate r.

  2. Transform the variables from the multiple regression model \(y_{t}=\beta_{0}+\beta_{1}x_{t,1}+\ldots+\beta_{p-1}x_{t,p-1}+\epsilon_{t}\) by setting \(y_{t}^{*}=y_{t}-ry_{t-1}\) and \(x_{t,j}^{*}=x_{t,j}-rx_{t-1,j}\) for \(j=1,\ldots,p-1\).

  3. Regress \(y_{t}^{*}\) on the transformed predictors using ordinary least squares to obtain estimates \(\hat{\beta}_{0}^{*},\ldots,\hat{\beta}_{p-1}^{*}\). Examine the residuals of this fit and determine whether autocorrelation is still present (e.g., using the Durbin-Watson test). If autocorrelation is still present, then iterate this procedure. If it appears to be corrected, then transform the estimates back to their original scale by setting \(\hat{\beta}_{0}=\hat{\beta}_{0}^{*}/(1-r)\) and \(\hat{\beta}_{j}=\hat{\beta}_{j}^{*}\) for \(j=1,\ldots,p-1\). Notice that only the intercept parameter requires a transformation. The standard errors of the regression estimates for the original scale are obtained similarly: \(\textrm{s.e.}(\hat{\beta}_{0})=\textrm{s.e.}(\hat{\beta}_{0}^{*})/(1-r)\) and \(\textrm{s.e.}(\hat{\beta}_{j})=\textrm{s.e.}(\hat{\beta}_{j}^{*})\) for \(j=1,\ldots,p-1\).
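Below is a minimal numpy sketch of one iteration for simple linear regression (function and variable names are our own, illustrating the steps above rather than any particular package):

```python
import numpy as np

def cochrane_orcutt_step(y, x):
    """One Cochrane-Orcutt iteration for simple linear regression.
    Returns the back-transformed (b0, b1) and the AR(1) estimate r."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                              # OLS residuals
    r = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])      # e_t on e_{t-1}, through the origin
    y_star = y[1:] - r * y[:-1]                   # transformed response
    Xs = np.column_stack([np.ones(n - 1), x[1:] - r * x[:-1]])
    b0_star, b1_star = np.linalg.lstsq(Xs, y_star, rcond=None)[0]
    return b0_star / (1 - r), b1_star, r          # only the intercept is rescaled
```

For a library route, statsmodels' GLSAR class offers a related iterative AR(1) fit via its iterative_fit method.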

To illustrate the Cochrane-Orcutt procedure, consider the Blaisdell Company example from above:

  1. Store the residuals, RESI1, from a simple linear regression model with response comsales and predictor indsales.
  2. Use Minitab's Calculator to define a lagged residual variable, lagRESI1 = LAG(RESI1,1).
  3. Fit a simple linear regression model with response RESI1 and predictor lagRESI1 and no intercept. Use the Storage button to store the Coefficients. We find the estimated slope from this regression to be 0.631164, which is the estimate of the autocorrelation parameter, \(\rho\).
  4. Use Minitab's Calculator to define a transformed response variable, Y_co = comsales-0.631164*LAG(comsales,1).
  5. Use Minitab's Calculator to define a transformed predictor variable, X_co = indsales-0.631164*LAG(indsales,1).
  6. Fit a simple linear regression model with response Y_co and predictor X_co to obtain the following output:

Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant -0.394 0.167 -2.36 0.031  
X_co 0.17376 0.00296 58.77 0.000 1.00

Durbin-Watson Statistic = 1.65025

  • Since the value of the Durbin-Watson Statistic falls above the upper bound at a 0.01 significance level (obtained from a table of Durbin-Watson test bounds), there is no evidence the error terms are positively correlated in the model with the transformed variables.
  • Transform the intercept parameter, -0.394/(1-0.631164) = -1.068, and its standard error, 0.167/(1-0.631164) = 0.453 (the slope estimate and standard error don't require transformation).
  • The fitted regression function for the original variables is predicted comsales = -1.068 + 0.17376 indsales.

One thing to note about the Cochrane-Orcutt approach is that it does not always work properly, primarily because when the errors are positively autocorrelated, r tends to underestimate \(\rho\). When this bias is serious, it can substantially reduce the effectiveness of the procedure.

Hildreth-Lu Procedure

The Hildreth-Lu procedure is a more direct method for estimating \(\rho\). After establishing that the errors have an AR(1) structure, follow these steps:

  1. Select a series of candidate values for \(\rho\) (presumably values that would make sense after you assessed the pattern of the errors).
  2. For each candidate value, regress \(y_{t}^{*}\) on the transformed predictors using the transformations established in the Cochrane-Orcutt procedure. Retain the SSEs for each of these regressions.
  3. Select the value which minimizes the SSE as an estimate of \(\rho\).

Notice that this procedure is similar to the Box-Cox transformation discussed previously and that it is not iterative like the Cochrane-Orcutt procedure.
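A minimal sketch of the grid search in Python (the grid spacing and names are our own choices):

```python
import numpy as np

def hildreth_lu(y, x, rhos=np.arange(0.01, 1.00, 0.01)):
    """Return (rho, SSE, coefs) for the candidate rho minimizing the SSE
    of the regression of y* on x* (transformations as in Cochrane-Orcutt)."""
    best = None
    for rho in rhos:
        y_star = y[1:] - rho * y[:-1]
        Xs = np.column_stack([np.ones(len(y) - 1), x[1:] - rho * x[:-1]])
        beta, *_ = np.linalg.lstsq(Xs, y_star, rcond=None)
        sse = np.sum((y_star - Xs @ beta) ** 2)
        if best is None or sse < best[1]:
            best = (rho, sse, beta)
    return best
```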

To illustrate the Hildreth-Lu procedure, consider the Blaisdell Company example from above:

  1. Use Minitab's Calculator to define a transformed response variable, Y_hl.1 = comsales-0.1*LAG(comsales,1).
  2. Use Minitab's Calculator to define a transformed predictor variable, X_hl.1 = indsales-0.1*LAG(indsales,1).
  3. Fit a simple linear regression model with response Y_hl.1 and predictor X_hl.1 and record the SSE.
  4. Repeat steps 1-3 for a series of estimates of \(\rho\) to find when SSE is minimized (0.96 leads to the minimum in this case).
  5. The output for this model is:

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Regression 1 2.31988 2.31988 550.26 0.000
X_hl.96 1 2.31988 2.31988 550.26 0.000
Error 17 0.07167 0.00422
Total 18 2.39155

Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant 0.0712 0.0580 1.23 0.236  
X_hl.96 0.16045 0.00684 23.46 0.000 1.00

Durbin-Watson Statistic = 1.72544

  • Since the value of the Durbin-Watson Statistic falls above the upper bound at a 0.01 significance level (obtained from a table of Durbin-Watson test bounds), there is no evidence the error terms are positively correlated in the model with the transformed variables.
  • Transform the intercept parameter, 0.0712/(1-0.96) = 1.78, and its standard error, 0.0580/(1-0.96) = 1.45 (the slope estimate and standard error don't require transforming).
  • The fitted regression function for the original variables is predicted comsales = 1.78 + 0.16045 indsales.

First Differences Procedure

Since \(\rho\) is frequently large for AR(1) errors (especially in economics data), many have suggested just setting \(\rho\) = 1 in the transformed model of the previous two procedures. This procedure is called the first differences procedure and simply regresses \(y_{t}^{*}=y_{t}-y_{t-1}\) on the \(x_{t,j}^{*}=x_{t,j}-x_{t-1,j}\) for \(j=1,\ldots,p-1\) using regression through the origin. The estimates from this regression are then transformed back, setting \(\hat{\beta}_{j}=\hat{\beta}_{j}^{*}\) for \(j=1,\ldots,p-1 \) and \(\hat{\beta}_{0}=\bar{y}-(\hat{\beta}_{1}\bar{x}_{1}+\ldots+\hat{\beta}_{p-1}\bar{x}_{p-1})\).
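In code, the whole procedure reduces to a few lines; here is a minimal sketch for the simple linear regression case (names are ours):

```python
import numpy as np

def first_differences(y, x):
    """First differences fit: regress diff(y) on diff(x) through the origin,
    then recover the intercept from the sample means."""
    dy, dx = np.diff(y), np.diff(x)
    b1 = (dx @ dy) / (dx @ dx)        # regression through the origin
    b0 = y.mean() - b1 * x.mean()     # back-transformed intercept
    return b0, b1
```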

To illustrate the first differences procedure, consider the Blaisdell Company example from above:

  1. Use Minitab's Calculator to define a transformed response variable, Y_fd = comsales-LAG(comsales,1).
  2. Use Minitab's Calculator to define a transformed predictor variable, X_fd = indsales-LAG(indsales,1).
  3. Fit a simple linear regression model with response Y_fd and predictor X_fd and use the "Results" button to select the Durbin-Watson statistic:  Durbin-Watson Statistic = 1.74883
  4. Since the value of the Durbin-Watson Statistic falls above the upper bound at a 0.01 significance level (obtained from a table of Durbin-Watson test bounds), there is no evidence the error terms are correlated in the model with the transformed variables.
  5. Fit a simple linear regression model with response Y_fd and predictor X_fd and no intercept. The output for this model is:

Coefficients

Term Coef SE Coef T-Value P-Value VIF
X_fd 0.16849 0.00510 33.06 0.000 1.00

  • Find the sample mean of comsales and indsales using Stat > Basic Statistics > Display Descriptive Statistics:

Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
comsales 20 0 24.569 0.539 2.410 20.960 22.483 24.200 26.825 28.970
indsales 20 0 147.62 3.06 13.67 127.30 135.53 145.95 159.85 171.70

  • Calculate the estimated intercept parameter, \(\hat{\beta}_{0} = 24.569 - 0.16849(147.62) = -0.303\).
  • The fitted regression function for the original variables is predicted comsales = -0.303 + 0.16849 indsales.

Forecasting Issues

When calculating forecasts for regression with autoregressive errors, it is important to incorporate the modeled error structure into the forecasts. For example, with AR(1) errors, \(\epsilon_{t}=\rho\epsilon_{t-1}+\omega_{t}\), our fitted regression equation is

\(\begin{equation*} \hat{y}_{t}=b_{0}+b_{1}x_{t}, \end{equation*}\)

with forecasts of \(y\) at time \(t\), denoted \(F_{t}\), computed by adding the projected error, \(re_{t-1}\) (the AR(1) prediction of the error term, with \(r\) estimating \(\rho\)), to the fitted value:

\(\begin{equation*} F_{t}=\hat{y}_{t}+re_{t-1}. \end{equation*}\)

So, we can compute forecasts of \(y\) at time \(t+1\), denoted \(F_{t+1}\), iteratively (a code sketch follows the steps):

  • Compute the fitted value for time period t, \(\hat{y}_{t}=b_{0}+b_{1}x_{t}\)
  • Compute the residual for time period t, \(e_{t}=y_{t}-\hat{y}_{t}\).
  • Compute the fitted value for time period t+1, \(\hat{y}_{t+1}=b_{0}+b_{1}x_{t+1}\).
  • Compute the forecast for time period t+1, \(F_{t+1}=\hat{y}_{t+1}+re_{t}\).
  • Iterate.
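A minimal sketch of the one-step-ahead computation in Python (the helper is hypothetical; b0 and b1 are the back-transformed estimates and r the AR(1) estimate):

```python
def ar1_forecast(b0, b1, r, x_next, y_last, x_last):
    """One-step-ahead forecast with AR(1) errors: F = yhat_next + r * e_last."""
    e_last = y_last - (b0 + b1 * x_last)   # residual at the last observed period
    return b0 + b1 * x_next + r * e_last
```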

To illustrate forecasting for the Cochrane-Orcutt procedure, consider the Blaisdell Company example from above. Suppose we wish to forecast comsales for time period 21 when indsales are projected to be $175.3 million:

  • The fitted value for time period 20 is \(\hat{y}_{20} = -1.068+0.17376(171.7) = 28.767\).
  • The residual for time period 20 is \(e_{20} = y_{20}-\hat{y}_{20} = 28.78 - 28.767 = 0.013\).
  • The fitted value for time period 21 is \(\hat{y}_{21} = -1.068+0.17376(175.3) = 29.392\).
  • The forecast for time period 21 is \(F_{21}=29.392+0.631164(0.013)=29.40\).
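Using the hypothetical ar1_forecast helper sketched above, the same computation reads:

```python
# Blaisdell example: Cochrane-Orcutt estimates and period-20 data.
f21 = ar1_forecast(b0=-1.068, b1=0.17376, r=0.631164,
                   x_next=175.3, y_last=28.78, x_last=171.7)
print(round(f21, 2))  # about 29.40
```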

Note that this procedure is not needed when simply using the lagged response variable as a predictor, e.g., in the first-order autoregression model

\(\begin{equation*} y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\epsilon_{t}. \end{equation*}\)