Next, let us consider the problem in which we have a *y*-variable and *x*-variables all measured as a time series. As an example, we might have *y* as the monthly accidents on an interstate highway and *x* as the monthly amount of travel on the interstate, with measurements observed for 120 consecutive months. A multiple (time series) regression model can be written as:

\(\begin{equation} y_{t}=\textbf{X}_{t}\beta+\epsilon_{t}. \end{equation}\)

The difficulty that often arises in this context is that the errors (\(\epsilon_{t}\)) may be correlated with each other. In other words, we have **autocorrelation** or a dependency *between* the errors.

We may consider situations in which the error at one specific time is linearly related to the error at the previous time. That is, the errors themselves follow a simple linear regression model that can be written as

\(\begin{equation} \epsilon_{t}=\rho\epsilon_{t-1}+\omega_{t}. \end{equation}\)

Here, \(|\rho|<1 \) is called the **autocorrelation parameter** and the \(\omega_{t}\) term is a new error term that follows the usual assumptions that we make about regression errors: \(\omega_{t}\sim_{iid}N(0,\sigma^{2})\). (Here "iid" stands for "independent and identically distributed.) So, this model says that the error at time *t* is predictable from a fraction of the error at time *t *- 1 plus some new perturbation \(\omega_{t}\).

Our model for the \(\epsilon_{t}\) errors of the original *Y* versus **X** regression is an **autoregressive model for the errors**, specifically AR(1) in this case. One reason why the errors might have an autoregressive structure is that the *Y* and **X** variables at time *t* may be (and most likely are) related to the *Y* and **X** measurements at time *t –* 1. These relationships are being absorbed into the error term of our multiple linear regression model that only relates *Y* and **X** measurements made at concurrent times. Notice that the autoregressive model for the errors is a violation of the assumption that we have independent errors and this creates theoretical difficulties for ordinary least squares estimates of the beta coefficients. There are several different methods for estimating the regression parameters of the *Y* versus **X** relationship when we have errors with an autoregressive structure and we will introduce a few of these methods later.

The error terms \(\epsilon_{t}\) still have mean 0 and constant variance:

\(\begin{align*} \mbox{E}(\epsilon_{t})&=0 \\ \mbox{Var}(\epsilon_{t})&=\frac{\sigma^{2}}{1-\rho^{2}}. \end{align*}\)

However, the covariance (a measure of the relationship between two variables) between adjacent error terms is:

\(\begin{equation*} \mbox{Cov}(\epsilon_{t},\epsilon_{t-1})=\rho\biggl(\frac{\sigma^{2}}{1-\rho^{2}}\biggr), \end{equation*}\)

which implies the coefficient of correlation (a unitless measure of the relationship between two variables) between adjacent error terms is:

\(\begin{equation*} \mbox{Corr}(\epsilon_{t},\epsilon_{t-1})=\frac{\mbox{Cov}(\epsilon_{t},\epsilon_{t-1})}{\sqrt{\mbox{Var}(\epsilon_{t})\mbox{Var}(\epsilon_{t-1})}}=\rho, \end{equation*}\)

which is the autocorrelation parameter we introduced above.

We can use partial autocorrelation function (PACF) plots to help us assess appropriate lags for the errors in a regression model with autoregressive errors. Specifically, we first fit a multiple linear regression model to our time series data and store the residuals. Then we can look at a plot of the PACF for the residuals versus the lag. Large sample partial autocorrelations that are significantly different from 0 indicate lagged terms of \(\epsilon\) that may be useful predictors of \(\epsilon_{t}\).