3.3 Forecasting with ARIMA Models

Section 3.4 in the textbook gives a theoretical look at forecasting with ARIMA models. That presentation is a bit tough, but in practice, it’s easy to understand how forecasts are created.

In an ARIMA model, we express \(x_t\) as a function of past value(s) of x and/or past errors (as well as a present time error). When we forecast a value past the end of the series, we might need values from the observed series on the right side of the equation or we might, in theory, need values that aren’t yet observed.

Example 3-4 Section

Consider the AR(2) model \(x_t=\delta+\phi_1x_{t-1}+\phi_2x_{t-2}+w_t\). In this model, \(x_{t}\) is a linear function of the values of x at the previous two times. Suppose that we have observed n data values and wish to use the observed data and estimated AR(2) model to forecast the value of \(x_{n+1}\) and \(x_{n+2}\), the values of the series at the next two times past the end of the series. The equations for these two values are

\(x_{n+1}=\delta+\phi_1x_n+\phi_2x_{n-1}+w_{n+1}\)

\(x_{n+2}=\delta+\phi_1x_{n+1}+\phi_2x_n+w_{n+2}\)

To use the first of these equations, we simply use the observed values of \(x_{n }\)and \(x_{n-1 }\)and replace \(w_{n+1}\) by its expected value of 0 (the assumed mean for the errors).

The second equation for forecasting the value at time n + 2 presents a problem. It requires the unobserved value of \(x_{n+1}\) (one time past the end of the series). The solution is to use the forecasted value of \(x_{n+1}\) (the result of the first equation).

In general, the forecasting procedure, assuming a sample size of n, is as follows:

  • For any \(w_{j}\) with 1 ≤ j ≤ n, use the sample residual for time point j
  • For any \(w_{j}\) with j > n, use 0 as the value of \(w_{j}\)
  • For any \(x_{j}\) with 1 ≤ j ≤ n, use the observed value of \(x_{j}\)
  • For any \(x_{j }\)with j > n, use the forecasted value of \(x_{j}\)

Notation Section

Our authors use the notation \(x^n_{n+m}\) to represent a forecast m times past the end of the observed series. The “superscript” is to be read as “given data up to time n.” Other authors use the notation \(x_{n} \left(m \right)\) to denote a forecast m times past time n.

To understand the formula for the standard error of the forecast error, we first need to define the concept of psi-weights.

Psi-weight representation of an ARIMA model Section

Any ARIMA model can be converted to an infinite order MA model:

\begin{align} x_t - \mu & =  w_t + \Psi_1w_{t-1} + \Psi_2w_{t-2} + \dots + \Psi_kw_{t-k} + \dots \\ & =  \sum_{j=0}^{\infty}\Psi_jw_{t-j} \text{ where} \Psi_0 \equiv 1 \end{align}

An important constraint so that the model doesn’t “explode” as we go back in time is

\(\sum_{j=0}^{\infty}|\Psi_j| < \infty . \)

On page 95 of our book, the authors define a “causal” model as one for which this constraint is in place, along with the additional restraint that we can’t express the value of the present x as a function of future values.

The process of finding the “psi-weight” representation can involve a few algebraic tricks. Fortunately, R has a routine, ARMAtoMA, that will do it for us. To illustrate how psi-weights may be determined algebraically, we’ll consider a simple example.

Example 3-5 Section

Suppose that an AR(1) model is \(x _ { t } = 40 + 0.6 x _ { t - 1 } + w _ { t }\)

For an AR(1) model, the mean \(\mu=\delta/(1-\phi_1)\) so in this case, \(\mu = 40/(1 - .6) = 100\). We’ll define \(z _ { t } = x _ { t } - 100\) and rewrite the model as \(z _ { t } = 0.6 z _ { t - 1 } + w _ { t }\). (You can do the algebra to check that things match between the two expressions of the model.)

To find the psi-weight expression, we’ll continually substitute for the z on the right side in order to make the expression become one that only involves w values.

\(z _ { t } = 0.6 z _ { t - 1 } + w _ { t } , \text { so } z _ { t - 1 } = 0.6 z _ { t - 2 } + w _ { t - 1 }\)

Substitute the right side of the second expression for \(z_{t - 1}\) in the first expression.

This gives

\(z _ { t } = 0.6 \left( 0.6 z _ { t - 2 } + w _ { t - 1 } \right) + w _ { t } = 0.36 z _ { t - 2 } + 0.6 w _ { t - 1 } + w _ { t }\).

Next, note that \(z _ { t - 2 } = 0.6 z _ { t - 3 } + w _ { t - 2 }\).

Substituting this into the equation gives

\(z _ { t } = 0.216 z _ { t - 3 } + 0.36 w _ { t - 2 } + 0.6 w _ { t - 1 } + w _ { t }\).

If you keep going, you’ll soon see that the pattern leads to

\(z_t = x_t -100 = \sum_{j=0}^{\infty}(0.6)^jw_{t-j}\)

Thus the psi-weights for this model are given by \(\psi_j=(0.6)^j\) for \(j=0,1,\ldots,\infty\).

In R, the command ARMAtoMA(ar = .6, ma=0, 12) gives the first 12 psi-weights. This will give the psi-weights \(\psi_1\) to \(\psi_{12}\) in scientific notation.

For the AR(1) with AR coefficient = 0.6 they are:

[1] 0.600000000 0.360000000 0.216000000 0.129600000 0.077760000 0.046656000

[7] 0.027993600 0.016796160 0.010077696 0.006046618 0.003627971 0.002176782

Remember that \(\psi_0 \equiv 1\). R doesn’t give this value. It’s listing starts with \(\psi_1\), which equals 0.6 in this case.

MA Models: The psi-weights are easy for an MA model because the model already is written in terms of the errors. The psi-weights = 0 for lags past the order of the MA model and equal the coefficient values for lags of the errors that are in the model. Remember that we always have \(\psi_0=1\).

Standard error of the forecast error for a forecast using an ARIMA model Section

Without proof, we’ll state a result:

The variance of the difference between the forecasted value at time n + m and the (unobserved) value at time n + m is

Variance of \((x^{n}_{n+m} - x_{n+m}) = \sigma^2_w \sum_{j=0}^{m-1}\Psi^2_j. \)

Thus the estimated standard deviation of the forecast error at time n + m is

Standard error of \((x^n_{n+m}-x_{n+m}) = \sqrt{\widehat{\sigma}_w^2\sum_{j=0}^{m-1}\Psi^2_j}.\)

Note!
The summation of squared psi-weights begins with \((\Psi_0)^2=1\) and that the summation goes to m – 1, one less than the number of times ahead for which we’re forecasting.

When forecasting m = 1 time past the end of the series, the standard error of the forecast error is

Standard error of \((x^n_{n+1}-x_{n+1}) = \sqrt{\widehat{\sigma}_w^2(1)}\)

When forecasting the value m = 2 times past the end of the series, the standard error of the forecast error is

Standard error of \((x^n_{n+2}-x_{n+2}) = \sqrt{\widehat{\sigma}_w^2(1+\Psi^2_1)}.\)

Notice that the variance will not be too big when m = 1. But, as you predict out farther in the future, the variance will increase. When m is very large, we will get the total variance. In other words, if you are trying to predict very far out, we will get the variance of the entire time series; as if you haven't even looked at what was going on previously.

95% Prediction Interval for \(x_{n+m}\)

With the assumption of normally distributed errors, a 95% prediction interval for \(x_{n+m}\), the future value of the series at time n + m, is

\(x^n_{n+m} \pm 1.96 \sqrt{\widehat{\sigma}_w^2\sum_{j=0}^{m-1}\Psi^2_j}.\)

Example 3-6 Section

Suppose that an AR(1) model is estimated to be \(x _ { t } = 40 + 0.6 x _ { t - 1 } + w _ { t }\). This is the same model used earlier in this handout, so the psi-weights we got there apply.

Suppose that we have n = 100 observations, \(\widehat{\sigma}_w^2 = 4\) and \(x_{100} = 80\). We wish to forecast the values at times 101 and 102, and create prediction intervals for both forecasts.

First we forecast time 101.

\(\begin{array}{lll}x_{101} & = & 40 + 0.6x_{100} + w_{101} \\ x^{100}_{101} & = & 40 +0.6 (80) + 0  =  88  \end{array}\)

The standard error of the forecast error at time 101 is

\(\sqrt{\widehat{\sigma}_w^2 \sum_{j=0}^{1-1}\psi^2_j} = \sqrt{4(1)} = 2.\)

The 95% prediction interval for the value at time 101 is 88 ± 2(1.96), which is 84.08 to 91.96. We are therefore 95% confident that the observation at time 101 will be between 84.08 and 91.96. If we repeated this exact process many times, then 95% of the computed prediction intervals would contain the true value of x at time 101.

The forecast for time 102 is

\(x^{100}_{102} = 40 + 0.6(88) + 0 = 92.8\)

Note!
We used the forecasted value for time 101 in the AR(1) equation.

The relevant standard error is

\(\sqrt{\widehat{\sigma}_w^2 \sum_{j=0}^{2-1}\psi^2_j} = \sqrt{4(1+0.6^2)} = 2.332\)

A 95% prediction interval for the value at time 102 is 92.8 ± (1.96)(2.332).

To forecast using an ARIMA model in R, we recommend our textbook author’s script called sarima.for. (It is part of the astsa library recommended previously.)

Example 3-7 Section

In the homework for Lesson 2, problem 5 asked you to suggest a model for a time series of stride lengths measured every 30 seconds for a runner on a treadmill. 

From R, the estimated coefficients for an AR(2) model and the estimated variance are as follows for a similar data set with n = 90 observations:

Coefficients:

  ar1 ar2 xmean
  1.1480 -0.3359 48.7476
s.e. 0.1009 0.1087 1.8855

sigma^2 estimated as 11.47

The command

sarima.for(stridelength, 6, 2, 0, 0) # 6 forecasts with an AR(2) model for stridelength 

will give forecasts and standard errors of prediction errors for the next six times past the end of the series. Here’s the output (slightly edited to fit here):

\$pred
Time Series:
Start = 91
End = 96
[1] 69.78674 64.75441 60.05661 56.35385 53.68102 51.85633

\$se
Time Series:
Start = 91
End = 96
[1] 3.386615 5.155988 6.135493 6.629810 6.861170 6.962654

The forecasts are given in the first batch of values under \$pred and the standard errors of the forecast errors are given in the last line in the batch of results under \$se.

The procedure also gave this graph, which shows the series followed by the forecasts as a red line and the upper and lower prediction limits as blue dashed lines:

graph

Psi-Weights for the Estimated AR(2) for the Stride Length Data Section

If we wanted to verify the standard error calculations for the six forecasts past the end of the series, we would need to know the psi-weights. To get them, we need to supply the estimated AR coefficients for the AR(2) model to the ARMAtoMA command.

The R command in this case is ARMAtoMA(ar = list(1.148, -0.3359), ma = 0, 5)

This will give the psi-weights \(\psi_1\) to \(\psi_5\) in scientific notation. The answer provided by R is:

[1] 1.148000e+00 9.820040e-01 7.417274e-01 5.216479e-01 3.497056e-01

(Remember that \(\psi_0\equiv 1\) in all cases)

The output for estimating the AR(2) included this estimate of the error variance:

sigma^2 estimated as 11.47

As an example, the standard error of the forecast error for 3 times past the end of the series is

\(\sqrt{\widehat{\sigma}^2_w\sum_{j=0}^{3-1}\psi^2_j} = \sqrt{11.47(1+1.148^2+0.982^2)} = 6.1357\)

which, except for round off error, matches the value of 6.135493 given as the third standard error in the sarima.for output above.

Where will the Forecasts End Up?

For a stationary series and model, the forecasts of future values will eventually converge to the mean and then stay there. Note below what happened with the stride length forecasts, when we asked for 30 forecasts past the end of the series. [Command was sarima.for (stridelength, 30, 2, 0, 0)]. The forecast got to 48.74753 and then stayed there.

\$pred
Time Series:
Start = 91
End = 120
[1] 69.78674 64.75441 60.05661 56.35385 53.68102 51.85633 50.65935 49.89811
[9] 49.42626 49.14026 48.97043 48.87153 48.81503 48.78339 48.76604 48.75676
[17] 48.75192 48.74949 48.74833 48.74780 48.74760 48.74753 48.74753 48.74755
[25] 48.74757 48.74759 48.74760 48.74761 48.74762 48.74762


The graph showing the series and the six prediction intervals is the following

graph