The techniques of the previous section can all be used for forecasting, which is the art of modeling the patterns that are usually visible in a time series plot and then extrapolating them into the future. In this section, we discuss exponential smoothing methods, which rely on smoothing parameters that determine how fast the weights of the series decay. For each of the three methods we discuss below, the smoothing constants are found objectively, by selecting those values which minimize one of the three error-size criteria below:
\[\begin{align*} \textrm{MSE}&=\frac{1}{n}\sum_{t=1}^{n}e_{t}^{2}\\ \textrm{MAE}&=\frac{1}{n}\sum_{t=1}^{n}|e_{t}|\\ \textrm{MAPE}&=\frac{100}{n}\sum_{t=1}^{n}\biggl|\frac{e_{t}}{Y_{t}}\biggr|, \end{align*}\]
where the error \(e_{t}=Y_{t}-\hat{Y}_{t}\) is the difference between the actual value of the time series at time t and the fitted value at time t, which is determined by the smoothing method employed. Here, MSE is (as usual) the mean square error, MAE is the mean absolute error, and MAPE is the mean absolute percent error.
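The three criteria are straightforward to compute once the fitted values are in hand. A minimal sketch (the function name is ours, not a standard API):

```python
import numpy as np

def error_criteria(actual, fitted):
    """Compute the three error-size criteria for a set of fitted values.

    MSE  = mean square error
    MAE  = mean absolute error
    MAPE = mean absolute percent error (requires nonzero actual values)
    """
    actual = np.asarray(actual, dtype=float)
    e = actual - np.asarray(fitted, dtype=float)   # e_t = Y_t - Yhat_t
    mse = np.mean(e ** 2)
    mae = np.mean(np.abs(e))
    mape = 100.0 * np.mean(np.abs(e / actual))
    return mse, mae, mape
```

In practice, the smoothing constants below would be chosen by minimizing one of these three quantities over a grid or with a numerical optimizer.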
Exponential smoothing methods also require initialization since the forecast for period one requires the forecast at period zero, which we do not (by definition) have. Several methods have been proposed for generating starting values. The most commonly used is the backcasting method, which entails reversing the series so that we forecast into the past instead of into the future. This produces the required starting value. Once this starting value is obtained, we reverse the series again and apply the exponential smoothing algorithm in the regular manner.
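The backcasting idea can be sketched as follows: smooth the reversed series so that the final smoothed value "forecasts" period zero. This is a minimal illustration under simple assumptions (the backward pass is itself initialized at the last observation); software packages use more careful variants.

```python
def backcast_initial_level(series, alpha):
    """Estimate a starting value by smoothing the reversed series.

    The backward recursion is initialized at the last observation
    (an assumption of this sketch), then updated moving toward the
    start of the series, yielding a value usable as the period-zero
    forecast.
    """
    level = series[-1]                    # start the backward pass at the end
    for y in reversed(series[:-1]):
        level = alpha * y + (1 - alpha) * level
    return level
```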
Single exponential smoothing smoothes the data when no trend or seasonal components are present. The equation for this method is:
\[\begin{equation*} \hat{Y}_{t}=\alpha(Y_{t}+\sum_{i=1}^{r}(1-\alpha)^{i}Y_{t-i}), \end{equation*}\]
where \(\hat{Y}_{t}\) is the forecasted value of the series at time t and \(\alpha\) is the smoothing constant. Note that \(r<t\), but r does not have to equal \(t-1\). From the above equation, we see that the method constructs a weighted average of the observations. The weight of each observation decreases exponentially as we move back in time. Hence, since the weights decrease exponentially and averaging is a form of smoothing, the technique was named exponential smoothing. An equivalent ARIMA(0,1,1) model can be constructed to represent the single exponential smoother.
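The weighted average above is usually computed through the equivalent recursion \(\hat{Y}_{t+1}=\alpha Y_{t}+(1-\alpha)\hat{Y}_{t}\), which expands to exactly the exponentially decaying weights in the equation. A minimal sketch, where (as an assumption of this illustration) the starting value defaults to the first observation rather than a backcast:

```python
def single_exp_smooth(series, alpha, y0=None):
    """One-step-ahead fitted values from single exponential smoothing.

    Uses the recursive form Yhat_{t+1} = alpha*Y_t + (1-alpha)*Yhat_t.
    y0 is the starting value (e.g. obtained by backcasting); this
    sketch defaults to the first observation for simplicity.
    """
    yhat = series[0] if y0 is None else y0
    fitted = []
    for y in series:
        fitted.append(yhat)                      # forecast made before seeing Y_t
        yhat = alpha * y + (1 - alpha) * yhat    # update for the next period
    return fitted
```

The recursive form makes the role of \(\alpha\) concrete: a large \(\alpha\) puts most weight on the newest observation, while a small \(\alpha\) lets the weights decay slowly.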
Double exponential smoothing (also called Holt's method) smoothes the data when a trend is present. The double exponential smoothing equations are:
\[\begin{align*} L_{t}&=\alpha Y_{t}+(1-\alpha)(L_{t-1}+T_{t-1})\\ T_{t}&=\beta(L_{t}-L_{t-1})+(1-\beta)T_{t-1}\\ \hat{Y}_{t}&=L_{t-1}+T_{t-1}, \end{align*}\]
where \(L_{t}\) is the level at time t, \(\alpha\) is the weight (or smoothing constant) for the level, \(T_{t}\) is the trend at time t, \(\beta\) is the weight (or smoothing constant) for the trend, and all other quantities are defined as earlier. An equivalent ARIMA(0,2,2) model can be constructed to represent the double exponential smoother.
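The three Holt equations translate directly into a loop that carries the level and trend forward. A minimal sketch; the initialization here (level equal to the first observation, trend equal to the first difference) is one common simple choice, not the only one:

```python
def holt_smooth(series, alpha, beta):
    """One-step-ahead fitted values from Holt's (double) exponential smoothing.

    Initialization is a simple sketch: L_0 = Y_1 and T_0 = Y_2 - Y_1.
    """
    level = series[0]
    trend = series[1] - series[0]
    fitted = []
    for y in series:
        fitted.append(level + trend)                        # Yhat_t = L_{t-1} + T_{t-1}
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return fitted
```

Note that the fitted value is formed from the previous period's level and trend, matching the third equation above.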
Finally, Holt-Winters exponential smoothing smoothes the data when both trend and seasonality are present; the seasonal component can enter either additively or multiplicatively. For the additive model, the equations are:
\[\begin{align*} L_{t}&=\alpha(Y_{t}-S_{t-p})+(1-\alpha)(L_{t-1}+T_{t-1})\\ T_{t}&=\beta(L_{t}-L_{t-1})+(1-\beta)T_{t-1}\\ S_{t}&=\delta(Y_{t}-L_{t})+(1-\delta)S_{t-p}\\ \hat{Y}_{t}&=L_{t-1}+T_{t-1}+S_{t-p}. \end{align*}\]
For the multiplicative model, the equations are:
\[\begin{align*} L_{t}&=\alpha(Y_{t}/S_{t-p})+(1-\alpha)(L_{t-1}+T_{t-1})\\ T_{t}&=\beta(L_{t}-L_{t-1})+(1-\beta)T_{t-1}\\ S_{t}&=\delta(Y_{t}/L_{t})+(1-\delta)S_{t-p}\\ \hat{Y}_{t}&=(L_{t-1}+T_{t-1})S_{t-p}. \end{align*}\]
For both sets of equations, all quantities are the same as they were defined in the previous models, except now we also have that \(S_{t}\) is the seasonal component at time t, \(\delta\) is the weight (or smoothing constant) for the seasonal component, and p is the seasonal period.
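The additive equations can be sketched as a loop that carries the level, trend, and p seasonal indices forward. The initialization here is a deliberately simple assumption (level equal to the mean of the first season, zero starting trend, seasonal indices equal to deviations from that mean); software packages use more careful schemes:

```python
def holt_winters_additive(series, alpha, beta, delta, p):
    """One-step-ahead fitted values from additive Holt-Winters smoothing.

    p is the seasonal period.  Initialization (a sketch): the level is
    the mean of the first season, the trend starts at zero, and the
    initial seasonal indices are deviations from that mean.
    """
    mean0 = sum(series[:p]) / p
    level, trend = mean0, 0.0
    seasonal = [y - mean0 for y in series[:p]]       # S_{1-p}, ..., S_0
    fitted = []
    for t, y in enumerate(series):
        s = seasonal[t % p]                          # S_{t-p}
        fitted.append(level + trend + s)             # Yhat_t = L_{t-1}+T_{t-1}+S_{t-p}
        new_level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % p] = delta * (y - new_level) + (1 - delta) * s
        level = new_level
    return fitted
```

The multiplicative model follows the same loop with the seasonal index divided out rather than subtracted (\(Y_{t}/S_{t-p}\) and \(Y_{t}/L_{t}\) in the updates) and with the fitted value \((L_{t-1}+T_{t-1})S_{t-p}\).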