Since many of the time series models have a regression structure, it is beneficial to introduce a general class of time series models called **autoregressive integrated moving average** or **ARIMA models**. They are also referred to as **Box-Jenkins models**, due to the systematic methodology of identifying, fitting, checking, and utilizing ARIMA models, which was popularized by George Box and Gwilym Jenkins in 1976. Before we write down a general ARIMA model, we need to introduce a few additional concepts.

Suppose we have the time series \(Y_{1},Y_{2},\ldots,Y_{t}\). If the value for each *Y* is determined *exactly* by a mathematical formula, then the series is said to be **deterministic**. If the future values of *Y* can only be described through their probability distribution, then the series is said to be a **stochastic process**.^{1 } A special class of stochastic processes is a **stationary stochastic process**, which occurs when the probability distribution for the process is the same for all starting values of *t*. Such a process is completely defined by its mean, variance, and autocorrelation function. When a time series exhibits nonstationary behavior, then part of our objective will be to transform it into a stationary process.

When stationarity is not an issue, then we can define an **autoregressive moving average** or **ARMA** model as follows:

\(\begin{equation*} Y_{t}=\sum_{i=1}^{p}\phi_{i}Y_{t-i}+a_{t}-\sum_{j=1}^{q}\theta_{j}a_{t-j}, \end{equation*} \)

where \(\phi_{1},\ldots,\phi_{p}\) are the autoregressive parameters to be estimated, \(\theta_{1},\ldots,\theta_{q}\) are the moving average parameters to be estimated, and \(a_{1},\ldots,a_{t}\) are a series of unknown random errors (or residuals) that are assumed to follow a normal distribution. This is also referred to as an ARMA(*p*,*q*) model. The model can be simplified by introducing the **Box-Jenkins backshift operator**, which is defined by the following relationship:

\(\begin{equation*} B^{p}X_{t}=X_{t-p}, \end{equation*}\)

such that \(X_{1},\ldots,X_{t}\) is any time series and \(p<t\). Using the backshift notation yields the following:

\(\begin{equation*} \biggl(1-\sum_{i=1}^{p}\phi_{i}B^{i}\biggr)Y_{t}=\biggl(1-\sum_{j=1}^{q}\theta_{j}B^{j}\biggr)a_{t}, \end{equation*}\)

which is often reduced further to

\(\begin{equation*} \phi_{p}(B)Y_{t}=\theta_{q}(B)a_{t}, \end{equation*}\)

where \(\phi_{p}(B)=(1-\sum_{i=1}^{p}\phi_{i}B^{i})\) and \(\theta_{q}(B)=(1-\sum_{j=1}^{q}\theta_{j}B^{j})\).

When time series exhibit nonstationary behavior (which commonly occurs in practice), then the ARMA model presented above can be extended and written using differences, which are defined as follows:

\(\begin{align*} W_{t}&=Y_{t}-Y_{t-1}=(1-B)Y_{t}\\ W_{t}-W_{t-1}&=Y_{t}-2Y_{t-1}+Y_{t-2}\\ &=(1-B)^{2}Y_{t}\\ & \ \ \vdots\\ W_{t}-\sum_{k=1}^{d}W_{t-k}&=(1-B)^{d}Y_{t}, \end{align*}\)

where *d* is the order of **differencing**. Replacing \(Y_{t}\) in the ARMA model with the differences defined above yields the formal ARIMA(*p*,*d*,*q*) model:

\(\begin{equation*} \phi_{p}(B)(1-B)^{d}Y_{t}=\theta_{q}(B)a_{t}. \end{equation*}\)

An alternative way to deal with nonstationary behavior is to simply fit a linear trend to the time series and then fit a Box-Jenkins model to the residuals from the linear fit. This will provide a different fit and a different interpretation, but is still a valid way to approach a process exhibiting nonstationarity.

Seasonal time series can also be incorporated into a Box-Jenkins framework. The following general model is usually recommended:

\(\begin{equation*} \phi_{p}(B)\Phi(P)(B^{s})(1-B)^{d}(1-B^{s})^{D}Y_{t}=\theta_{q}(B)\Theta_{Q}(B^{s})a_{t}, \end{equation*}\)

where *p*, *d*, and *q* are as defined earlier, *s* is a (known) number of seasons per timeframe (e.g., years), *D* is the order of the seasonal differencing, and *P* and *Q* are the autoregressive and moving average orders, respectively, when accounting for the seasonal shift. Moreover, the operator polynomials \(\phi_{p}(B)\) and \(\theta_{q}(B)\) are as defined earlier, while \(\Phi_{P}(B)=(1-\sum_{i=1}^{P}\Phi_{i}B^{s\times i})\) and \(\Theta_{Q}(B)=(1-\sum_{j=1}^{Q}\Theta_{j}B^{s\times j})\). Luckily, the maximum value of *p*, *d*, *q*, *P*, *D*, and *Q* is usually 2, so the resulting expression is relatively simple.

With all of the modeling discussion provided above, we will now provide a brief overview of how to implement the Box-Jenkins methodology in practice, which (fortunately) most statistical software packages will perform for you:

**Model Identification**: Using plots of the data, autocorrelations, partial autocorrelations, and other information, a class of simple ARIMA models is selected. This amounts to estimating (or guesstimating) an appropriate value for*d*followed by estimates for*p*and*q*(as well as*P*,*D*, and*Q*in the seasonal time series setting).**Model Estimation**: The autoregressive and moving average parameters are found via an optimization method like maximum likelihood.**Diagnostic Checking**: The fitted model is checked for inadequacies by studying the autocorrelations of the residual series (i.e., the time-ordered residuals).

The above is then iterated until there appears to be minimal to no improvement in the fitted model.

1 It should be noted that stochastic processes is itself a heavily-studied and very important statistical subject.