Lesson 1: Time Series Basics
Lesson 1: Time Series BasicsOverview
This first lesson will introduce you to time series data and important characteristics of time series data. We will also begin some basic modeling. Topics covered include firstorder autoregressive models and the autocorrelation function.
Objectives
 Identify important features on a time series plot
 Identify and interpret an AR(1) model
 Interpret an ACF
 Identify a weakly stationary time series
 Identify when and how to take first differences
1.1 Overview of Time Series Characteristics
1.1 Overview of Time Series CharacteristicsIn this lesson, we’ll describe some important features that we must consider when describing and modeling a time series. This is meant to be an introductory overview, illustrated by example, and not a complete look at how we model a univariate time series. Here, we’ll only consider univariate time series. We’ll examine relationships between two or more time series later on.
 Univariate Time Series
 A univariate time series is a sequence of measurements of the same variable collected over time. Most often, the measurements are made at regular time intervals.
One difference from standard linear regression is that the data are not necessarily independent and not necessarily identically distributed. One defining characteristic of a time series is that it is a list of observations where the ordering matters. Ordering is very important because there is dependency and changing the order could change the meaning of the data.
Basic Objectives of the Analysis
The basic objective usually is to determine a model that describes the pattern of the time series. Uses for such a model are:
 To describe the important features of the time series pattern.
 To explain how the past affects the future or how two time series can “interact”.
 To forecast future values of the series.
 To possibly serve as a control standard for a variable that measures the quality of product in some manufacturing situations.
Types of Models
There are two basic types of “time domain” models.
 Models that relate the present value of a series to past values and past prediction errors  these are called ARIMA models (for Autoregressive Integrated Moving Average). We’ll spend substantial time on these.
 Ordinary regression models that use time indices as xvariables. These can be helpful for an initial description of the data and form the basis of several simple forecasting methods.
Important Characteristics to Consider First
Some important questions to first consider when first looking at a time series are:
 Is there a trend, meaning that, on average, the measurements tend to increase (or decrease) over time?
 Is there seasonality, meaning that there is a regularly repeating pattern of highs and lows related to calendar time such as seasons, quarters, months, days of the week, and so on?
 Are there outliers? In regression, outliers are far away from your line. With time series data, your outliers are far away from your other data.
 Is there a longrun cycle or period unrelated to seasonality factors?
 Is there constant variance over time, or is the variance nonconstant?
 Are there any abrupt changes to either the level of the series or the variance?
Example 11
The following plot is a time series plot of the annual number of earthquakes in the world with seismic magnitude over 7.0, for 99 consecutive years. By a time series plot, we simply mean that the variable is plotted against time.
Some features of the plot:
 There is no consistent trend (upward or downward) over the entire time span. The series appears to slowly wander up and down. The horizontal line drawn at quakes = 20.2 indicates the mean of the series. Notice that the series tends to stay on the same side of the mean (above or below) for a while and then wanders to the other side.
 Almost by definition, there is no seasonality as the data are annual data.
 There are no obvious outliers.
 It’s difficult to judge whether the variance is constant or not.
One of the simplest ARIMA type models is a model in which we use a linear model to predict the value at the present time using the value at the previous time. This is called an AR(1) model, standing for autoregressive model of order 1. The order of the model indicates how many previous times we use to predict the present time.
A start in evaluating whether an AR(1) might work is to plot values of the series against lag 1 values of the series. Let \(x_t\) denote the value of the series at any particular time \(t\), so \(x_{t1}\) denotes the value of the series one time before time \(t\). That is, \(x_{t1}\) is the lag 1 value of \(x_t\). As a short example, here are the first five values in the earthquake series along with their lag 1 values:
t  \(x_t\)  \(x_{t1}\) (lag 1 value) 

1  13  * 
2  14  13 
3  8  14 
4  10  8 
5  16  10 
For the complete earthquake dataset, here’s a plot of \(x_t\) versus \(x_{t1}\) :
Although it’s only a moderately strong relationship, there is a positive linear association so an AR(1) model might be a useful model.
The AR(1) model
Theoretically, the AR(1) model is written
\(x_t = \delta + \phi_1 x_{t1} + w_t\)
Assumptions:
 \(w_t \overset{iid}{\sim} N(0, \sigma^2_w)\), meaning that the errors are independently distributed with a normal distribution that has mean 0 and constant variance.
 Properties of the errors \(w_t\) are independent of \(x\).
This is essentially the ordinary simple linear regression equation, but there is one difference. Although it’s not usually true, in ordinary least squares regression we assume that the xvariable is not random but instead is something we can control. That’s not the case here, but in our first encounter with time series we’ll overlook that and use ordinary regression methods. We’ll do things the “right” way later in the course.
Following is Minitab output for the AR(1) regression in this example:
quakes = 9.19 + 0.543 lag1
98 cases used, 1 cases contain missing values
Predictor  Coef  SE Coef  T  P 

Constant  9.191  1.819  5.05  0.000 
lag1  0.54339  0.08528  6.37  0.000 
S = 6.12239 RSq = 29.7% RSq(adj) = 29.0%
We see that the slope coefficient is significantly different from 0, so the lag 1 variable is a helpful predictor. The \(R^2\) value is relatively weak at 29.7%, though, so the model won’t give us great predictions.
Residual Analysis
In traditional regression, a plot of residuals versus fits is a useful diagnostic tool. The ideal for this plot is a horizontal band of points. Following is a plot of residuals versus predicted values for our estimated model. It doesn’t show any serious problems. There might be one possible outlier at a fitted value of about 28.
Example 12
The following plot shows a time series of quarterly production of beer in Australia for 18 years.
Some important features are:
 There is an upward trend, possibly a curved one.
 There is seasonality – a regularly repeating pattern of highs and lows related to quarters of the year.
 There are no obvious outliers.
 There might be increasing variation as we move across time, although that’s uncertain.
There are ARIMA methods for dealing with series that exhibit both trend and seasonality, but for this example, we’ll use ordinary regression methods.
Classical regression methods for trend and seasonal effects
To use traditional regression methods, we might model the pattern in the beer production data as a combination of the trend over time and quarterly effect variables.
Suppose that the observed series is \(x_t\), for \(t = 1,2, \dots, n\).
 For a linear trend, use \(t\) (the time index) as a predictor variable in a regression.
 For a quadratic trend, we might consider using both \(t\) and \(t^2\).
 For quarterly data, with possible seasonal (quarterly) effects, we can define indicator variables such as \(S_j=1\) if the observation is in quarter \(j\) of a year and 0 otherwise. There are 4 such indicators.
Let \(\epsilon_t \overset{iid}{\sim} N(0, \sigma^2)\). A model with additive components for linear trend and seasonal (quarterly) effects might be written
\(x_t = \beta_1t+\alpha_1S_1+\alpha_2S_2 + \alpha_3S_3 +\alpha_4S_4 + \epsilon_t\)
To add a quadratic trend, which may be the case in our example, the model is
\(x_t = \beta_1t + \beta_2t^2 +\alpha_1S_1 + \alpha_2S_2 + \alpha_3S_3 +\alpha_4S_4 + \epsilon_t\)
We’ve deleted the “intercept” from the model. This isn’t necessary, but if we include it we’ll have to drop one of the seasonal effect variables from the model to avoid collinearity issues.
Back to Example 2: Following is the Minitab output for a model with a quadratic trend and seasonal effects. All factors are statistically significant.
Predictor  Coef  SE Coef  T  P 

Noconstant  
Time  0.5881  0.2193  2.68  0.009 
tsqrd  0.031214  0.002911  10.72  0.000 
quarter_1  261.930  3.937  66.52  0.000 
quarter_2  212.165  3.968  53.48  0.000 
quarter_3  228.415  3.994  57.18  0.000 
quarter_4  310.880  4.018  77.37  0.000 
Residual Analysis
For this example, the plot of residuals versus fits doesn’t look too bad, although we might be concerned by the string of positive residuals at the far right.
When data are gathered over time, we typically are concerned with whether a value at the present time can be predicted from values at past times. We saw this in the earthquake data of example 1 when we used an AR(1) structure to model the data. For residuals, however, the desirable result is that the correlation is 0 between residuals separated by any given time span. In other words, residuals should be unrelated to each other.
Sample Autocorrelation Function (ACF)
The sample autocorrelation function (ACF) for a series gives correlations between the series \(x_t\) and lagged values of the series for lags of 1, 2, 3, and so on. The lagged values can be written as \(x_{t1}, x_{t2}, x_{t3}\), and so on. The ACF gives correlations between \(x_t\) and \(x_{t1}\), \(x_t\) and \(x_{t2}\), and so on.
The ACF can be used to identify the possible structure of time series data. That can be tricky going as there often isn’t a single clearcut interpretation of a sample autocorrelation function. We’ll get started on that in Lesson 1.2 this week. The ACF of the residuals for a model is also useful. The ideal for a sample ACF of residuals is that there aren’t any significant correlations for any lag.
Following is the ACF of the residuals for Example 1, the earthquake example, where we used an AR(1) model. The "lag" (time span between observations) is shown along the horizontal, and the autocorrelation is on the vertical. The red lines indicated bounds for statistical significance. This is a good ACF for residuals. Nothing is significant; that’s what we want for residuals.
The ACF of the residuals for the quadratic trend plus seasonality model we used for Example 2 looks good too. Again, there appears to be no significant autocorrelation in the residuals. The ACF of the residual follows:
Lesson 1.2 will give more details about the ACF. Lesson 1.3 will give some R code for examples in Lesson 1.1 and Lesson 1.2.
1.2 Sample ACF and Properties of AR(1) Model
1.2 Sample ACF and Properties of AR(1) ModelThis lesson defines the sample autocorrelation function (ACF) in general and derives the pattern of the ACF for an AR(1) model. Recall from Lesson 1.1 for this week that an AR(1) model is a linear model that predicts the present value of a time series using the immediately prior value in time.
Stationary Series
As a preliminary, we define an important concept, that of a stationary series. For an ACF to make sense, the series must be a weakly stationary series. This means that the autocorrelation for any particular lag is the same regardless of where we are in time.
 (Weakly) Stationary Series

A series \(x_t\) is said to be (weakly) stationary if it satisfies the following properties:
 The mean \(E(x_t)\) is the same for all \(t\).
 The variance of \(x_t\) is the same for all \(t\).
 The covariance (and also correlation) between \(x_t\) and \(x_{th}\) is the same for all \(t\) at each lag \(h\) = 1, 2, 3, etc.
 Autocorrelation Function (ACF)

Let \(x_t\) denote the value of a time series at time \(t\). The ACF of the series gives correlations between \(x_t\) and \(x_{th}\) for \(h\) = 1, 2, 3, etc. Theoretically, the autocorrelation between \(x_t\) and \(x_{th}\)_{}equals
\(\dfrac{\text{Covariance}(x_t, x_{th})}{\text{Std.Dev.}(x_t)\text{Std.Dev.}(x_{th})} = \dfrac{\text{Covariance}(x_t, x_{th})}{\text{Variance}(x_t)}\)
The denominator in the second formula occurs because the standard deviation of a stationary series is the same at all times.
The last property of a weakly stationary series says that the theoretical value of autocorrelation of particular lag is the same across the whole series. An interesting property of a stationary series is that theoretically it has the same structure forwards as it does backward.
Many stationary series have recognizable ACF patterns. Most series that we encounter in practice, however, is not stationary. A continual upward trend, for example, is a violation of the requirement that the mean is the same for all \(t\). Distinct seasonal patterns also violate that requirement. The strategies for dealing with nonstationary series will unfold during the first three weeks of the semester.
The Firstorder Autoregression Model
We’ll now look at theoretical properties of the AR(1) model. Recall from Lesson 1.1, that the 1^{st }order autoregression model is denoted as AR(1). In this model, the value of \(x\) at time \(t\) is a linear function of the value of \(x\) at time \(t1\). The algebraic expression of the model is as follows:
\(x_t = \delta + \phi_1x_{t1}+w_t\)
Assumptions
 \(w_t \overset{iid}{\sim} N(0, \sigma^2_w)\), meaning that the errors are independently distributed with a normal distribution that has mean 0 and constant variance.
 Properties of the errors \(w_t\) are independent of \(x_t\).
 The series \(x_1\), \(x_2\), ... is (weakly) stationary. A requirement for a stationary AR(1) is that \(\phi_1 < 1\). We’ll see why below.
Properties of the AR(1)
Formulas for the mean, variance, and ACF for a time series process with an AR(1) model follow.
 The (theoretical) mean of \(x_t\) is
\(E(x_t)=\mu = \dfrac{\delta}{1\phi_1}\)
 The variance of \(x_t\) is
\(\text{Var}(x_t) = \dfrac{\sigma^2_w}{1\phi_1^2}\)
 The correlation between observations \(h\) time periods apart is
\(\rho_h = \phi^h_1\)
This defines the theoretical ACF for a time series variable with an AR(1) model.
\(\phi_1\) is the slope in the AR(1) model and we now see that it is also the lag 1 autocorrelation.
Details of the derivations of these properties are in the Appendix to this lesson for interested students.
Pattern of ACF for AR(1) Model
The ACF property defines a distinct pattern for the autocorrelations. For a positive value of \(\phi_1\), the ACF exponentially decreases to 0 as the lag \(h\) increases. For negative \(\phi_1\), the ACF also exponentially decays to 0 as the lag increases, but the algebraic signs for the autocorrelations alternate between positive and negative.
Following is the ACF of an AR(1) with \(\phi_1\)= 0.6, for the first 12 lags.
The tapering pattern:
The ACF of an AR(1) with \(\phi_1\) = −0.7 follows.
The alternating and tapering pattern.
Example 13
In Example 1 of Lesson 1.1, we used an AR(1) model for annual earthquakes in the world with seismic magnitude greater than 7. Here’s the sample ACF of the series:
Lag.  ACF 

1.  0.541733 
2.  0.418884 
3.  0.397955 
4.  0.324047 
5.  0.237164 
6.  0.171794 
7.  0.190228 
8.  0.061202 
9.  0.048505 
10.  0.106730 
11.  0.043271 
12.  0.072305 
The sample autocorrelations taper, although not as fast as they should for an AR(1). For instance, theoretically the lag 2 autocorrelation for an AR(1) = squared value of lag 1 autocorrelation. Here, the observed lag 2 autocorrelation = .418884. That’s somewhat greater than the squared value of the first lag autocorrelation (.541733^{2}= 0.293). But, we managed to do okay (in Lesson 1.1) with an AR(1) model for the data. For instance, the residuals looked okay. This brings up an important point – the sample ACF will rarely fit a perfect theoretical pattern. A lot of the time you just have to try a few models to see what fits.
We’ll study the ACF patterns of other ARIMA models during the next three weeks. Each model has a different pattern for its ACF, but in practice the interpretation of a sample ACF is not always so clearcut.
A reminder: Residuals usually are theoretically assumed to have an ACF that has correlation = 0 for all lags.
Example 14
Here’s a time series of the daily cardiovascular mortality rate in Los Angeles County, 19701979
There is a slight downward trend, so the series may not be stationary. To create a (possibly) stationary series, we’ll examine the first differences \(y_t=x_tx_{t1}\). This is a common time series method for creating a detrended series and thus potentially a stationary series. Think about a straight line – there are constant differences in average \(y\) for each change of 1unit in \(x\).
The time series plot of the first differences is the following:
The following plot is the sample estimate of the autocorrelation function of 1^{st} differences:
Lag.  ACF 

1.  0.506029 
2.  0.205100 
3.  0.126110 
4.  0.062476 
5.  0.015190 
This looks like the pattern of an AR(1) with a negative lag 1 autocorrelation.
The lag 2 correlation is roughly equal to the squared value of the lag 1 correlation. The lag 3 correlation is nearly exactly equal to the cubed value of the lag 1 correlation, and the lag 4 correlation nearly equals the fourth power of the lag 1 correlation. Thus an AR(1) model may be a suitable model for the first differences \(y_t = x_t  x_{t1}\) .
Let \(y_t\) denote the first differences, so that \(y_t = x_t  x_{t1}\) and \(y_{t1} = x_{t1}x_{t2}\). We can write this AR(1) model as
\(y_t = \delta + \phi_1y_{t1}+w_t\)
Using R, we found that the estimated model for the first differences is
\(\widehat{y}_t = 0.046270.50636y_{t1}\)
Some R code for this example will be given in Lesson 1.3 for this week.
Appendix Derivations of Properties of AR(1)
Generally you won’t be responsible for reproducing theoretical derivations, but interested students may want to see the derivations for the theoretical properties of an AR(1).
The algebraic expression of the model is as follows:
\(x_t = \delta + \phi_1x_{t1}+w_t\)
Assumptions
 \(w_t \overset{iid}{\sim} N(0, \sigma^2_w)\), meaning that the errors are independently distributed with a normal distribution that has mean 0 and constant variance.
 Properties of the errors \(w_t\) are independent of \(x_t\).
 The series \(x_1\), \(x_2\), ... is (weakly) stationary. A requirement for a stationary AR(1) is that \(\phi_1<1\). We’ll see why below.
Mean
\(E(x_t) = E(\delta + \phi_1x_{t1}+w_t) = E(\delta) + E(\phi_1x_{t1}) + E(w_t) = \delta + \phi_1E(x_{t1}) + 0\)
With the stationary assumption, \(E(x_t) = E(x_{t1})\). Let \(\mu\) denote this common mean. Thus \(\mu = \delta + \phi_1\mu\). Solve for \(\mu\) to get
\(\mu = \dfrac{\delta}{1\phi_1}\)
Variance
By independence of errors and values of \(x\),
\begin{eqnarray}
\text{Var}(x_t) &=& \text{Var}(\delta)+\text{Var}(\phi_1 x_{t1})+\text{Var}(w_t) \nonumber \\
&=& \phi_1^2 \text{Var}(x_{t1})+\sigma^2_w
\end{eqnarray}
By the stationary assumption, \(\text{Var}(x_t) = \text{Var}(x_{t1})\). Substitute \(\text{Var}(x_t)\) for \(\text{Var}(x_{t1})\) and then solve for \(\text{Var}(x_t)\). Because \(\text{Var}(x_t)>0\), it follows that \((1\phi^2_1)>0\) and therefore \(\phi_1<1\).
Autocorrelation Function (ACF)
To start, assume the data have mean 0, which happens when \(\delta=0\), and \(x_t=\phi_1x_{t1}+w_t\). In practice this isn’t necessary, but it simplifies matters. Values of variances, covariances and correlations are not affected by the specific value of the mean.
Let \(y_h = E( x_t x_{t + h }) = E ( x_t x_{t h})\), the covariance observations \(h\) time periods apart (when the mean = 0). Let \(\rho_h\) = correlation between observations that are \(h\) time periods apart.
Covariance and correlation between observations one time period apart
\(\gamma_1 = \text{E}(x_t x_{t+1}) = \text{E}(x_t(\phi_1 x_t + w_{t+1})) = \text{E}(\phi_1 x_t^2 + x_t w_{t+1}) = \phi_1 \text{Var}(x_t)\)
\(\rho_1 = \dfrac{\text{Cov}(x_t, x_{t+1})}{\text{Var}(x_t)} = \dfrac{\phi_1 \text{Var}(x_t)}{\text{Var}(x_t)} = \phi_1\)
Covariance and correlation between observations \(h\) time periods apart
To find the covariance \(\gamma_h\)_{ }, multiply each side of the model for \(x_t\) _{ } by \(x_{th}\)_{ }, then take expectations.
\(x_t = \phi_1x_{t1}+w_t\)
\(x_{th}x_t = \phi_1x_{th}x_{t1}+x_{th}w_t\)
\(E(x_{th}x_t) = E(\phi_1x_{th}x_{t1})+E(x_{th}w_t)\)
\(\gamma_h = \phi_1 \gamma_{h1}\)
If we start at \(\gamma_1\), and move recursively forward we get \(\gamma_h = \phi^h_1 \gamma_0\). By definition, \(\gamma_0 = \text{Var}(x_t)\), so this is \(\gamma_h = \phi^h_1\text{Var}(x_t)\). The correlation
\( \rho_h = \dfrac{\gamma_h}{\text{Var}(x_t)} = \dfrac{\phi_1^h \text{Var}(x_t)}{\text{Var}(x_t)} = \phi_1^h \)
1.3 R Code for Two Examples in Lessons 1.1 and 1.2
1.3 R Code for Two Examples in Lessons 1.1 and 1.2R
One example in Lesson 1.1 and Lesson 1.2 concerned the annual number of earthquakes worldwide with a magnitude greater than 7.0 on the seismic scale. We identified an AR(1) model (autoregressive model of order 1), estimated the model, and assessed the residuals. Below is R code that will accomplish these tasks. The first line reads the data from a file named quakes.dat. The data are listed in time order from left to right in the lines of the file. If you were to download the file, you should download it into a folder that you create for storing course data. Then in R, change the working directory to be this folder.
The commands below include explanatory comments, following the #. Those comments do not have to be entered for the command to work.
x=scan("quakes.dat")
x=ts(x) #this makes sure R knows that x is a time series
plot(x, type="b") #time series plot of x with points marked as "o"
install.packages("astsa")
library(astsa) # See note 1 below
lag1.plot(x,1) # Plots x versus lag 1 of x.
acf(x, xlim=c(1,19)) # Plots the ACF of x for lags 1 to 19
xlag1=lag(x,1) # Creates a lag 1 of x variable. See note 2
y=cbind(x,xlag1) # See note 3 below
ar1fit=lm(y[,1]~y[,2])#Does regression, stores results object named ar1fit
summary(ar1fit) # This lists the regression results
plot(ar1fit$fit,ar1fit$residuals) #plot of residuals versus fits
acf(ar1fit$residuals, xlim=c(1,18)) # ACF of the residuals for lags 1 to 18
Note!
 The astsa library accesses R script(s) written by one of the authors of our textbook (Stoffer). In our program, the lag1.plot command is part of that script. You may read more about the library on the website for our text: ASTSA R package. You must install the astsa package in R before loading the commands in the library statement. Not all available packages are included when you install R on your machine (Cran R Project Web Packages). You only need to run install.packages("astsa") once. In subsequent sessions, the library command alone will bring the commands into your current session.
 Note the negative value for the lag in xlag1=lag(x,−1) To lag back in time in R, use a negative lag.
 This is a bit tricky. For whatever reason, R has to bind together a variable with its lags for the lags to be in the proper connection with the original variable. The cbind and the ts.intersect commands both accomplish this task. In the code above, the lagged variable and the original variable become the first and second columns of a matrix named y. The regression command (lm) uses these two columns of y as the response and predictor variables in the regression.
General Note!
If a command that includes quotation marks doesn’t work when you copy and paste from course notes to R, try typing the command in R instead.
The results, as given by R, follow.
The time series plot for the quakes series.
Plot of x versus lag 1 of x.
The ACF of the quakes series.
The regression results:
Estimate  Std. Error  t value  Pr(>t)  

(Intercept)  9.19070  1.81924  5.052  2.08e06 *** 
y[, 2]  0.54339  0.08528  6.372  6.47e09 *** 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.122 on 96 degrees of freedom
(2 observations deleted due to missingness)
Multiple Rsquared: 0.2972, Adjusted Rsquared: 0.2899
Plot of residuals versus fits
ACF of residuals
R
An example in Lesson 1.2 for this week concerned the weekly cardiovascular mortality rate in Los Angeles County. We used a first difference to account for a linear trend and determine that the first differences may have an AR(1) model.
Download the data: cmort.dat
Following are R commands for the analysis. Again, the commands are commented using #comment.
mort=scan("cmort.dat")
plot(mort, type="o") # plot of mortality rate
mort=ts(mort)
mortdiff=diff(mort,1) # creates a variable = x(t) – x(t1)
plot(mortdiff,type="o") # plot of first differences
acf(mortdiff,xlim=c(1,24)) # plot of first differences, for 24 lags
mortdifflag1=lag(mortdiff,1)
y=cbind(mortdiff,mortdifflag1) # bind first differences and lagged first differences
mortdiffar1=lm(y[,1]~y[,2]) # AR(1) regression for first differences
summary(mortdiffar1) # regression results
acf(mortdiffar1$residuals, xlim = c(1,24)) # ACF of residuals for 24 lags.
We’ll leave it to you to try the code and see the output, if you wish.