To define a useful model, we must investigate the relationship between the response and the predictor variables. As mentioned before, the focus of this Lesson is linear relationships.

For a brief review of linear functions, recall that the equation of a line has the following form:

\(y=mx+b\)

where \(m\) is the slope and \(b\) is the \(y\)-intercept.

Given two points on a line, \((x_1, y_1)\) and \((x_2, y_2)\), with \(x_1 \neq x_2\), the slope is calculated by:

\begin{align} m&=\dfrac{y_2-y_1}{x_2-x_1}\\&=\dfrac{\text{change in y}}{\text{change in x}}\\&=\frac{\text{rise}}{\text{run}} \end{align}
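As a minimal sketch of the calculation above, the following Python snippet computes the slope from two points and then recovers the y-intercept by solving \(y=mx+b\) for \(b\). The function names `slope` and `intercept` and the two sample points are illustrative assumptions, not part of the lesson.

```python
def slope(p1, p2):
    """Slope of the line through two points (x1, y1) and (x2, y2)."""
    x1, y1 = p1
    x2, y2 = p2
    if x2 == x1:
        # The run is zero, so rise/run is undefined (a vertical line).
        raise ValueError("slope is undefined when x1 == x2")
    return (y2 - y1) / (x2 - x1)  # rise over run


def intercept(p1, p2):
    """y-intercept b, obtained by solving y1 = m*x1 + b for b."""
    x1, y1 = p1
    return y1 - slope(p1, p2) * x1


# Hypothetical points chosen for illustration: rise = 6, run = 3.
m = slope((1, 3), (4, 9))
b = intercept((1, 3), (4, 9))
print(m, b)  # prints: 2.0 1.0
```

With these two points the line is \(y=2x+1\); substituting either point back into the equation confirms the result.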

The slope of a line describes both the direction and the steepness of the linear relationship between two variables. If the slope is positive, then there is a positive linear relationship, i.e., as \(x\) increases, \(y\) increases. If the slope is negative, then there is a negative linear relationship, i.e., as \(x\) increases, \(y\) decreases. If the slope is 0, then \(y\) remains constant as \(x\) increases.
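The three cases can be illustrated with a short Python sketch. The function name `direction` and the three example lines are assumptions made for illustration only; each line is evaluated at \(x = 0, 1, 2\) to show how \(y\) changes as \(x\) increases.

```python
def direction(m):
    """Classify a linear relationship by the sign of its slope m."""
    if m > 0:
        return "positive"
    if m < 0:
        return "negative"
    return "flat"


# Three illustrative lines y = m*x + b (values chosen for this example only).
for m, b in [(2, 1), (-0.5, 4), (0, 3)]:
    ys = [m * x + b for x in (0, 1, 2)]
    print(direction(m), ys)
# prints:
# positive [1, 3, 5]
# negative [4.0, 3.5, 3.0]
# flat [3, 3, 3]
```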

When we look for a linear relationship between two variables in real data, the observed points rarely fall exactly on a straight line; there will be some error. In the next sections, we will show how to examine the data for a linear relationship (i.e., the scatterplot) and how to find a measure to describe the linear relationship (i.e., correlation).