12.3.1 - Formulas

Simple linear regression uses data from a sample to construct the line of best fit. But what makes a line “best fit”? The most common method of constructing a regression line, and the method that we will be using in this course, is the least squares method. The least squares method computes the values of the intercept and slope that make the sum of the squared residuals as small as possible.

Recall from Lesson 3 that a residual is the difference between the actual value of y and the predicted value of y (i.e., \(y - \widehat y\)). The predicted value of y ("\(\widehat y\)") is sometimes referred to as the "fitted value" and is computed as \(\widehat{y}_i=b_0+b_1 x_i\).
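
For example, suppose the fitted line for predicting weight (in pounds) from height (in inches) were \(\widehat{y}=-151.5+4.5x\) (hypothetical values). For an individual who is 68 inches tall, the predicted weight would be \(\widehat{y}=-151.5+4.5(68)=-151.5+306=154.5\) pounds.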

Below, we'll look at some of the formulas associated with this simple linear regression method. In this course, you will be responsible for computing predicted values and residuals by hand. You will not be responsible for computing the intercept or slope by hand.

Residuals

Residuals are symbolized by \(\varepsilon \) (“epsilon”) in a population and \(e\) or \(\widehat{\varepsilon }\) in a sample.

As with most predictions, you expect there to be some error. For example, if we are using height to predict weight, we wouldn't expect to be able to perfectly predict every individual's weight using their height. There are many variables that impact a person's weight, and height is just one of those many variables. These errors in regression predictions are called prediction errors or residuals.

A residual is calculated by taking an individual's observed y value minus their corresponding predicted y value. Therefore, each individual has a residual. The goal in least squares regression is to construct the regression line that minimizes the sum of the squared residuals. In essence, we create a best fit line that has the least amount of error.

Residual
\(e_i =y_i -\widehat{y}_i\)

\(y_i\) = actual value of y for the ith observation
\(\widehat{y}_i\) = predicted value of y for the ith observation
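
For example, if the 68-inch-tall individual from the hypothetical example above actually weighed 160 pounds, then their residual would be \(e=160-154.5=5.5\) pounds; the regression line under-predicts their weight by 5.5 pounds.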

Sum of Squared Residuals

Also known as Sum of Squared Errors (SSE)
\(SSE=\sum (y-\widehat{y})^2\)
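
For example, if a hypothetical sample of three observations had residuals of 5.5, -3.5, and 2, then \(SSE=5.5^2+(-3.5)^2+2^2=30.25+12.25+4=46.5\).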

Computing the Intercept & Slope

Note! Recall that the equation for a simple linear regression line is \(\widehat{y}=b_0+b_1x\) where \(b_0\) is the \(y\)-intercept and \(b_1\) is the slope.

Statistical software will compute the values of the \(y\)-intercept and slope that minimize the sum of squared residuals. The conceptual formulas below show how these statistics are related to one another and how they relate to correlation, which you learned about earlier in this lesson. In this course, we will always use Minitab to compute these values.

Slope
\(b_1 =r \dfrac{s_y}{s_x}\)

\(r\) = Pearson’s correlation coefficient between \(x\) and \(y\)
\(s_y\) = standard deviation of \(y\)
\(s_x\) = standard deviation of \(x\)
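
For example, using hypothetical values of \(r=0.60\), \(s_y=30\) pounds, and \(s_x=4\) inches, the slope would be \(b_1=0.60 \times \dfrac{30}{4}=0.60 \times 7.5=4.5\); weight would be predicted to increase by 4.5 pounds for each additional inch of height.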

y-intercept
\(b_0=\overline {y}  -  b_1 \overline {x}\)

\(\overline {y}\) = mean of \(y\)
\(\overline {x}\) = mean of \(x\)
\(b_1\) = slope
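
For example, continuing with the hypothetical values above, if \(\overline{y}=150\) pounds and \(\overline{x}=67\) inches, then \(b_0=150-4.5(67)=150-301.5=-151.5\), which gives the fitted line \(\widehat{y}=-151.5+4.5x\) used in the earlier examples.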

Review of New Terms

Before we continue, let’s review a few key terms:

Least squares method
Method of constructing a regression line which makes the sum of squared residuals as small as possible for the given data.
Predicted Value
Symbolized as \(\widehat y\) ("y-hat") and also known as the "fitted value," the expected value of y for a given value of x.
Residual
Symbolized as \(\varepsilon \) (“epsilon”) in a population and \(e\) or \(\widehat{\varepsilon }\) in a sample, an individual's observed y value minus their predicted y value (i.e., \(e=y- \widehat{y}\)); on a scatterplot, this is the vertical distance between the observed y value and the regression line.
Sum of squared residuals
Also known as the sum of squared errors ("SSE"), the sum of all of the residuals squared: \(\sum (y-\widehat{y})^2\).