8.2.2 - The SLR Model

The errors referred to in the assumptions are only one component of the linear model. The basis of the model, the observations are considered as coordinates, \((x_i, y_i)\), for \(i=1, …, n\). The points, \(\left(x_1,y_1\right), \dots,\left(x_n,y_n\right)\), may not fall exactly on a line, (like the cost and number of critical areas). This gap is the error!

The graph below is an example of a scatter plot showing height as the explanatory variable for height. Select the + icons to view the explanations of the different parts of the scatterplot and the least-squares regression line.

The graph below summarizes the least-squares regression for Bob's data. We will define what we mean by least squares regression in more detail later in the Lesson, for now, focus on how the red line (the regression line) "fits" the blue dots (Bob's data)

We combine the linear relationship along with the error in the simple linear regression model.

Simple Linear Regression Model Section

The general form of the simple linear regression model is...

\(Y=\beta_0+\beta_1X+\epsilon\)

For an individual observation,

\(y_i=\beta_0+\beta_1x_i+\epsilon_i\)

where,

\(\beta_0\) is the population y-intercept,
\(\beta_1\) is the population slope, and
\(\epsilon_i\) is the error or deviation of \(y_i\) from the line, \(\beta_0+\beta_1x_i\)

To make inferences about these unknown population parameters (namely the slope and intercept), we must find an estimate for them. There are different ways to estimate the parameters from the sample. This is where we get to n the least-squares method.

Least Squares Line Section

The least-squares line is the line for which the sum of squared errors of predictions for all sample points is the least.

Using the least-squares method, we can find estimates for the two parameters.

The formulas to calculate least squares estimates are:

Sample Slope: \(\hat{\beta}_1=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}\)

Sample Intercept: \(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\)

The least squares line for Bob’s data is the red line on the scatterplot below.

Note! You will not be expected to memorize these formulas or to find the estimates by hand. We will use Minitab to find these estimates for you. We estimate the population slope, \(\beta_1\), with the sample slope denoted \(\hat{\beta_1}\). The population intercept, \(\beta_0\), is estimated with the sample intercept denoted \(\hat{\beta_0}\). The intercept is often referred to as the constant or the constant term. Once the parameters are estimated, we have the least square regression equation line (or the estimated regression line).

Let’s jump ahead for a moment and generate the regression output. Below we will work through the content of the output. The regression output for Bob’s data look like this:

Coefficients

Predictor	Coef	SE Coef	T-Value	P-Value	VIF
Constant	49.542	0.560	88.40	0.000
Critical Areas	10.417	0.115	90.92	0.000	1.00

Regression Equation

Cost = 49.542 + 10.417 Critical Areas