The errors referred to in the assumptions are only one component of the linear model. The basis of the model is the set of observations, treated as coordinates \((x_i, y_i)\) for \(i=1, \dots, n\). The points \(\left(x_1,y_1\right), \dots,\left(x_n,y_n\right)\) may not fall exactly on a line (like the cost and number of critical areas). The gap between a point and the line is the error!
The graph below is an example of a scatter plot showing height as the explanatory variable for height.
The graph below summarizes the least-squares regression for Bob's data. We will define what we mean by least-squares regression in more detail later in the lesson. For now, focus on how the red line (the regression line) "fits" the blue dots (Bob's data).
The simple linear regression model combines the linear relationship with the error.
Simple Linear Regression Model
The general form of the simple linear regression model is:
\(Y=\beta_0+\beta_1X+\epsilon\)
For an individual observation,
\(y_i=\beta_0+\beta_1x_i+\epsilon_i\)
where,
- \(\beta_0\) is the population y-intercept,
- \(\beta_1\) is the population slope, and
- \(\epsilon_i\) is the error, or deviation, of \(y_i\) from the line \(\beta_0+\beta_1x_i\).
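As a quick illustration with made-up numbers (these are hypothetical values, not Bob's data or estimates), suppose \(\beta_0=50\) and \(\beta_1=10\), and one observation has \(x_i=3\) and \(y_i=82\). The line gives \(\beta_0+\beta_1x_i=50+10(3)=80\), so the error for that observation is
\(\epsilon_i=y_i-\left(\beta_0+\beta_1x_i\right)=82-80=2\)
A point that falls below the line would have a negative error.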
To make inferences about these unknown population parameters (namely the slope and intercept), we must find estimates for them. There are different ways to estimate the parameters from the sample. This is where the least-squares method comes in.
Least Squares Line
The least-squares line is the line for which the sum of the squared errors of prediction, taken over all sample points, is smallest.
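Written as a formula, the quantity being minimized can be sketched as follows. The symbols \(\hat{y}_i\) (the value the fitted line predicts at \(x_i\)) and SSE (the sum of squared errors) are notation assumed here for illustration:
\(\text{SSE}=\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2=\sum_{i=1}^{n}\left(y_i-\hat{\beta}_0-\hat{\beta}_1x_i\right)^2\)
Among all possible choices of intercept and slope, the least-squares line uses the pair \(\hat{\beta}_0, \hat{\beta}_1\) that makes this sum as small as possible.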
Using the least-squares method, we can find estimates for the two parameters.
The formulas to calculate the least-squares estimates are:
- Sample Slope: \(\hat{\beta}_1=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}\)
- Sample Intercept: \(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\)
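The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the software used for the lesson's output: the function name `least_squares_estimates` and the small data set are made up for the example and are not Bob's data.

```python
def least_squares_estimates(xs, ys):
    """Return (intercept, slope) from the least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator of the slope: sum of (x_i - x_bar)(y_i - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of the slope: sum of (x_i - x_bar)^2
    sxx = sum((x - x_bar) ** 2 for x in xs)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, chosen only so the arithmetic is easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0_hat, b1_hat = least_squares_estimates(xs, ys)
print(f"slope = {b1_hat:.3f}, intercept = {b0_hat:.3f}")
# prints: slope = 0.600, intercept = 2.200
```

For this toy data, \(\bar{x}=3\) and \(\bar{y}=4\), the numerator is 6 and the denominator is 10, so the slope is 0.6 and the intercept is \(4-0.6(3)=2.2\). In practice statistical software does this calculation for you; Python 3.10 and later also provide `statistics.linear_regression` in the standard library.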
The least squares line for Bob’s data is the red line on the scatterplot below.
Let’s jump ahead for a moment and generate the regression output. We will work through the content of the output below. The regression output for Bob’s data looks like this:
Coefficients
Predictor | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | 49.542 | 0.560 | 88.40 | 0.000 | |
Critical Areas | 10.417 | 0.115 | 90.92 | 0.000 | 1.00 |
Regression Equation
Cost = 49.542 + 10.417 Critical Areas
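To see how the regression equation is used, the sketch below plugs a value into it. The coefficients are copied from the output above; the function name `predicted_cost` and the input of 5 critical areas are hypothetical choices for illustration only.

```python
# Estimates taken from the regression output above
B0_HAT = 49.542  # sample intercept (Constant)
B1_HAT = 10.417  # sample slope (Critical Areas)


def predicted_cost(critical_areas):
    """Predicted cost from the fitted line: 49.542 + 10.417 * critical_areas."""
    return B0_HAT + B1_HAT * critical_areas


# Hypothetical input: a case with 5 critical areas
print(f"{predicted_cost(5):.3f}")
# prints: 101.627
```

Each additional critical area raises the predicted cost by the slope, 10.417, and the intercept, 49.542, is the predicted cost when the number of critical areas is zero.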