Before we set up the model, we should clearly define our notation.
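The variable \(X\) is the predictor variable and \(x_1, x_2, \ldots, x_n\) are the observed values of the predictor \(X\). The variable \(Y\) is the response variable and \(y_1, y_2, \ldots, y_n\) are the observed values of the response \(Y\).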
Each observation is treated as a coordinate pair \((x_i, y_i)\), for \(i=1, \ldots, n\). As we saw before with the weight and height example, the points \((x_1,y_1), \ldots, (x_n,y_n)\) may not fall exactly on a line, so the model must account for some error.
The simple linear regression model combines the linear relationship with this error.
- Simple Linear Regression Model
- The general form of the simple linear regression model is:
\(Y=\beta_0+\beta_1X+\epsilon\)
For an individual observation,
\(y_i=\beta_0+\beta_1x_i+\epsilon_i\)
where
\(\beta_0\) is the population y-intercept,
\(\beta_1\) is the population slope, and
\(\epsilon_i\) is the error or deviation of \(y_i\) from the line, \(\beta_0+\beta_1x_i\).
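To see what each term in the model does, here is a minimal Python sketch that generates data from \(y_i=\beta_0+\beta_1x_i+\epsilon_i\). The parameter values (\(\beta_0=2\), \(\beta_1=0.5\)), the predictor values, the function name `simulate`, and the assumption that the errors are drawn from a normal distribution with mean 0 are all hypothetical choices for illustration; with real data the population parameters are unknown.

```python
import random

# Hypothetical population parameters (unknown in real data)
BETA_0 = 2.0   # population y-intercept
BETA_1 = 0.5   # population slope

def simulate(xs, beta_0=BETA_0, beta_1=BETA_1, sigma=1.0, seed=1):
    """Generate y_i = beta_0 + beta_1 * x_i + epsilon_i for each x_i.

    Assumes (for illustration only) that each error epsilon_i is drawn
    independently from a normal distribution with mean 0 and sd sigma.
    Returns the simulated responses and the errors used.
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [beta_0 + beta_1 * x + e for x, e in zip(xs, errors)]
    return ys, errors

xs = [1, 2, 3, 4, 5]
ys, errors = simulate(xs)
for x, y, e in zip(xs, ys, errors):
    # Each observed y deviates from the population line by its error term
    print(f"x={x}, line={BETA_0 + BETA_1 * x:.2f}, y={y:.2f}, error={e:+.2f}")
```

Setting `sigma=0.0` removes the error term, and every simulated point then falls exactly on the population line.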
To make inferences about these unknown population parameters, we must first estimate them from the sample. There are several ways to do this; in this class, we present the least squares method.
- Least Squares Line
- The least squares line is the line that minimizes the sum of the squared prediction errors (the vertical distances between each observed \(y_i\) and the line) over all sample points.
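For a candidate intercept \(b_0\) and slope \(b_1\), the quantity being minimized is \(\sum_{i=1}^{n}(y_i-(b_0+b_1x_i))^2\). The short Python sketch below computes that criterion; the function name `sum_squared_errors`, the symbols \(b_0\) and \(b_1\), and the small data set are hypothetical, chosen only to show that a better-fitting line gives a smaller sum.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared prediction errors for the candidate line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Intercept 2.2 and slope 0.6 are the least squares values for these data,
# so no other line can give a smaller sum of squared errors.
print(sum_squared_errors(xs, ys, 2.2, 0.6))   # least squares line
print(sum_squared_errors(xs, ys, 1.0, 1.0))   # an arbitrary alternative line
```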
Using the least squares method, we can find estimates for the two parameters.
The least squares estimates are calculated with the following formulas:
Sample Slope
\(\hat{\beta}_1=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}\)
Sample Intercept
\(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\)
Note! You will not be expected to memorize these formulas or to find the estimates by hand. We will use Minitab to find these estimates for you.
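Although Minitab computes these estimates for us, a minimal Python sketch of the two formulas can make the arithmetic concrete. The function name `least_squares_estimates` and the sample data are hypothetical; only the standard library is used.

```python
from statistics import mean

def least_squares_estimates(xs, ys):
    """Return (b0_hat, b1_hat) using the least squares formulas.

    b1_hat = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    b0_hat = y_bar - b1_hat * x_bar
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1_hat = sxy / sxx                 # sample slope
    b0_hat = y_bar - b1_hat * x_bar    # sample intercept
    return b0_hat, b1_hat

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0_hat, b1_hat = least_squares_estimates(xs, ys)
print(f"intercept = {b0_hat:.2f}, slope = {b1_hat:.2f}")  # intercept = 2.20, slope = 0.60
```

Computing the centered sums \(\sum (x_i-\bar{x})(y_i-\bar{y})\) and \(\sum (x_i-\bar{x})^2\) directly mirrors the formulas above term by term.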
We estimate the population slope, \(\beta_1\), with the sample slope denoted \(\hat{\beta}_1\). The population intercept, \(\beta_0\), is estimated with the sample intercept denoted \(\hat{\beta}_0\). The intercept is often referred to as the constant or the constant term.
Once the parameters are estimated, we have the least squares regression line (also called the estimated regression line).
- Least Squares Regression Equation
- \(\hat{y}=\hat{\beta}_0+\hat{\beta}_1x\)
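Once \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are known, the regression equation is simply a function of \(x\). In the sketch below the estimates are hard-coded to 2.2 and 0.6, the least squares values for a small hypothetical data set (x values 1 through 5, y values 2, 4, 5, 4, 5); the function name `predict` is also hypothetical.

```python
def predict(x, b0_hat=2.2, b1_hat=0.6):
    """Predicted response y_hat = b0_hat + b1_hat * x on the estimated line."""
    return b0_hat + b1_hat * x

# Predictions at a few x values within the range of the hypothetical data
for x in [1, 2.5, 5]:
    print(f"x = {x}: y_hat = {predict(x):.2f}")
```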
We can also use the least squares regression line to estimate the errors, called residuals.
- Residual
- \(\hat{\epsilon}_i=y_i-\hat{y}_i\) is the observed error, typically called the residual.
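The residuals can be computed directly from the observed and fitted values. The sketch below uses a small hypothetical data set whose least squares estimates are an intercept of 2.2 and a slope of 0.6. It also checks a known property of least squares fits that include an intercept: the residuals sum to zero, apart from floating-point rounding.

```python
import math

xs = [1, 2, 3, 4, 5]          # hypothetical predictor values
ys = [2, 4, 5, 4, 5]          # hypothetical observed responses
b0_hat, b1_hat = 2.2, 0.6     # least squares estimates for these data

y_hats = [b0_hat + b1_hat * x for x in xs]                # fitted values y_hat_i
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]   # residual_i = y_i - y_hat_i

print([round(r, 2) for r in residuals])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
# With an intercept in the model, least squares residuals sum to (essentially) zero
print(math.isclose(sum(residuals), 0.0, abs_tol=1e-9))  # True
```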
The graph below summarizes the least squares regression for the height and weight data. We will go through this example in more detail later in the Lesson.