Lesson 9: Linear Regression Foundations

Overview

In this lesson, we first introduce the simple linear regression (SLR) model and the correlation coefficient. We then discuss inference for the simple linear regression model and clarify the critical distinction between inference for the mean response and inference for an individual outcome (prediction). Finally, we give a basic introduction to the multiple regression model.
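To preview why that distinction matters, here are the usual textbook intervals at a given predictor value x_0. The notation is an assumption, not necessarily the symbols this lesson will use: fitted line \hat{y}_0 = b_0 + b_1 x_0, residual standard error s, sample size n, and S_{xx} = \sum (x_i - \bar{x})^2, under the standard normal-error SLR model.

```latex
\begin{align*}
\text{CI for the mean response } E(Y \mid x_0):&\quad \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}\\
\text{PI for a new outcome } Y_0 \text{ at } x_0:&\quad \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\end{align*}
```

The extra 1 under the square root accounts for the scatter of an individual observation around the mean, so the prediction interval is always wider than the confidence interval at the same x_0.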

Regression analysis is a tool for investigating how two or more variables are related. Quite often we want to see how a specific variable of interest is affected by one or more other variables. For example, one may wish to use a person's height, gender, race, etc. to predict that person's weight. Let us first consider the simplest case: using a person's height to predict the person's weight.

Example: Estimating Weights

If you were asked to estimate the weight of a STAT 500 student, what would you use as a point estimate? If I tell you that the student is 70 inches tall, can you give a better estimate of the person's weight?

Answer

For the first part, the point estimate would be the average weight (or the median weight) of STAT 500 students. If you know the student is 70 inches tall, then yes, you can give a better estimate of the person's weight, but only if you have some idea of how height and weight are related.

It is important to distinguish between the variable of interest and the variable(s) we will use to predict the variable of interest.

Response Variable
Denoted Y; also called the variable of interest or the dependent variable. In the example, weight is the response variable.

Predictor Variable
Denoted X; also called the explanatory variable or the independent variable. In the example, height is the predictor variable.

When there is only one predictor variable, we refer to the regression model as a simple linear regression model.

To use known information to provide a better estimate, we need to understand how the dependent and independent variables are related.

In statistics, we can describe how variables are related using a mathematical function. This function, together with a set of assumptions, is called a model. There are many models we could consider; in this class, we focus on linear models, and in particular on the case with only one predictor variable, the simple linear regression model.
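In standard textbook notation (the symbols here are an assumption; the lesson may introduce its own), the simple linear regression model pairs a straight-line mean function with assumptions about the random errors:

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \ldots, n
```

Here \beta_0 + \beta_1 x is the mean of Y when X = x (the "mathematical function"), and independence, constant variance \sigma^2, and normality of the errors are the "other assumptions."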

Objectives

Upon successful completion of this lesson, you should be able to:

  • Use plots and summary statistics to describe the relationship between the response variable and the predictor variable.
  • Perform a hypothesis test for the population correlation.
  • Find the regression equation and interpret the results.
  • Apply the regression model and understand its limitations.
  • Find an interval estimate for the population slope and interpret the interval.