11.1 - Multiple regression
Case-Study: First Article
Let’s take a look at Jin’s first article, which examines the impact of several housing demographics on unemployment in Midwest towns. Specifically, the article predicts unemployment from three other variables.
Foundational Concepts: The key foundational concepts this article builds upon are: predicting a quantitative response variable from a quantitative predictor variable; interpreting the significance of the slope of a predictor variable; and interpreting the model significance from an Analysis of Variance table.
Jin recognizes that unemployment is measured as a quantitative variable. The three predictor variables in the article are the number of senior citizens, the number of high school graduates, and the number of businesses in the town. Jin recognizes some of the output as regression output; however, in this format there are multiple lines of output, one for each of the predictors.
We can show Jin that each row of output contains the same information he saw in simple linear regression with one predictor. Each predictor's row contains its slope (B1, B2, B3) and the significance of that slope. We remind Jin that the null hypothesis is the same for all three (that the slope is zero) and that the regression technique uses a t-test to test the significance of each slope.
The difference with multiple linear regression in this example is that each coefficient has a slightly different interpretation. When interpreting any one of the coefficients, we assume the other variables are held constant. Therefore, we conclude that unemployment changes as a function of the number of high school graduates, holding the number of senior citizens and the number of businesses constant. For example, the coefficient of -0.000348 means that each additional high school graduate is associated with a 0.000348 decrease in predicted unemployment when the other two predictors do not change.
Coefficients
Predictor | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | 4.285 | 0.824 | 5.20 | 0.000 | |
Number of Seniors | 0.06033 | 0.00870 | 6.93 | 0.000 | 1.49 |
Number of Businesses | -0.1315 | 0.0464 | -2.84 | 0.005 | 1.48 |
Number of High School Graduates | -0.000348 | 0.000148 | -2.36 | 0.019 | 1.07 |
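To make the "held constant" interpretation concrete, here is a minimal Python sketch that plugs the coefficients from the table above into the fitted regression equation. The function name and the town values are our own illustrative assumptions and do not come from Jin's article. The predicted value is in whatever units the article uses for unemployment.

```python
# Coefficients copied from the Coefficients table above.
INTERCEPT = 4.285
B_SENIORS = 0.06033
B_BUSINESSES = -0.1315
B_HS_GRADS = -0.000348


def predict_unemployment(seniors, businesses, hs_grads):
    """Fitted value from the three-predictor regression equation."""
    return (INTERCEPT
            + B_SENIORS * seniors
            + B_BUSINESSES * businesses
            + B_HS_GRADS * hs_grads)


# Hypothetical town (illustrative values, not taken from the article).
base = predict_unemployment(seniors=100, businesses=20, hs_grads=5000)

# Same town with 1,000 more high school graduates; the number of seniors
# and the number of businesses are held constant.
more_grads = predict_unemployment(seniors=100, businesses=20, hs_grads=6000)

change = more_grads - base
print(round(change, 3))  # -0.348, which is 1000 * B_HS_GRADS
```

Whatever values are chosen for seniors and businesses, the change in the prediction is the same as long as those two values stay fixed. That is what "holding the other variables constant" means for a slope in this model.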
The F test also has a slightly different interpretation. In simple linear regression, the F test was the test that the slope was zero (just like the t test); with only one predictor, the t and F tests always led to the same conclusion. In Jin’s article, the F test is still testing the betas, only now the null hypothesis is that all three betas are zero and the alternative hypothesis is that at least one of the betas is not zero.
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Regression | 3 | 1795.0 | 598.34 | 16.95 | 0.000 |
Number of Seniors | 1 | 1696.5 | 1696.54 | 48.07 | 0.000 |
Number of Businesses | 1 | 284.2 | 284.15 | 8.05 | 0.005 |
Number of High School Graduates | 1 | 196.9 | 196.89 | 5.58 | 0.019 |
Error | 318 | 11223.5 | 35.29 | | |
Total | 321 | 13018.5 | | | |
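The arithmetic behind the Analysis of Variance table can be checked with a short Python sketch. It recomputes the overall F statistic from the sums of squares and degrees of freedom, then compares each predictor's squared t-value with its F-value. The variable names are our own, and small differences from the printed output are expected because the table values are rounded.

```python
# Values copied from the Analysis of Variance table above.
ss_regression, df_regression = 1795.0, 3
ss_error, df_error = 11223.5, 318

# A mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F: tests whether at least one of the three slopes is nonzero.
f_overall = ms_regression / ms_error
print(round(ms_error, 2), round(f_overall, 2))  # 35.29 16.95

# For a single predictor (1 degree of freedom), the F-value in the ANOVA
# table is approximately the square of that predictor's t-value.
t_values = {"Seniors": 6.93, "Businesses": -2.84, "High School Graduates": -2.36}
f_values = {"Seniors": 48.07, "Businesses": 8.05, "High School Graduates": 5.58}

for name, t in t_values.items():
    print(name, round(t ** 2, 2), f_values[name])
```

The squared t-values agree with the tabled F-values to within rounding, which is the multiple-regression counterpart of the t and F tests agreeing when there is only one predictor.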
Other than these differences, the assumptions are the same as in simple linear regression, so Jin will have a good understanding of how to interpret this more advanced form of regression!