Now that we know how to estimate the coefficients and perform the hypothesis test, is there any way to tell how useful the model is?

One measure is the coefficient of determination, denoted \(R^2\).

Coefficient of Determination \(R^2\)

The coefficient of determination measures the percentage of variability within the \(y\)-values that can be explained by the regression model.

Therefore, a value close to 100% means that the model explains most of the variability in \(y\) and is useful, while a value close to 0% means that the model explains very little and is not useful.

It can be shown by mathematical manipulation that:

\(\text{SST }=\text{ SSR }+\text{ SSE}\)

\(\sum (y_i-\bar{y})^2=\sum (\hat{y}_i-\bar{y})^2+\sum (y_i-\hat{y}_i)^2\)

Total variability in the \(y\)-values = Variability explained by the model + Unexplained variability

To get the total, explained, and unexplained variability, we first need to calculate the corresponding deviations. For each observation, the total deviation \((y_i-\bar{y})\) splits into an explained deviation \((\hat{y}_i-\bar{y})\) and an unexplained deviation \((y_i-\hat{y}_i)\).

The breakdown of variability in the above equation holds for the multiple regression model also.
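As a quick numerical check of this decomposition, here is a minimal Python sketch. The data values and variable names are made up for illustration and do not come from the lesson's data files. It fits a least-squares line and compares SST with SSR + SSE.

```python
# Hypothetical data, made up purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope (b1) and intercept (b0)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values from the estimated line
y_hat = [b0 + b1 * xi for xi in x]

# Total, explained, and unexplained sums of squares
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(f"SST       = {sst:.3f}")
print(f"SSR + SSE = {ssr + sse:.3f}")
```

The two printed values should agree up to floating-point rounding. The identity relies on the line being the least-squares fit with an intercept; it need not hold for an arbitrary line.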

Coefficient of Determination \(R^2\) Formula

\(R^2=\dfrac{\text{variability explained by the model}}{\text{total variability in the y values}}\)

\(R^2\) represents the proportion of the total variability in the \(y\)-values that is accounted for by the independent variable \(x\).

For the specific case when there is only one independent variable \(X\) (i.e., simple linear regression), one can show that \(R^2 = r^2\), where \(r\) is the correlation coefficient between \(X\) and \(Y\).
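The relationship \(R^2 = r^2\) can be checked numerically. The sketch below is only an illustration: the function name `r_squared_and_r` and the data are hypothetical. It computes \(R^2\) as explained variability over total variability and, separately, the sample correlation \(r\).

```python
import math

def r_squared_and_r(x, y):
    """Return (R^2, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST

    # Least-squares line and its fitted values
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # R^2 = variability explained by the model / total variability
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    r_squared = ssr / s_yy

    # Sample correlation coefficient between x and y
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r_squared, r

# Hypothetical data, made up purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 4.0, 8.0, 9.0, 10.0]

r_squared, r = r_squared_and_r(x, y)
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

The two printed numbers should match. With more than one independent variable this shortcut no longer applies, since there is no single \(r\) between \(Y\) and the predictors.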

## Example 9-6: Student height and weight (\(R^2\))

Let's take a look at Minitab's output from the height and weight example (university_ht_wt.TXT) that we have been working with in this lesson.

#### Model Summary

S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
19.1108 | 50.57% | 48.67% | 44.09% |

#### Coefficients

Term | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | -222.5 | 72.4 | -3.07 | 0.005 | |
height | 5.49 | 1.06 | 5.16 | 0.000 | 1.00 |

#### Regression Equation

weight = -222.5 + 5.49 height

Find the coefficient of determination and interpret the value.

The coefficient of determination, \(R^2\), is 0.5057 or 50.57%. This value means that 50.57% of the variation in weight can be explained by height.

Remember, for this example we found the correlation value, \(r\), to be 0.711.

So, we can now see that \(r^2 = (0.711)^2 \approx 0.506\), which matches the R-sq value of 50.57% reported in the Minitab output, up to rounding.
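The arithmetic above can be reproduced in a few lines of Python. This is only a sketch using the two reported values; the variable names are chosen here for illustration.

```python
r = 0.711               # correlation reported earlier in the lesson
r_sq_minitab = 0.5057   # R-sq from the Minitab Model Summary (50.57%)

r_squared = r ** 2
difference = abs(r_squared - r_sq_minitab)

print(f"r^2 from the rounded correlation: {r_squared:.4f}")
print(f"R-sq from Minitab:                {r_sq_minitab:.4f}")
print(f"difference:                       {difference:.4f}")
```

The small difference arises because \(r\) was rounded to three decimal places before squaring.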

## Try it!

#### Used car sales continued...

For the age and price of the car example (cars_sold.txt), find the value of the coefficient of determination and interpret it in the context of the problem.

#### Model Summary

S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
503.146 | 88.39% | 87.67% | 84.41% |

#### Coefficients

Term | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | 7850 | 362 | 21.70 | 0.000 | |
age | -485.0 | 43.9 | -11.04 | 0.000 | 1.00 |

#### Regression Equation

price = 7850 - 485.0 age

From the Minitab output, we see an R-sq value of 88.39%. We want to report this in terms of the study, so here we would say that 88.39% of the variation in vehicle price is explained by the age of the vehicle.
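Because this is a simple linear regression, \(R^2 = r^2\), so the correlation between age and price can be recovered from the output. The sketch below assumes only the standard fact that \(r\) has the same sign as the estimated slope; the value of \(r\) itself is not printed in the Minitab output shown here, and the variable names are illustrative.

```python
import math

r_sq = 0.8839    # R-sq from the Minitab Model Summary (88.39%)
slope = -485.0   # coefficient of age from the Coefficients table

# In simple linear regression, r is the square root of R^2
# carrying the sign of the slope.
r = math.copysign(math.sqrt(r_sq), slope)

print(f"r = {r:.3f}")
```

The negative sign reflects that price tends to decrease as age increases.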

**Note!** The two other R-sq values, R-sq(adj) and R-sq(pred), are used for model comparisons. These two metrics do not provide any interpretive value to the model with regard to \(X\) and \(Y\).