8.5 - Coefficient of Determination

Now that we know how to estimate the coefficients and perform the hypothesis test, is there any way to tell how useful the model is?

One measure is the coefficient of determination, denoted $$R^2$$.

Coefficient of Determination $$R^2$$
The coefficient of determination measures the percentage of variability within the $$y$$-values that can be explained by the regression model.

Therefore, a value close to 100% means that the model is useful and a value close to zero indicates that the model is not useful.
It can be shown by mathematical manipulation that:

$$\text{SST }=\text{ SSR }+\text{ SSE}$$

$$\sum (y_i-\bar{y})^2=\sum (\hat{y}_i-\bar{y})^2+\sum (y_i-\hat{y}_i)^2$$

Total variability in the $$y$$-values = Variability explained by the model + Unexplained variability

To get the total, explained, and unexplained variability, we first need to calculate the corresponding deviations. For each observation, the total deviation $$(y_i-\bar{y})$$ splits into an explained deviation $$(\hat{y}_i-\bar{y})$$ and an unexplained deviation $$(y_i-\hat{y}_i)$$.

The breakdown of variability in the above equation also holds for the multiple regression model.
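
The decomposition can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up data set (the values of `x` and `y` are hypothetical and are not Bob's data) and a simple linear regression fitted by least squares with an intercept.

```python
# Hypothetical (x, y) data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values from the estimated line.
y_hat = [intercept + slope * xi for xi in x]

# Per-observation deviations: total = explained + unexplained.
total = [yi - y_bar for yi in y]
explained = [fi - y_bar for fi in y_hat]
unexplained = [yi - fi for yi, fi in zip(y, y_hat)]

# Sums of squares built from those deviations.
sst = sum(d ** 2 for d in total)
ssr = sum(d ** 2 for d in explained)
sse = sum(d ** 2 for d in unexplained)

print(f"SST       = {sst:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The two printed values should agree up to floating-point rounding. Note that the identity for the sums of squares relies on the line being the least-squares fit with an intercept; for an arbitrary line the per-observation deviations still add up, but the squared sums generally do not.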

Coefficient of Determination $$R^2$$ Formula

$$R^2=\dfrac{\text{variability explained by the model}}{\text{total variability in the y values}}$$

$$R^2$$ represents the proportion of the total variability in the $$y$$-values that is accounted for by the independent variable $$x$$.
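
As a small worked step, using the decomposition $$\text{SST }=\text{ SSR }+\text{ SSE}$$, the same ratio can be written in terms of the unexplained variability:

$$R^2=\dfrac{\text{SSR}}{\text{SST}}=\dfrac{\text{SST}-\text{SSE}}{\text{SST}}=1-\dfrac{\text{SSE}}{\text{SST}}$$

For instance, with hypothetical values $$\text{SST}=200$$ and $$\text{SSE}=10$$ (chosen only for illustration), $$\text{SSR}=190$$ and $$R^2=190/200=0.95$$, so the model would account for 95% of the variability in the $$y$$-values.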

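For the specific case when there is only one independent variable $$X$$ (i.e., simple linear regression), one can show that $$R^2=r^2$$, where $$r$$ is the correlation coefficient between $$X$$ and $$Y$$. For Bob’s data, the correlation between the two variables is 0.994 and the $$R^2$$ value is 98.89%. (Squaring the rounded value 0.994 gives about 98.8%; the small gap is consistent with $$r$$ having been rounded to three decimals.)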

Correlations
Pearson correlation: 0.994
P-value: 0
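
The relationship $$R^2=r^2$$ for simple linear regression can also be checked numerically. The sketch below is only an illustration: the helper names `pearson_r` and `r_squared_simple` are hypothetical, and the data are made up (they are not Bob's data).

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_squared_simple(x, y):
    """R^2 = SSR / SST for a least-squares line with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)  # explained variability
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total variability
    return ssr / sst


# Hypothetical data, made up purely for illustration.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
y = [3.1, 5.2, 5.8, 8.4, 9.9, 11.3]

r = pearson_r(x, y)
r2 = r_squared_simple(x, y)
print(f"r   = {r:.4f}")
print(f"r^2 = {r ** 2:.4f}")
print(f"R^2 = {r2:.4f}")
```

The last two printed values should match up to floating-point rounding. This equality is specific to the one-predictor case; with several independent variables there is no single $$r$$ to square.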