Now that we know how to estimate the coefficients and perform the hypothesis test, is there any way to tell how useful the model is?
One measure is the coefficient of determination, denoted \(R^2\).
- Coefficient of Determination \(R^2\)
- The coefficient of determination measures the percentage of variability within the \(y\)-values that can be explained by the regression model.
Therefore, a value close to 100% means that the model explains most of the variability in \(y\) and is useful, while a value close to 0% indicates that the model explains very little and is not useful.
It can be shown by mathematical manipulation that:
\(\text{SST }=\text{ SSR }+\text{ SSE}\)
\(\sum (y_i-\bar{y})^2=\sum (\hat{y}_i-\bar{y})^2+\sum (y_i-\hat{y}_i)^2\)
Total variability in the \(y\)-values = Variability explained by the model + Unexplained variability
To get the total, explained, and unexplained variability, we first calculate the corresponding deviations. For each observation, the total deviation \((y_i-\bar{y})\) splits into an explained deviation \((\hat{y}_i-\bar{y})\) and an unexplained deviation \((y_i-\hat{y}_i)\). Squaring each deviation and summing over all observations gives SST, SSR, and SSE, respectively.
The breakdown of variability in the above equation also holds for the multiple regression model.
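The decomposition can be checked numerically. The sketch below fits a simple linear regression by least squares and compares SST with SSR + SSE. The data values and variable names are hypothetical, invented only for illustration; they are not Bob's data.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b1) and intercept (b0) for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values from the estimated regression line.
y_hat = [b0 + b1 * xi for xi in x]

# Total, explained, and unexplained sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"SST       = {sst:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The two printed values should agree up to floating-point rounding, because the fitted values come from a least-squares fit that includes an intercept.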
- Coefficient of Determination \(R^2\) Formula
- \(R^2=\dfrac{\text{variability explained by the model}}{\text{total variability in the y values}}\)
\(R^2\) represents the proportion of the total variability in the \(y\)-values that is accounted for by the independent variable \(x\).
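In terms of the sums of squares defined earlier, and using the identity \(\text{SST }=\text{ SSR }+\text{ SSE}\), the same ratio can be written in two equivalent ways:

\(R^2=\dfrac{\text{SSR}}{\text{SST}}=\dfrac{\text{SST}-\text{SSE}}{\text{SST}}=1-\dfrac{\text{SSE}}{\text{SST}}\)

Since SSR and SSE are both sums of squares and therefore nonnegative, this also shows that \(0\le R^2\le 1\).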
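For the specific case when there is only one independent variable \(X\) (i.e., simple linear regression), one can show that \(R^2 = r^2\), where \(r\) is the correlation coefficient between \(X\) and \(Y\). For Bob’s data, the correlation between the two variables is 0.994 and the \(R^2\) value is 98.89%. Squaring the rounded correlation gives \(0.994^2 \approx 0.988\), which agrees with the reported \(R^2\) up to rounding of \(r\).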
Correlations
Pearson correlation | 0.994 |
P-value | 0.000 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
1.05958 | 98.89% | 98.88% | 98.82% |
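The relationship \(R^2 = r^2\) for simple linear regression can also be verified numerically. The following is a minimal sketch that computes the Pearson correlation and \(R^2\) separately and compares them. The data are hypothetical (not Bob's data), so the numbers will not match the Minitab output above.

```python
import math

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is also SST
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient between x and y.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares fit, then R^2 = 1 - SSE / SST.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / s_yy

print(f"r   = {r:.4f}")
print(f"r^2 = {r ** 2:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

The last two printed values should be identical up to floating-point rounding. This equality is specific to a single predictor; with several predictors there is no single \(r\) to square.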
Minitab®
Finding Correlation
- Select Stat > Basic Statistics > Correlation
- Specify the two (or more) variables for which you want the correlation coefficient(s) calculated.
- Pearson correlation is the default. An optional Spearman rho method is also available.
- If it isn't already checked, put a checkmark in the box labeled Display p-values by clicking once on the box.
- Select OK. The output will appear in the session window.
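For readers working outside Minitab, the two correlation methods mentioned above can be sketched in plain Python. This is an illustrative stand-in, not a reproduction of Minitab's procedure: the helper names `pearson`, `average_ranks`, and `spearman` are hypothetical, the data are invented, and p-values are not computed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

print(f"Pearson r    = {pearson(x, y):.4f}")
print(f"Spearman rho = {spearman(x, y):.4f}")
```

Because the hypothetical \(y\)-values increase strictly with \(x\), Spearman rho equals 1 here even though the Pearson correlation is slightly below 1.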