5.8 - Partial R-squared

Suppose we have set up a general linear F-test. We may then want to know what percent of the variation in the response cannot be explained by the predictors in the reduced model (i.e., the model specified by $H_{0}$), but can be explained by the remaining predictors in the full model. If this percentage is large, we would likely want to keep some or all of the remaining predictors in the final model, since they explain a substantial share of the variation that the reduced model leaves unexplained.

We formally define this percentage as the partial $R^{2}$ (also called the coefficient of partial determination). Specifically, suppose we have three predictors: $x_{1}$, $x_{2}$, and $x_{3}$. For the corresponding multiple regression model (with response $y$), we wish to know what percent of the variation not explained by $x_{1}$ is explained by $x_{2}$ and $x_{3}$. In other words, given $x_{1}$, what additional percent of the variation can be explained by $x_{2}$ and $x_{3}$? Note that here the full model includes all three predictors, while the reduced model includes only $x_{1}$.

Define $\textrm{SSR}(x_{2},x_{3}|x_{1}) = \textrm{SSR}(x_{1},x_{2},x_{3})-\textrm{SSR}(x_{1})$ to be the increase in the regression sum of squares when $x_{2}$ and $x_{3}$ are added to the model with only $x_{1}$. The vertical bar "|" is read as "given," so "$x_{2},x_{3}|x_{1}$" is read "$x_{2},x_{3}$ given $x_{1}$."

After obtaining the relevant ANOVA tables for the full and reduced models, the partial $R^{2}$ is as follows:

\begin{align*} R^{2}_{y,2,3|1}&=\frac{\textrm{SSR}(x_{2},x_{3}|x_{1})}{\textrm{SSE}(x_{1})} \\ &=\frac{\textrm{SSR}(x_{1},x_{2},x_{3})-\textrm{SSR}(x_{1})}{\textrm{SSE}(x_{1})}\\ &=\frac{(\textrm{SSTO}-\textrm{SSE}(x_{1},x_{2},x_{3}))-(\textrm{SSTO}-\textrm{SSE}(x_{1}))}{\textrm{SSE}(x_{1})}\\ &=\frac{\textrm{SSE}(x_{1})-\textrm{SSE}(x_{1},x_{2},x_{3})}{\textrm{SSE}(x_{1})}\\ &=\frac{\textrm{SSE(reduced)}-\textrm{SSE(full)}}{\textrm{SSE(reduced)}}. \end{align*}

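This gives us the proportion of the variation left unexplained by $x_{1}$ that is explained by $x_{2}$ and $x_{3}$. Note that the last line of the above equation simply shows that the partial $R^{2}$ has the same form as the regular $R^{2}$: the regular $R^{2}$ can be written as $(\textrm{SSTO}-\textrm{SSE(full)})/\textrm{SSTO}$, where SSTO is the SSE of the model with no predictors (intercept only).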

For the rabbit heart attacks example,

$\textrm{SSR}(x_{2},x_{3}|x_{1})=\textrm{SSR}(x_1,x_{2},x_{3})-\textrm{SSR}(x_{1})=0.95927-0.62492=0.33435$

$\textrm{SSE}(x_{1})=\textrm{SSTO}-\textrm{SSR}(x_{1})=1.50418-0.62492=0.87926,$

so $R^{2}_{y,2,3|1}=0.33435/0.87926=0.38$. In other words, $x_{2}$ and $x_{3}$ explain 38% of the variation in $y$ that cannot be explained by $x_{1}$.
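The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the three sums of squares are copied from the text, and the function name `partial_r2_from_ss` is a hypothetical helper introduced here for illustration.

```python
# Sums of squares quoted in the text for the rabbit heart attacks example.
ssto = 1.50418         # total sum of squares
ssr_full = 0.95927     # SSR(x1, x2, x3), full model
ssr_reduced = 0.62492  # SSR(x1), reduced model


def partial_r2_from_ss(ssto, ssr_full, ssr_reduced):
    """Partial R^2 = (SSE(reduced) - SSE(full)) / SSE(reduced)."""
    sse_reduced = ssto - ssr_reduced
    sse_full = ssto - ssr_full
    return (sse_reduced - sse_full) / sse_reduced


r2 = partial_r2_from_ss(ssto, ssr_full, ssr_reduced)
print(round(r2, 2))  # 0.38
```

Working from SSE rather than SSR gives the same answer, because SSTO is identical for the full and reduced models and cancels in the numerator.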

More generally, consider partitioning the predictors $x_{1},x_{2},\ldots,x_{k}$ into two groups, A and B, containing $u$ and $k-u$ predictors, respectively. The proportion of variation explained by the predictors in group B that cannot be explained by the predictors in group A is given by

\begin{align*} R^{2}_{y,B|A}&=\frac{\textrm{SSR}(B|A)}{\textrm{SSE}(A)} \\ &=\frac{\textrm{SSE}(A)-\textrm{SSE}(A,B)}{\textrm{SSE}(A)}. \end{align*}
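The general formula can also be computed directly from data by fitting the reduced and full models and comparing their error sums of squares. The following is a rough sketch under stated assumptions: it uses NumPy least squares with an intercept, the data are simulated (not the rabbit data), and the names `sse` and `partial_r2` are hypothetical helpers, not part of any library.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares from a least squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_r2(X, y, group_a):
    """R^2_{y,B|A}: group_a lists the column indices in A; B is all other columns."""
    sse_a = sse(X[:, group_a], y)  # reduced model: predictors in A only
    sse_full = sse(X, y)           # full model: predictors in A and B
    return (sse_a - sse_full) / sse_a


# Simulated data for illustration only (an assumption, not from the text).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * X[:, 0] + 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Proportion of variation unexplained by x1 that x2 and x3 explain.
value = partial_r2(X, y, group_a=[0])
print(value)
```

Because the reduced model is nested in the full model, SSE(A, B) can never exceed SSE(A), so the result always lies between 0 and 1.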