# 12.7 - Further Example

## Example 12-5: Poverty and Teen Birth Rate Data

(Data source: the U.S. Census Bureau and *Mind On Statistics* (3rd edition), Utts and Heckard.) In this example, the observations are the 50 states of the United States (Poverty data; note that the data for the District of Columbia must be removed). The variables are *y* = percentage of each state’s population living in households with income below the federally defined poverty level in the year 2002, \(x_{1}\) = birth rate for females 15 to 17 years old in 2002, calculated as births per 1000 persons in the age group, and \(x_{2}\) = birth rate for females 18 to 19 years old in 2002, calculated as births per 1000 persons in the age group.

The two *x*-variables are correlated (so we have multicollinearity). The correlation is about 0.95. A plot of the two *x*-variables is given below.
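To get a feel for how severe a correlation of about 0.95 is, the short sketch below computes the variance inflation factor (VIF). It assumes the standard two-predictor result VIF = 1 / (1 - r²) and uses the rounded correlation quoted above, so the output is approximate. The function name `vif_two_predictors` is introduced here for illustration only.

```python
# Variance inflation factor for a model with exactly two predictors.
# Assumption: VIF = 1 / (1 - r^2), where r is the correlation between them.

def vif_two_predictors(r: float) -> float:
    """Return the VIF shared by both predictors when their correlation is r."""
    return 1.0 / (1.0 - r ** 2)

r = 0.95  # approximate correlation quoted in the text
vif = vif_two_predictors(r)

# Square root of the VIF: the factor by which a slope's standard error grows
# relative to uncorrelated predictors, for the same residual standard deviation.
se_inflation = vif ** 0.5

print(f"VIF = {vif:.2f}, SE inflation factor = {se_inflation:.2f}")
```

Under these assumptions the VIF is a little above 10, so each slope's standard error is roughly three times what it would be with uncorrelated predictors.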

The figure below shows plots of *y* = poverty percentage versus each *x*-variable separately. Both *x*-variables are linear predictors of the poverty percentage.

Minitab results for the two possible simple regressions and the multiple regression are given below.

### Regression Analysis: PovPct versus Brth15to17

### Regression Equation

\(\widehat{PovPct} = 4.49 + 0.387 Brth15to17\)

Predictor | Coef | SE Coef | T | P |
---|---|---|---|---|
Constant | 4.487 | 1.318 | 3.40 | 0.001 |
Brth15to17 | 0.38718 | 0.05720 | 6.77 | 0.000 |

S = 2.98209 R-Sq = 48.8% R-Sq(adj) = 47.8%

### Regression Analysis: PovPct versus Brth18to19

### Regression Equation

\(\widehat{PovPct} = 3.05 + 0.138 Brth18to19\)

Predictor | Coef | SE Coef | T | P |
---|---|---|---|---|
Constant | 3.053 | 1.832 | 1.67 | 0.102 |
Brth18to19 | 0.13842 | 0.02482 | 5.58 | 0.000 |

S = 3.24777 R-Sq = 39.3% R-Sq(adj) = 38.0%

### Regression Analysis: PovPct versus Brth15to17, Brth18to19

### Regression Equation

\(\widehat{PovPct} = 6.44 + 0.632 Brth15to17 - 0.102 Brth18to19\)

Predictor | Coef | SE Coef | T | P |
---|---|---|---|---|
Constant | 6.440 | 1.959 | 3.29 | 0.002 |
Brth15to17 | 0.6323 | 0.1918 | 3.30 | 0.002 |
Brth18to19 | -0.10227 | 0.07642 | -1.34 | 0.187 |

S = 2.95782 R-Sq = 50.7% R-Sq(adj) = 48.6%
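As a reading aid for the Minitab output, the sketch below recomputes the T column and the adjusted \(R^{2}\) of the multiple regression from the reported values. It assumes the usual definitions T = Coef / SE Coef and R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - p - 1), with n = 50 states and p = 2 predictors. Because the inputs are the rounded figures printed above, agreement is only to the displayed precision.

```python
# Values copied from the Minitab output for the multiple regression.
n, p = 50, 2  # 50 states (District of Columbia removed), 2 predictors

rows = {
    "Constant": (6.440, 1.959),       # (Coef, SE Coef)
    "Brth15to17": (0.6323, 0.1918),
    "Brth18to19": (-0.10227, 0.07642),
}

# Each t-statistic is the coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in rows.items()}

# Adjusted R-squared penalizes R-squared for the number of predictors.
r_sq = 0.507
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

for name, t in t_stats.items():
    print(f"{name:<11} T = {t:.2f}")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```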

We note the following:

- The value of the sample coefficient that multiplies a particular *x*-variable is not the same in the multiple regression as it is in the relevant simple regression.
- The \(R^{2}\) for the multiple regression is not the sum of the \(R^{2}\) values for the simple regressions. An *x*-variable (either one) is not making an independent “add-on” in the multiple regression.
- The 18 to 19-year-old birth rate variable is significant in the simple regression but is not in the multiple regression. This discrepancy is caused by the correlation between the two *x*-variables. The 15 to 17-year-old birth rate is the stronger of the two *x*-variables and, given its presence in the equation, the 18 to 19-year-old rate does not improve \(R^{2}\) enough to be significant. More specifically, the correlation between the two *x*-variables has increased the standard errors of the coefficients, so we have less precise estimates of the individual slopes.
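The increase in the standard errors can be read directly off the Minitab output. The sketch below compares each slope's standard error in the simple and multiple regressions and backs out the correlation that this inflation implies. It assumes the standard two-predictor relationship SE(multiple) / SE(simple) = (S(multiple) / S(simple)) / sqrt(1 - r²), and it works from the rounded printed values, so the result is approximate. The variable names are chosen here for illustration.

```python
import math

# (SE of slope, residual standard deviation S) copied from the Minitab output.
simple = {"Brth15to17": (0.05720, 2.98209), "Brth18to19": (0.02482, 3.24777)}
multiple = {"Brth15to17": (0.1918, 2.95782), "Brth18to19": (0.07642, 2.95782)}

implied_r = {}
for name in simple:
    se_simple, s_simple = simple[name]
    se_multiple, s_multiple = multiple[name]

    # How many times larger the slope's standard error is in the multiple regression.
    ratio = se_multiple / se_simple

    # Solve ratio = (s_multiple / s_simple) / sqrt(1 - r^2) for |r|.
    one_minus_r2 = ((s_multiple / s_simple) / ratio) ** 2
    implied_r[name] = math.sqrt(1 - one_minus_r2)

    print(f"{name}: SE ratio = {ratio:.2f}, implied |r| = {implied_r[name]:.3f}")
```

Both slopes have standard errors more than three times larger in the multiple regression, and both imply a correlation between the predictors in line with the value of about 0.95 quoted earlier.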