Lesson #4: Descriptive Measures of the Strength of a Linear Association


Comprehensive Exercises

Directions. Type up your answers to each of the following questions in a Word file named exercises04_yourPSUid.doc. Once you have completed all of the comprehensive exercises in this lesson, upload the file to the "Lesson #4 Comprehensive Exercises" dropbox.


4.1. Inappropriately combining groups can greatly affect the r² value

The data set heightspeed.txt contains data on a sample of n = 189 Penn State students. The students were asked to report their height (in inches) and the fastest speed (in miles per hour) they have ever driven.

  1. Determine the correlation coefficient r between the sample of heights and fastest speeds. (See Minitab Help Section - Obtaining a sample correlation coefficient.) Is there sufficient evidence to conclude that the population correlation coefficient differs significantly from 0? Does it make sense that a taller person would somehow be more prone to driving fast?
  2. Create a scatter plot of y = fastest versus x = heights, in which each data point denotes the gender of the individual. (See Minitab Help Section - Creating a scatter plot with each data point characterized by a third variable.) What does the plot suggest is contributing to the significance of the correlation coefficient?
  3. Now, determine what the correlation coefficient is for each subgroup, male and female. To do so:
    • Split the worksheet by gender. (See Minitab Help Section - Splitting the worksheet based on the value of a variable.)
    • Determine the correlation coefficient r between the sample of heights and fastest speeds for females. (See Minitab Help Section - Obtaining a sample correlation coefficient.) Is there sufficient evidence to conclude that the population correlation coefficient for the females differs significantly from 0?
    • Determine the correlation coefficient r between the sample of heights and fastest speeds for males. (See Minitab Help Section - Obtaining a sample correlation coefficient.) Is there sufficient evidence to conclude that the population correlation coefficient for the males differs significantly from 0?
  4. Summarize the apparent contradiction you've found. What do you think is causing the contradiction?
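
The effect named in the title of this exercise can be sketched with a few lines of Python. The numbers below are made up purely for illustration and are not taken from heightspeed.txt; the helper name pearson_r is also just a label chosen here. Within each hypothetical group there is no linear association at all, yet pooling the two groups produces a clearly positive correlation, because the groups differ in their typical x and y values.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only; NOT the heightspeed.txt data.
group_a_x = [62, 64, 66, 68]
group_a_y = [80, 90, 90, 80]
group_b_x = [68, 70, 72, 74]
group_b_y = [100, 110, 110, 100]

# Correlation within each group, then for the two groups combined.
r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A: r = {r_a:.3f}")
print(f"group B: r = {r_b:.3f}")
print(f"pooled:  r = {r_pooled:.3f}")
```

Running the same kind of comparison on the real data, in Minitab or any other package, is what parts 1 and 3 ask for.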

4.2. Urbano-Marquez et al. (1989) reported on strength tests for a group of 50 alcoholic men. Their daily intake of alcohol ranged from 118 to 350 grams (with a mean of 243 grams) for an average of 16 years. The total lifetime consumption of alcohol (x, in kg/kg of body weight) was determined for each person in the study. The response was strength of the deltoid muscle (y, in kg) in each person's nondominant arm. The response was determined by making five measurements over a 20-minute period, using an electric myometer, which measures force against a fixed resistance. The resulting data are stored in alcoholarm.txt.

  1. What is the value of r² for these data?
  2. What does the r² value tell us here? That is, write one sentence that summarizes the r² value.
  3. What is the correlation coefficient r for these data?
  4. What does the correlation coefficient r tell us here? That is, write one sentence that summarizes the correlation coefficient.
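
As a reminder of how the two quantities in this exercise relate, here is a minimal Python sketch, assuming the usual simple linear regression definitions: r² is the regression sum of squares divided by the total sum of squares, and r is the square root of r² carrying the sign of the estimated slope. The x and y values are made up for illustration and are not from alcoholarm.txt.

```python
from math import copysign, sqrt

# Hypothetical values for illustration only; NOT the alcoholarm.txt data.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 5, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ssto = sum((b - mean_y) ** 2 for b in y)  # total sum of squares

b1 = sxy / sxx            # estimated slope of the least squares line
ssr = b1 ** 2 * sxx       # regression sum of squares
r_squared = ssr / ssto    # proportion of variation in y explained by x

# r has the same magnitude as sqrt(r^2) and the same sign as the slope.
r = copysign(sqrt(r_squared), b1)

print(f"slope = {b1:.3f}, r^2 = {r_squared:.4f}, r = {r:.4f}")
```

Because the slope in this toy example is negative, r comes out negative even though r² is close to 1.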

4.3. We previously looked at the anscombe.txt data sets. Recall that Anscombe, a statistician from Yale University, created four different data sets of (x, y) data in order to illustrate the critical importance of creating scatter plots before determining the best fitting line.

  1. Create a scatter plot for each pair of (x, y) values: (x1, y1), (x2, y2), (x3, y3), and (x4, y4).
  2. Determine the r² value to quantify the degree of linear association between each pair of (x, y) values: (x1, y1), (x2, y2), (x3, y3), and (x4, y4).
  3. In light of your answers to parts 1 and 2, what would you advise a researcher who only reports the r² value as a way of quantifying the degree of linear association between his or her (x, y) values?
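
The point of this exercise can also be sketched in Python with two tiny made-up data sets (these are not Anscombe's values, and the function name r_squared is just a label used here). In the first, y is an exact function of x, yet the squared correlation is zero because the relationship is curved rather than linear. In the second, a single point far from the rest is responsible for a large squared correlation.

```python
def r_squared(x, y):
    """Squared sample correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data, NOT from anscombe.txt.
# Exact quadratic relationship, symmetric about x = 0.
curved_x = [-2, -1, 0, 1, 2]
curved_y = [v ** 2 for v in curved_x]

# Four points sharing one x value, plus a single point far to the right.
outlier_x = [8, 8, 8, 8, 19]
outlier_y = [5, 6, 7, 8, 13]

r2_curved = r_squared(curved_x, curved_y)
r2_outlier = r_squared(outlier_x, outlier_y)

print(f"curved data:  r^2 = {r2_curved:.3f}")
print(f"outlier data: r^2 = {r2_outlier:.3f}")
```

Neither number, taken alone, reveals what a scatter plot of the same data would show immediately.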

© 2004 The Pennsylvania State University. All rights reserved.
Materials developed by Dr. Laura J. Simon (Lecturer, Penn State Department of Statistics).