3.4.1 - Scatterplots

Recall from Lesson 1.1.2 that in some research studies one variable is used to predict or explain differences in another variable. In those cases, the explanatory variable is used to predict or explain differences in the response variable.

Explanatory variable

Variable that is used to explain variability in the response variable, also known as an independent variable or predictor variable; in an experimental study, this is the variable that is manipulated by the researcher.

Response variable

The outcome variable, also known as a dependent variable.

A scatterplot can be used to display the relationship between the explanatory and response variables. A scatterplot can also be used to examine the association between two variables when there is no clear explanatory or response variable. For example, we may want to examine the relationship between height and weight in a sample but have no hypothesis as to which variable impacts the other; in this case, it does not matter which variable is on the x-axis and which is on the y-axis.

Scatterplot
A graphical representation of two quantitative variables in which the explanatory variable is on the x-axis and the response variable is on the y-axis.
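
As a rough illustration of this definition, the sketch below draws a text-only scatterplot using only the Python standard library. The function name `text_scatter`, the grid size, and the height and weight values are all made up for illustration; they do not come from any data set used in this lesson. Each observation becomes one `*`, with the explanatory variable setting the column (x-axis) and the response variable setting the row (y-axis).

```python
def text_scatter(x, y, width=20, height=8):
    """Return a text scatterplot: x sets the column, y sets the row.

    Assumes x and y each contain at least two distinct values.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each value to a column/row index inside the grid.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        # The first printed line is the top, so flip rows to put large y up.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical data: height is explanatory (x), weight is response (y).
heights_in = [60, 63, 65, 68, 70, 72, 75]
weights_lb = [110, 125, 140, 150, 165, 180, 200]

plot = text_scatter(heights_in, weights_lb)
print(plot)
```

In this made-up sample the points rise from the lower left to the upper right, which is what a positive relationship looks like on a scatterplot.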

When examining a scatterplot, we need to consider the following:

  1. Direction (positive or negative)
  2. Form (linear or non-linear) 
  3. Strength (weak, moderate, strong)
  4. Bivariate outliers

In this class, we will focus on linear relationships. A relationship is linear when the line of best fit describing the relationship between x and y is a straight line. The linear relationship between two variables is positive when both increase together; in other words, as values of x get larger, values of y get larger. This is also known as a direct relationship. The linear relationship between two variables is negative when one increases as the other decreases; in other words, as values of x get larger, values of y get smaller. This is also known as an indirect, or inverse, relationship.
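
To make the idea of direction concrete, here is a minimal sketch that reports the direction of a linear relationship from the sign of the slope of a fitted line. It assumes that the line of best fit is the least-squares line, which this lesson does not state explicitly. The function names and the two small data sets are hypothetical.

```python
def least_squares_slope(x, y):
    """Slope of the least-squares line for predicting y from x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def direction(x, y):
    """Label the direction of the linear relationship by the slope's sign."""
    slope = least_squares_slope(x, y)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no linear trend"


# Hypothetical data, invented for this sketch.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [55, 62, 70, 74, 85]

temps_f = [30, 40, 50, 60, 70]
coffees_sold = [45, 38, 30, 24, 15]

print(direction(hours_studied, exam_scores))  # y rises as x rises
print(direction(temps_f, coffees_sold))       # y falls as x rises
```

The sign of the slope only describes direction. It says nothing about form or strength, which still need to be judged from the scatterplot itself.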

A bivariate outlier is an observation that does not fit with the general pattern of the other observations. 
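
One informal way to look for such observations, sketched below, is to fit a straight line and flag points that fall unusually far from it. This is only a heuristic under stated assumptions: the least-squares fit, the function name `flag_bivariate_outliers`, the cutoff of two residual standard deviations, and the data are all choices made for this sketch, not rules from this lesson. In practice, the scatterplot itself remains the main tool.

```python
import math


def flag_bivariate_outliers(x, y, cutoff=2.0):
    """Return (x, y) pairs whose vertical distance from the least-squares
    line exceeds `cutoff` times the typical residual size."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed y minus the y predicted by the fitted line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Root mean squared residual, used here as the "typical" distance.
    typical = math.sqrt(sum(r ** 2 for r in residuals) / n)

    return [
        (xi, yi)
        for xi, yi, r in zip(x, y, residuals)
        if abs(r) > cutoff * typical
    ]


# Hypothetical data: y is exactly 2x, except for one point at x = 5.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_values = [2, 4, 6, 8, 25, 12, 14, 16, 18]

outliers = flag_bivariate_outliers(x_values, y_values)
print(outliers)
```

Note that an outlier can pull the fitted line toward itself, so a flagged or unflagged point should always be checked against the plot.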

Example: Baseball

Data concerning baseball statistics and salaries from the 1991 and 1992 seasons is available at:

The scatterplot below shows the relationship between salary and batting average for the 337 baseball players in this sample.

[Figure: Scatterplot of Batting Average vs Salary. x-axis: Salary; y-axis: Batting Average.]

From this scatterplot, we can see that there does not appear to be a meaningful relationship between baseball players' salaries and batting averages. We can also see that more players had salaries at the low end and fewer had salaries at the high end.

Example: Height and Shoe Size

Data concerning the heights and shoe sizes of 408 students were retrieved from:

The scatterplot below was constructed to show the relationship between height and shoe size.

[Figure: Scatterplot of Shoe Size vs Height. x-axis: Height; y-axis: Shoe Size.]

There is a positive linear relationship between height and shoe size in this sample. The magnitude of the relationship appears to be strong. There do not appear to be any outliers. 

Example: Height and Weight

Data concerning body measurements from 507 individuals were retrieved from:

For more information see:

The scatterplot below shows the relationship between height and weight.

[Figure: Scatterplot of Weight (kg) vs Height (cm). x-axis: Height (cm); y-axis: Weight (kg).]

There is a positive linear relationship between height and weight. The magnitude of the relationship is moderately strong.

Example: Cafés

Data concerning sales at a student-run café were retrieved from:

For more information about this data set, visit:

The scatterplot below shows the relationship between maximum daily temperature and coffee sales.

[Figure: Scatterplot of Coffees vs Max Daily Temperature (F). x-axis: Max Daily Temperature (F); y-axis: Coffees.]

There is a negative linear relationship between the maximum daily temperature and coffee sales. The magnitude is moderately strong. There do not appear to be any outliers.