3.2 - Graphs: Displaying Measurement Data

Four different graphs can be used to display measurement data:

  1. Dotplots
  2. Stemplots (Stem and Leaf Plot)
  3. Histograms
  4. Boxplots (Covered in section 3.4)

Example 3.4. Graphs

Consider the following sample.

Sample: The ages of forty selected PSU Tenured Faculty (n = 40 ages)

Age Sample Dataset
45 59 51 62 58 54 42 59 49 47 52 63 40 53 61 47 54 58 53
32 61 39 51 37 43 53 46 56 58 48 55 50 57 60 54 63 60 55

Graphs

Dotplots

The dotplot is the first graph that will be used to display this sample of 40 ages. A dotplot represents each observation as a single dot placed above its value on a number line. On this dotplot, you will find that the ages range from 32 to 63 years. You should also notice that there are more tenured faculty at the older ages.

The dotplot displays the distribution of the ages of the 40 PSU tenured faculty. Each dot in the plot represents the age of one person.

Figure 3.2. Dotplot (Ages of 40 PSU Tenured Faculty)
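As a rough illustration of how a dotplot is built, the following Python sketch prints a text version of one: one row per age from the lowest to the highest, with one dot per observation. The function name `text_dotplot` and the sideways layout are choices made for this sketch, not part of the course materials, and the output will not look exactly like Figure 3.2.

```python
from collections import Counter

# The 40 ages from this example (taken from the sorted list in this section).
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63]


def text_dotplot(values):
    """Return one line per whole number from min to max, with one dot per observation."""
    counts = Counter(values)  # a Counter gives 0 for ages that never occur
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>3} | " + "." * counts[value])
    return lines


for line in text_dotplot(ages):
    print(line)
```

Rows with longer runs of dots cluster in the 50s, which is the same pattern the dotplot shows.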

 

Stemplots (Stem and Leaf Plot)

The second graph that can be used with measurement data is a stemplot. Stemplots concisely display the data in order from smallest to largest. Below is the list of the 40 ages in order from youngest to oldest.

Ages (Sorted)
32 37 39 40 42 43 45 46 47 47 48 49 50 51 51 52 53 53 53 54
54 54 55 55 56 56 57 58 58 58 59 59 59 60 60 61 61 62 63 63

These forty observations are displayed in the stemplot found in Figure 3.3. In this stemplot, we again find that the ages range from 32 years to 63 years. We also find that there are more tenured PSU faculty at the older ages. Stemplots can provide useful information about small data sets.

We can also use a stem-and-leaf plot to show the ages of the 40 PSU tenured faculty. The stem-and-leaf plot has three columns of numbers. The first column is the cumulative number of measurements. The second column is the stem. The third column is the leaf.

Figure 3.3. Stemplot (Ages of PSU Tenured Faculty)
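To make the stem and leaf idea concrete, here is a minimal Python sketch that uses the tens digit of each age as the stem and the ones digit as the leaf. This is an assumption made for illustration: the software that produced Figure 3.3 may split the stems differently, and the cumulative-count column described above is not reproduced here. The function name `stemplot` is hypothetical.

```python
from collections import defaultdict

# The 40 ages from this example (taken from the sorted list in this section).
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63]


def stemplot(values):
    """Group values by tens digit (stem) and list the ones digits (leaves) in order."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 47 -> stem 4, leaf 7
        leaves[stem].append(str(leaf))
    # Walk every stem from lowest to highest so that empty stems would still show.
    return [f"{stem} | {''.join(leaves[stem])}"
            for stem in range(min(leaves), max(leaves) + 1)]


for row in stemplot(ages):
    print(row)
```

The row for stem 5 is by far the longest, which matches the observation that most of the faculty in the sample are in their 50s.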

 

Histograms

The third graph is called a histogram. Of all the graphs presented so far, the histogram may be the most valuable. A histogram is essentially a bar graph for measurement data. The difference between a histogram and a bar graph is that in a histogram the categories are ranges of numbers rather than words. Usually, each numerical category must have the same width. The height of each bar reflects either the frequency or the relative frequency (percent) of the observations that fall in that range of numbers.

Recall that the ages span from 32 to 63 years. The range of these ages is (63 - 32) = 31 years. Looking at the stemplot, one finds that there are more tenured faculty at the older ages. We want to be able to show this trend, so there need to be enough categories to display it properly. If we chose only 4 categories (since we have tenured faculty in their 30s, 40s, 50s, and 60s), we would not be able to detect this trend as well. If we choose 9 categories, the trend becomes more obvious. Statistical software will usually make this determination for you.

Width of each Category = Range / number of categories = 31/9 ≈ 3.4 (rounded up to 4 years)
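The width calculation above can be sketched in a few lines of Python. The variable names are chosen for this sketch, and the starting point of 30 is the value used in this example rather than something the code derives.

```python
import math

lowest, highest = 32, 63   # smallest and largest observed ages
num_categories = 9         # number of categories chosen in this example

data_range = highest - lowest              # 63 - 32 = 31
raw_width = data_range / num_categories    # 31 / 9, about 3.4
width = math.ceil(raw_width)               # round up to a whole number of years

start = 30                                 # whole number below the lowest age (chosen by hand)
end = start + num_categories * width       # upper limit of the last category

print(data_range, round(raw_width, 1), width, end)
```

Rounding the width up rather than to the nearest value keeps the nine categories wide enough to cover every observed age.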

The starting point should be at or below the lowest observed value, and the ending point should be above the highest observed value. For readability, it is also best to have the intervals start and end on whole numbers (or easy-to-digest numbers if the data are on a different scale). In this instance, our starting point of 30 years is below the lowest observed age, which is 32 years. With nine categories that are each 4 years wide, the ending point is 30 + (9 × 4) = 66 years, which is above our highest observed age of 63 years. To make sure that each observation falls into only one category, we construct our 4-year categories as shown in Table 3.1 below. Our first category includes ages starting at 30 years and ending just below 34 years. An observed age of 34 is not included in the first category but rather in the second category. Using this method, no observed age can fall into more than one category. The observed ages are then placed into the appropriate categories and the histogram is constructed.

Recall:

Ages (Sorted)
32 37 39 40 42 43 45 46 47 47 48 49 50 51 51 52 53 53 53 54
54 54 55 55 56 56 57 58 58 58 59 59 59 60 60 61 61 62 63 63

 

Table 3.1. Summary of Ages for 40 PSU Tenured Faculty

Numerical Category     Ages in the Category    Number of Observed Ages    Percent
1. 30 ≤ Age < 34       Ages 30 to 33            1                         1/40 = .025 (2.5%)
2. 34 ≤ Age < 38       Ages 34 to 37            1                         1/40 = .025 (2.5%)
3. 38 ≤ Age < 42       Ages 38 to 41            2                         2/40 = .05 (5.0%)
4. 42 ≤ Age < 46       Ages 42 to 45            3                         3/40 = .075 (7.5%)
5. 46 ≤ Age < 50       Ages 46 to 49            5                         5/40 = .125 (12.5%)
6. 50 ≤ Age < 54       Ages 50 to 53            7                         7/40 = .175 (17.5%)
7. 54 ≤ Age < 58       Ages 54 to 57            8                         8/40 = .20 (20.0%)
8. 58 ≤ Age < 62       Ages 58 to 61           10                         10/40 = .25 (25.0%)
9. 62 ≤ Age < 66       Ages 62 to 65            3                         3/40 = .075 (7.5%)
Total                                          n = 40                     40/40 = 1.0 (100%)

 

The information from Table 3.1 is used to make the histogram found in Figure 3.4. The horizontal axis displays the categories, while the vertical axis displays the percent of the observations (ages) found in each category. (Note: You will not be asked to make histograms. You will only be asked to interpret them. However, it is important to see how one is made so that you understand the interpretation.)

The histogram can be used to display the distribution of the ages of the 40 PSU tenured faculty. The horizontal axis shows age and the vertical axis shows percent. The width of each bar represents a subinterval, and its height represents the percent of the measurements that fall into that subinterval.

Figure 3.4. Histogram (Ages of PSU Tenured Faculty)
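For readers who want to check the counts behind the histogram, the sketch below tallies the 40 ages into the nine 4-year categories of Table 3.1 and prints a rough text histogram. The function name `tally` and the use of `#` characters for bars are choices made for this sketch only. Each bar here shows the count, which is proportional to the percent plotted in the figure.

```python
# The 40 ages from this example (taken from the sorted list in this section).
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63]


def tally(values, start=30, width=4, num_categories=9):
    """Count the values in each category low <= value < low + width."""
    rows = []
    for k in range(num_categories):
        low = start + k * width
        high = low + width
        # The upper limit is excluded, so an age of 34 lands in the second category only.
        count = sum(1 for v in values if low <= v < high)
        percent = 100 * count / len(values)
        rows.append((low, high, count, percent))
    return rows


for low, high, count, percent in tally(ages):
    bar = "#" * count
    print(f"{low} <= Age < {high}: {count:>2} ({percent:4.1f}%) {bar}")
```

The counts should agree with the third column of Table 3.1, and the bars grow steadily until the 58 to 61 category before dropping off.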

As you look at the histogram, you should notice that most of the ages fall in the upper half of the graph. In statistics, when the data on a histogram are off-center, with a longer tail on one side, the data are described as skewed. In this case, the data are skewed to the left: a larger percent of the ages is found at the upper end, and the longer tail stretches toward the lower ages.
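Two quick numerical checks can back up this reading of the histogram. The sketch below computes the share of ages in the upper half of the axis (taking the midpoint of the 30 to 66 axis, 48, as an assumed dividing line) and compares the mean with the median. A mean below the median is a common rule of thumb for left skew, not a formal definition, so treat it as a supporting hint only.

```python
from statistics import mean, median

# The 40 ages from this example (taken from the sorted list in this section).
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63]

# Midpoint of the histogram's horizontal axis, which runs from 30 to 66 (assumed cut-off).
midpoint = (30 + 66) / 2

# Share of observations at or above the midpoint of the axis.
upper_share = sum(1 for age in ages if age >= midpoint) / len(ages)

print(f"Share of ages in the upper half of the axis: {upper_share:.0%}")
print(f"Mean age: {mean(ages)}, median age: {median(ages)}")
```

Here three quarters of the ages lie in the upper half of the axis and the mean (52.5) falls below the median (54), both of which are consistent with a left-skewed shape.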

 

Histogram Interpretations

Skewed to the right
Explanation: A larger percent of the data is found at the lower end of the histogram, and the tail extends to the right, where data values are farther apart.
Common examples: income data; variables that are ratios where the denominator could be small.

Skewed to the left
Explanation: A larger percent of the data is found at the upper end of the histogram, and the tail extends to the left, where data values are farther apart.
Common example: results of an easy test.

Symmetric
Explanation: Roughly equal percents of the data fall on each side of the center, with tails of similar length.
Common examples: differences between similar quantities, or sums of similar quantities.

Bimodal
Explanation: Two clear peaks (modes) are seen in the histogram.
Common example: data that actually come from two different populations.