## Example 1-2

As discussed previously, how we summarize a set of data depends on the type of data. Let's take a look at an example. A sample of 40 female statistics students were asked how many times they cried in the previous month. Their replies were as follows:

9 | 5 | 3 | 2 | 6 | 3 | 2 | 2 | 3 | 4 | 2 | 8 | 4 | 4 |

5 | 0 | 3 | 0 | 2 | 4 | 2 | 1 | 1 | 2 | 2 | 1 | 3 | 0 |

2 | 1 | 3 | 0 | 0 | 2 | 2 | 3 | 4 | 1 | 1 | 5 |

That is, one student reported having cried nine times in the month, while five students reported not having cried at all. It's pretty hard to draw too many conclusions about the frequency of crying for female statistics students without summarizing the data in some way.

Of course, a common way of summarizing such discrete data is by way of a **histogram**.

Here's what a **frequency histogram** of these data looks like:

As you can see, a histogram gives a nice picture of the "**distribution**" of the data. And, in many ways, it's pretty self-explanatory. What are the notable features of the data? Well, the picture tells us:

- The most common number of times that the women cried in the month was two (called the "**mode**").
- The numbers ranged from 0 to 9 (that is, the "**range**" of the data is 9).
- A majority of women (22 out of 40) cried two or fewer times, but a few cried six or more times.
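
These three features can also be checked directly from the raw replies. The following is a minimal Python sketch, not part of the original lesson; the variable names (`cries`, `mode_value`, and so on) are made up for illustration, and the data are copied from the listing above.

```python
from collections import Counter

# Replies of the 40 students, copied row by row from the example.
cries = [
    9, 5, 3, 2, 6, 3, 2, 2, 3, 4, 2, 8, 4, 4,
    5, 0, 3, 0, 2, 4, 2, 1, 1, 2, 2, 1, 3, 0,
    2, 1, 3, 0, 0, 2, 2, 3, 4, 1, 1, 5,
]

counts = Counter(cries)

# Mode: the outcome with the largest frequency.
mode_value, mode_count = counts.most_common(1)[0]

# Range: largest observation minus smallest observation.
data_range = max(cries) - min(cries)

# How many students cried two or fewer times, and six or more times.
two_or_fewer = sum(1 for x in cries if x <= 2)
six_or_more = sum(1 for x in cries if x >= 6)

print("mode:", mode_value, "with frequency", mode_count)
print("range:", data_range)
print("two or fewer:", two_or_fewer, "of", len(cries))
print("six or more:", six_or_more, "of", len(cries))
```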

Can you think of anything else that the frequency histogram tells us? If we took another sample of 40 female students, would a frequency histogram of the new data look the same as the one above? No, of course not — that's what variability is all about.

Can you create a series of steps that a person would have to take in order to make a frequency histogram such as the one above? Does the following set of steps seem reasonable?

## To create a frequency histogram of (finite) discrete data

- Determine the number, \(n\), in the sample.
- Determine the frequency, \(f_i\), of each outcome \(i\).
- Center a rectangle with base of length 1 at each observed outcome \(i\) and make the height of the rectangle equal to the frequency.

For our crying (out loud) data, we would first tally the frequency of each outcome:

and then we'd use the first column for the horizontal axis and the third column for the vertical axis to draw our frequency histogram:

Well, of course, in practice, we'll not need to create histograms by hand. Instead, we'll just let statistical software (such as Minitab) create histograms for us.
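
If you'd like to see the tally-then-draw steps without any particular statistics package, here is a rough Python sketch that uses only the standard library. It is an illustration, not the Minitab workflow: the helper name `frequency_table` and the three-column layout (outcome, tally marks, frequency) are assumptions, and the "histogram" is just rows of text bars.

```python
from collections import Counter

# Replies of the 40 students, copied from the example.
cries = [
    9, 5, 3, 2, 6, 3, 2, 2, 3, 4, 2, 8, 4, 4,
    5, 0, 3, 0, 2, 4, 2, 1, 1, 2, 2, 1, 3, 0,
    2, 1, 3, 0, 0, 2, 2, 3, 4, 1, 1, 5,
]


def frequency_table(data):
    """Return (outcome, frequency) pairs for every integer from min to max.

    Outcomes that were never observed (7 in this sample) are kept with
    frequency 0 so that the horizontal axis stays evenly spaced.
    """
    freq = Counter(data)
    return [(i, freq[i]) for i in range(min(data), max(data) + 1)]


n = len(cries)                  # step 1: the number in the sample
table = frequency_table(cries)  # step 2: the frequency f_i of each outcome i

# Step 3, as text: one bar per outcome, with length equal to the frequency.
print(f"n = {n}")
print("outcome  tally        frequency")
for outcome, f in table:
    print(f"{outcome:>7}  {'|' * f:<11}  {f:>9}")
```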

Okay, so let's use the above frequency histogram to answer a few more questions:

- What percentage of the surveyed women reported not crying at all in the month?
- What percentage of the surveyed women reported crying two times in the month? And three times?

Clearly, the frequency histogram is not 100% user-friendly. To answer these types of questions, it would be better to use a **relative frequency histogram**:

Now, the answers to the questions are a little more obvious — about 12% reported not crying at all; about 28% reported crying two times; and about 18% reported crying three times.

## To create a relative frequency histogram of (finite) discrete data

- Determine the number, \(n\), in the sample.
- Determine the frequency, \(f_i\), of each outcome \(i\).
- Calculate the relative frequency (proportion) of each outcome \(i\) by dividing the frequency of outcome \(i\) by the total number in the sample \(n\) — that is, calculate \(\frac{f_i}{n}\) for each outcome \(i\).
- Center a rectangle with base of length 1 at each observed outcome \(i\) and make the height of the rectangle equal to the relative frequency.
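
The steps above can be sketched in a few lines of Python. This is only an illustration under the assumption that the data are the 40 crying counts from Example 1-2; the name `rel_freq` is made up, and the bars are drawn as text rather than as a real plot.

```python
from collections import Counter

# Replies of the 40 students, copied from Example 1-2.
cries = [
    9, 5, 3, 2, 6, 3, 2, 2, 3, 4, 2, 8, 4, 4,
    5, 0, 3, 0, 2, 4, 2, 1, 1, 2, 2, 1, 3, 0,
    2, 1, 3, 0, 0, 2, 2, 3, 4, 1, 1, 5,
]

n = len(cries)         # the number in the sample
freq = Counter(cries)  # the frequency f_i of each observed outcome i

# Relative frequency f_i / n of each observed outcome i.
rel_freq = {i: freq[i] / n for i in sorted(freq)}

# Text version of the histogram: bar length is proportional to f_i / n.
for i, h in rel_freq.items():
    print(f"{i}: {h:.3f} {'#' * round(h * n)}")
```

The exact values behind the approximate percentages quoted earlier are `rel_freq[0]`, `rel_freq[2]`, and `rel_freq[3]`, and the relative frequencies add up to 1.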

While using a relative frequency histogram to summarize discrete data is a worthwhile pursuit in and of itself, my primary motive here in addressing such histograms is to motivate the material of the course. In our example, if we

- let \(X\) = the number of times (days) a randomly selected student cried in the last month, and
- let \(x = 0, 1, 2, \ldots, 31\) be the possible values of \(X\)

Then \(h_0=\frac{f_0}{n}\) is the relative frequency (or proportion) of students, in a sample of size \(n\), crying \(x_0\) times. You can imagine that for really small samples, \(\frac{f_0}{n}\) is quite unstable (think \(n = 5\), for example). However, as the sample size \(n\) increases, \(\frac{f_0}{n}\) tends to stabilize and approach some limiting probability \(p_0=f(x_0)\) (think \(n = 1000\), for example). You can think of the relative frequency histogram as a sample estimate of the true probabilities of the population.

It is this \(f(x_0)\), called a (discrete) probability mass function, that will be the focus of our attention in Section 2 of this course.
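
The stabilizing behavior of \(\frac{f_0}{n}\) described above can be seen in a small simulation. The sketch below is purely illustrative: the "true" probability `P0 = 0.125` is a made-up value (we do not know the population value), and `relative_frequency_of_zero` is a hypothetical helper that draws \(n\) students and records the proportion who cried 0 times.

```python
import random

# Hypothetical "true" probability that a randomly selected student cried
# 0 times. This is an assumption made for the simulation only.
P0 = 0.125


def relative_frequency_of_zero(n, rng):
    """Simulate n students and return f_0 / n, the proportion crying 0 times."""
    f0 = sum(1 for _ in range(n) if rng.random() < P0)
    return f0 / n


rng = random.Random(414)  # fixed seed so that reruns give the same numbers

# Small samples give erratic estimates; large samples settle near P0.
for n in (5, 40, 1000, 100_000):
    print(f"n = {n:>7}: f_0/n = {relative_frequency_of_zero(n, rng):.4f}")
```

Rerunning with different seeds changes the small-sample values a great deal and the large-sample values very little, which is the point of the paragraph above.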

## Example 1-3

Let's take a look at another example. The following numbers are the measured nose lengths (in millimeters) of 60 students:

38 | 50 | 38 | 40 | 35 | 52 | 45 | 50 | 40 | 32 | 40 | 47 | 70 | 55 | 51 |

43 | 40 | 45 | 45 | 55 | 37 | 50 | 45 | 45 | 55 | 50 | 45 | 35 | 52 | 32 |

45 | 50 | 40 | 40 | 50 | 41 | 41 | 40 | 40 | 46 | 45 | 40 | 43 | 45 | 42 |

45 | 45 | 48 | 45 | 45 | 35 | 45 | 45 | 40 | 45 | 40 | 40 | 45 | 35 | 52 |

How would we create a histogram for these data? The numbers look discrete, but they are technically continuous. The measuring tools, which consisted of a piece of string and a ruler, were the limiting factors in getting more refined measurements. Do you also notice that, in most cases, nose lengths come in five-millimeter increments... 35, 40, 45, 55...? Of course not, silly me... that's, again, just measurement error. In any case, if we attempted to use the guidelines for creating a histogram for discrete data, we'd soon find that the large number of disparate outcomes would prevent us from creating a meaningful summary of the data. Let's instead follow these guidelines:

## To create a histogram of continuous data (or discrete data with many possible outcomes)

The major difference is that you first have to group the data into a set of classes, typically of equal length. There are many, many sets of rules for defining the classes. For our purposes, we'll just rely on our common sense — having too few classes is as bad as having too many.

- Determine the number, \(n\), in the sample.
- Define \(k\) class intervals \((c_0, c_1], (c_1, c_2], ..., (c_{k-1}, c_k]\).
- Determine the frequency, \(f_i\), of each class \(i\).
- Calculate the relative frequency (proportion) of each class by dividing the class frequency by the total number in the sample — that is, \(\frac{f_i}{n}\).
- For a **frequency histogram**: draw a rectangle for each class with the class interval as the base and the height equal to the frequency of the class.
- For a **relative frequency histogram**: draw a rectangle for each class with the class interval as the base and the height equal to the relative frequency of the class.
- For a **density histogram**: draw a rectangle for each class with the class interval as the base and the height equal to \(h(x)=\dfrac{f_i}{n(c_i-c_{i-1})}\) for \(c_{i-1}<x \leq c_i\), \(i = 1, 2,..., k\).

Here's what the work would look like for our nose length example if we used 5 mm classes centered at 30, 35, ..., 70:

For example, the relative frequency for the first class (27.5 to 32.5) is 2/60, or about 0.033, whereas the height of the rectangle for the first class in a density histogram is (2/60)/5, or about 0.0067. Here is what the density histogram would look like in its entirety:

Note that a density histogram is just a modified relative frequency histogram. That is, a density histogram is defined so that:

- the area of each rectangle equals the relative frequency of the corresponding class, and
- the area of the entire histogram equals 1.
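
Both properties can be checked numerically for the nose length data. The sketch below is an illustration, assuming the 5 mm classes \((27.5, 32.5], (32.5, 37.5], \ldots, (67.5, 72.5]\) from the example; the variable names (`edges`, `freq`, `density`, `areas`) are made up.

```python
# Nose lengths (mm) of the 60 students, copied row by row from Example 1-3.
nose_mm = [
    38, 50, 38, 40, 35, 52, 45, 50, 40, 32, 40, 47, 70, 55, 51,
    43, 40, 45, 45, 55, 37, 50, 45, 45, 55, 50, 45, 35, 52, 32,
    45, 50, 40, 40, 50, 41, 41, 40, 40, 46, 45, 40, 43, 45, 42,
    45, 45, 48, 45, 45, 35, 45, 45, 40, 45, 40, 40, 45, 35, 52,
]

n = len(nose_mm)

# Class boundaries c_0, c_1, ..., c_k: 27.5, 32.5, ..., 72.5.
edges = [27.5 + 5 * j for j in range(10)]
k = len(edges) - 1

# Frequency f_i of each class (c_{i-1}, c_i]: open on the left, closed on the right.
freq = [0] * k
for x in nose_mm:
    for i in range(k):
        if edges[i] < x <= edges[i + 1]:
            freq[i] += 1
            break

widths = [edges[i + 1] - edges[i] for i in range(k)]
rel_freq = [f / n for f in freq]

# Density height: f_i / (n * (c_i - c_{i-1})).
density = [freq[i] / (n * widths[i]) for i in range(k)]

# Area of each rectangle = height * width, which should equal f_i / n.
areas = [density[i] * widths[i] for i in range(k)]

for i in range(k):
    print(f"({edges[i]}, {edges[i + 1]}]  f = {freq[i]:>2}  "
          f"rel = {rel_freq[i]:.3f}  density = {density[i]:.4f}")
print("total area:", sum(areas))
```

Each entry of `areas` matches the corresponding entry of `rel_freq` up to floating-point rounding, so the total area is 1 for the same reason the relative frequencies sum to 1.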

Again, while using a density histogram to summarize continuous data is a worthwhile pursuit in and of itself, my primary motive here in addressing such histograms is to motivate the material of the course. As the sample size \(n\) increases, we can imagine our density histogram approaching some limiting continuous function \(f(x)\), say. It is this continuous curve \(f(x)\) that we will come to know in Section 3 as a (continuous) probability density function.

So, in Section 2, we'll learn about discrete probability mass functions (**p.m.f.**s). In Section 3, we'll learn about continuous probability density functions (**p.d.f.**s). In Section 4, we'll learn about p.m.f.s and p.d.f.s for two (random) variables (instead of one). In Section 5, we'll learn how to find the probability distribution for functions of two or more (random) variables. Wow! That's a lot of work. Before we can take it on, however, we will first spend some time in this Section 1 filling up our probability toolbox with some basic probability rules and tools.