Lesson 8: Mathematical Expectation

Overview

In this lesson, we learn a general definition of mathematical expectation, as well as some specific mathematical expectations, such as the mean and variance.

Objectives

The objectives of this lesson are:

To get a general understanding of the mathematical expectation of a discrete random variable.
To learn a formal definition of $E[u(X)]$, the expected value of a function of a discrete random variable.
To understand that the expected value of a discrete random variable may not exist.
To learn and be able to apply the properties of mathematical expectation.
To learn a formal definition of the mean of a discrete random variable.
To derive a formula for the mean of a hypergeometric random variable.
To learn a formal definition of the variance and standard deviation of a discrete random variable.
To learn and be able to apply a shortcut formula for the variance of a discrete random variable.
To be able to calculate the mean and variance of a linear function of a discrete random variable.
To learn a formal definition of the sample mean and sample variance.
To learn and be able to apply a shortcut formula for the sample variance.
To understand the steps involved in each of the proofs in the lesson.
To be able to apply the methods learned in the lesson to new problems.

8.1 - A Definition

Example 8-1

Toss a fair, six-sided die many times. In the long run (do you notice that it is bolded and italicized?!), what would the average (or "mean") of the tosses be? That is, if we have the following, for example:

1	3	2	4	5	6	5	4	4	1	2	3
2	1	3	6	4	5	5	4	3	1	6	6
2	5	3	1	2	4	6	2	5	6	3	1

what is the average of the tosses?
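As a quick check on the 36 tosses listed above, here is a minimal Python sketch that computes the average two ways: as a plain average, and as an average of the face values weighted by their relative frequencies. The variable names are illustrative only and are not part of the lesson.

```python
from collections import Counter
from fractions import Fraction

# The 36 tosses listed in the example, row by row.
tosses = [
    1, 3, 2, 4, 5, 6, 5, 4, 4, 1, 2, 3,
    2, 1, 3, 6, 4, 5, 5, 4, 3, 1, 6, 6,
    2, 5, 3, 1, 2, 4, 6, 2, 5, 6, 3, 1,
]

# Plain average: add everything up and divide by the number of tosses.
plain_average = Fraction(sum(tosses), len(tosses))

# Weighted average: each face value times its relative frequency.
counts = Counter(tosses)
weighted_average = sum(
    Fraction(face * count, len(tosses)) for face, count in counts.items()
)

print(dict(sorted(counts.items())))     # how often each face appeared
print(plain_average, weighted_average)  # both equal 7/2
```

In this particular data set each face happens to appear exactly 6 times out of 36, which is why the two averages agree with the theoretical value.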

This example lends itself to a couple of notes.

In reality, one-sixth of the tosses will equal $x$ only in the long run (there's that bolding again).
The mean is a weighted average, that is, an average of the values weighted by their respective individual probabilities.
The mean is called the expected value of $X$, denoted $E(X)$ or $\mu$, the Greek letter mu (read "mew").

Let's give a formal definition.

Mathematical Expectation

If $f(x)$ is the p.m.f. of the discrete random variable $X$ with support $S$, and if the summation:

$\sum\limits_{x\in S}u(x)f(x)$

exists (that is, it converges absolutely to a finite value), then the resulting sum is called the mathematical expectation, or the expected value of the function $u(X)$. The expectation is denoted $E[u(X)]$. That is:

$E[u(X)]=\sum\limits_{x\in S}u(x)f(x)$
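For a finite support, the definition translates almost directly into code. The following is a sketch only: the helper name `expectation` and the dictionary representation of the p.m.f. are assumptions made for illustration, and the fair die is used as a sample input.

```python
from fractions import Fraction

def expectation(pmf, u=lambda x: x):
    """Return the sum of u(x) * f(x) over the support.

    pmf is a dict mapping each support value x to its probability f(x).
    """
    return sum(u(x) * p for x, p in pmf.items())

# A fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean_face = expectation(die)                      # u(x) = x
mean_square = expectation(die, lambda x: x ** 2)  # u(x) = x^2
print(mean_face, mean_square)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared against hand calculations without rounding noise.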

Example 8-2

What is the average toss of a fair six-sided die?

Solution

If the random variable $X$ is the top face of a tossed, fair, six-sided die, then the p.m.f. of $X$ is:

$f(x)=\dfrac{1}{6}$

for $x=1, 2, 3, 4, 5, \text{and } 6$. Therefore, the average toss, that is, the expected value of $X$, is:

$E(X)=1\left(\dfrac{1}{6}\right)+2\left(\dfrac{1}{6}\right)+3\left(\dfrac{1}{6}\right)+4\left(\dfrac{1}{6}\right)+5\left(\dfrac{1}{6}\right)+6\left(\dfrac{1}{6}\right)=3.5$

Hmm... if we toss a fair, six-sided die once, should we expect the toss to be 3.5? No, of course not! All the expected value tells us is what we would expect the average of a large number of tosses to be in the long run. If we toss a fair, six-sided die a thousand times, say, and calculate the average of the tosses, will the average of the 1000 tosses be exactly 3.5? No, probably not! But, we can certainly expect it to be close to 3.5. It is important to keep in mind that the expected value of $X$ is a theoretical average, not an actual, realized one!
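To see the "theoretical average versus realized average" distinction in action, one could simulate tosses. This is a minimal sketch assuming Python's standard pseudo-random generator is an acceptable stand-in for a fair die; the function name and the seed are arbitrary choices.

```python
import random

def average_toss(n, rng):
    """Average of n simulated tosses of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (1, 10, 1_000, 100_000):
    # The realized average drifts toward 3.5 as n grows, but need not equal it.
    print(n, average_toss(n, rng))
```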

Example 8-3

Hannah's House of Gambling has a roulette wheel containing 38 numbers: zero (0), double zero (00), and the numbers 1, 2, 3, ..., 36. Let $X$ denote the number on which the ball lands and $u(X)$ denote the amount of money paid to the gambler, such that:

\begin{array}{lcl} u(X) &=& \$5 \text{ if } X=0\\ u(X) &=& \$10 \text{ if } X=00\\ u(X) &=& \$1 \text{ if } X \text{ is odd}\\ u(X) &=& \$2 \text{ if } X \text{ is even} \end{array}

How much would I have to charge each gambler to play in order to ensure that I made some money?

Solution

Assuming that the ball has an equally likely chance of landing on each number, the p.m.f. of $X$ is:

$f(x)=\dfrac{1}{38}$

for $x=0, 00, 1, 2, 3, \ldots, 36$. Therefore, the expected value of $u(X)$ is:

$E[u(X)]=\$5\left(\dfrac{1}{38}\right)+\$10\left(\dfrac{1}{38}\right)+\left[\$1\left(\dfrac{1}{38}\right)\times 18 \right]+\left[\$2\left(\dfrac{1}{38}\right)\times 18 \right]=\dfrac{\$69}{38}\approx \$1.82$

Note that the \$1 and \$2 payouts are each multiplied by 18 because there are 18 odd and 18 even numbers among 1, 2, 3, ..., 36. Our calculation tells us that, in the long run, Hannah's House of Gambling would expect to pay out about \$1.82 for each spin of the roulette wheel. Therefore, in order to expect to make money in the long run, the House would have to charge more than \$1.82 per play.
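The payout calculation can be reproduced by enumerating the 38 pockets. This is an illustrative sketch; the `payout` function and the string labels for the pockets are assumptions introduced here to encode the payout table from the example.

```python
from fractions import Fraction

# The 38 equally likely pockets: 0, 00, and 1 through 36.
pockets = ["0", "00"] + [str(k) for k in range(1, 37)]

def payout(pocket):
    """Dollar payout u(x) for a given pocket, per the example's table."""
    if pocket == "0":
        return 5
    if pocket == "00":
        return 10
    return 1 if int(pocket) % 2 == 1 else 2

# E[u(X)] = sum of u(x) * (1/38) over all pockets.
expected_payout = sum(Fraction(payout(p), len(pockets)) for p in pockets)
print(expected_payout, round(float(expected_payout), 2))
```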

Example 8-4

Imagine a game in which, on any play, a player has a 20% chance of winning \$3 and an 80% chance of losing \$1. The probability mass function of the random variable $X$, the amount won or lost on a single play, is:

x	\$3	-\$1
f(x)	0.2	0.8

and so the average amount won (actually lost, since it is negative) — in the long run — is:

$E(X)=(\$3)(0.2)+(-\$1)(0.8)=-\$0.20$

What does "in the long run" mean? If you play, are you guaranteed to lose no more than 20 cents?

Solution

If you play and lose, you are guaranteed to lose \$1! An expected loss of 20 cents means that if you played the game over and over and over and over .... again, the average of your \$3 winnings and your \$1 losses would be a 20 cent loss. "In the long run" means that you can't draw conclusions about one or two plays, but rather thousands and thousands of plays.
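A short simulation makes the "long run" idea concrete. The sketch below assumes a pseudo-random generator as a stand-in for the game; the function name and seed are arbitrary. A single play can only return \$3 or -\$1, while the average over many plays settles near -\$0.20.

```python
import random
from fractions import Fraction

# p.m.f. from the example: win $3 with probability 0.2, lose $1 with probability 0.8.
pmf = {3: Fraction(1, 5), -1: Fraction(4, 5)}
exact_mean = sum(x * p for x, p in pmf.items())  # exactly -1/5

def average_winnings(plays, rng):
    """Average amount won per play over a simulated run of the game."""
    total = 0
    for _ in range(plays):
        total += 3 if rng.random() < 0.2 else -1
    return total / plays

rng = random.Random(7)  # arbitrary seed, used only for repeatability
for plays in (1, 10, 1_000, 200_000):
    print(plays, average_winnings(plays, rng))
print("exact:", exact_mean)
```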

Example 8-5

What is the expected value of a discrete random variable $X$ with the following probability mass function:

$f(x)=\dfrac{c}{x^2}$

where $c$ is a constant and the support is $x=1, 2, 3, \ldots$?

Solution

The expected value is calculated as follows:

$E(X)=\sum\limits_{x=1}^\infty xf(x)=\sum\limits_{x=1}^\infty x\left(\dfrac{c}{x^2}\right)=c\sum\limits_{x=1}^\infty \dfrac{1}{x}$

The first equal sign arises from the definition of the expected value. The second equal sign just involves replacing the generic p.m.f. notation $f(x)$ with the given p.m.f. And, the third equal sign is because the constant $c$ can be pulled through the summation sign, because it does not depend on the value of $x$.

Now, to finalize our calculation, all we need to do is determine what the summation:

$\sum\limits_{x=1}^\infty \dfrac{1}{x}$

equals. Oops! You might recognize this quantity from your calculus studies as the divergent harmonic series, whose sum is infinity. Therefore, as the above definition of expectation suggests, we say in this case that the expected value of $X$ doesn't exist.

This is the first example where the summation is not absolutely convergent. That is, we cannot get a finite answer here. The expectation for a random variable may not always exist. In this course, we will not encounter nonexistent expectations very often. However, when you encounter more sophisticated distributions in your future studies, you may find that the expectation does not exist.
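To get a feel for the divergence, one can watch the partial sums of $xf(x)$ grow. The sketch below assumes the normalizing constant $c=6/\pi^2$, which follows from the known value $\sum 1/x^2=\pi^2/6$ and is not stated in the lesson. The partial sums keep pace with $c\ln n$, so they never level off.

```python
import math

# Assumed normalizing constant: the sum of 1/x^2 over x = 1, 2, ... is pi^2/6.
c = 6 / math.pi ** 2

def partial_expectation(n):
    """Sum of x * f(x) = c / x over x = 1, ..., n."""
    return c * sum(1 / x for x in range(1, n + 1))

for n in (10, 1_000, 100_000):
    # Second column: partial sum; third column: c * ln(n) for comparison.
    print(n, round(partial_expectation(n), 3), round(c * math.log(n), 3))
```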

8.2 - Properties of Expectation

Example 8-6

Suppose the p.m.f. of the discrete random variable $X$ is:

x	0	1	2	3
f(x)	0.2	0.1	0.4	0.3

What is $E(2)$? What is $E(X)$? And, what is $E(2X)$?
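The three expectations can be computed directly from the definition. This is a minimal sketch; the helper `expectation` is an illustrative name, and the p.m.f. is entered as exact fractions.

```python
from fractions import Fraction

def expectation(pmf, u):
    """Sum of u(x) * f(x) over the support of the p.m.f."""
    return sum(u(x) * p for x, p in pmf.items())

# p.m.f. from the example: f(0)=0.2, f(1)=0.1, f(2)=0.4, f(3)=0.3.
pmf = {0: Fraction(2, 10), 1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}

e_const = expectation(pmf, lambda x: 2)      # E(2)
e_x = expectation(pmf, lambda x: x)          # E(X)
e_2x = expectation(pmf, lambda x: 2 * x)     # E(2X)
print(e_const, float(e_x), float(e_2x))
```

Here $E(2)=2$, $E(X)=1.8$, and $E(2X)=3.6=2E(X)$.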

This example leads us to a very helpful theorem.

Theorem

When it exists, the mathematical expectation $E$ satisfies the following properties:

(a) If $c$ is a constant, then $E(c)=c$.
(b) If $c$ is a constant and $u$ is a function, then:

$E[cu(X)]=cE[u(X)]$

Proof

Example 8-7

Let's return to the same discrete random variable $X$. That is, suppose the p.m.f. of the random variable $X$ is:

x	0	1	2	3
f(x)	0.2	0.1	0.4	0.3

It can be easily shown that $E(X^2)=4.4$. What is $E(2X+3X^2)$?

This example again leads us to a very helpful theorem.

Theorem

Let $c_1$ and $c_2$ be constants and $u_1$ and $u_2$ be functions. Then, when the mathematical expectation $E$ exists, it satisfies the following property:

$E[c_1 u_1(X)+c_2 u_2(X)]=c_1E[u_1(X)]+c_2E[u_2(X)]$

Before we look at the proof, it should be noted that the above property can be extended to more than two terms. That is:

$E\left[\sum\limits_{i=1}^k c_i u_i(X)\right]=\sum\limits_{i=1}^k c_i E[u_i(X)]$

Proof

Example 8-8

Suppose the p.m.f. of the discrete random variable $X$ is:

x	0	1	2	3
f(x)	0.2	0.1	0.4	0.3

In the previous examples, we determined that $E(X)=1.8$ and $E(X^2)=4.4$. Knowing that, what are $E(4X^2)$ and $E(3X+2X^2)$?

Using part (b) of the first theorem, we can determine that:

$E(4X^2)=4E(X^2)=4(4.4)=17.6$

And using the second theorem, we can determine that:

$E(3X+2X^2)=3E(X)+2E(X^2)=3(1.8)+2(4.4)=14.2$
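As a sanity check on the two theorems, the same quantities can be computed directly from the definition, without using linearity. This is an illustrative sketch; `expectation` is a hypothetical helper name.

```python
from fractions import Fraction

def expectation(pmf, u):
    """Sum of u(x) * f(x) over the support of the p.m.f."""
    return sum(u(x) * p for x, p in pmf.items())

pmf = {0: Fraction(2, 10), 1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}

e_x = expectation(pmf, lambda x: x)         # 1.8
e_x2 = expectation(pmf, lambda x: x ** 2)   # 4.4

# Direct computation versus the linearity shortcut.
direct_a = expectation(pmf, lambda x: 4 * x ** 2)
direct_b = expectation(pmf, lambda x: 3 * x + 2 * x ** 2)
direct_c = expectation(pmf, lambda x: 2 * x + 3 * x ** 2)  # Example 8-7

print(float(direct_a), float(4 * e_x2))
print(float(direct_b), float(3 * e_x + 2 * e_x2))
print(float(direct_c), float(2 * e_x + 3 * e_x2))
```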

Example 8-9

Let $u(X)=(X-c)^2$ where $c$ is a constant. Suppose $E[(X-c)^2]$ exists. Find the value of $c$ that minimizes $E[(X-c)^2]$.
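One way to approach this: by the properties of expectation, $E[(X-c)^2]=E(X^2)-2cE(X)+c^2$, a quadratic in $c$ whose derivative $-2E(X)+2c$ vanishes at $c=E(X)$. The sketch below explores this numerically. It assumes, purely for illustration, the p.m.f. from the earlier examples and a grid of candidate values for $c$; neither choice is part of the problem statement.

```python
from fractions import Fraction

# Illustrative p.m.f. borrowed from the earlier examples (an assumption).
pmf = {0: Fraction(2, 10), 1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}

def mean_squared_deviation(c):
    """E[(X - c)^2] computed from the definition of expectation."""
    return sum((x - c) ** 2 * p for x, p in pmf.items())

# Search a grid of candidate values c = 0.00, 0.01, ..., 3.00.
candidates = [Fraction(k, 100) for k in range(0, 301)]
best_c = min(candidates, key=mean_squared_deviation)

mean = sum(x * p for x, p in pmf.items())
print(float(best_c), float(mean), float(mean_squared_deviation(best_c)))
```

The minimizing $c$ on the grid coincides with the mean, 1.8.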

Note that the expectations $E(X)$ and $E[(X-E(X))^2]$ are so important that they deserve special attention.

8.3 - Mean of X

In the previous pages, we concerned ourselves with finding the expectation of any general function $u(X)$ of the discrete random variable $X$. Here, we'll focus our attention on one particular function, namely:

$u(X)=X$

Let's jump right in, and give the expectation in this situation a special name!

First Moment about the Origin: When the function $u(X)=X$, the expectation of $u(X)$, when it exists:

$E[u(X)]=E(X)=\sum\limits_{x\in S} xf(x) $

is called the expected value of $X$, and is denoted $E(X)$. Or, it is called the mean of $X$, and is denoted as $\mu$ (the Greek letter mu, read "mew"). That is, $\mu=E(X)$. The expected value of $X$ can also be called the first moment about the origin.

Example 8-10

The maximum patent life for a new drug is 17 years. Subtracting the length of time required by the Food and Drug Administration for testing and approval of the drug provides the actual patent life for the drug — that is, the length of time that the company has to recover research and development costs and to make a profit. The distribution of the lengths of actual patent lives for new drugs is as follows:

Years, y	3	4	5	6	7	8	9	10	11	12	13
f(y)	0.03	0.05	0.07	0.10	0.14	0.20	0.18	0.12	0.07	0.03	0.01

What is the mean patent life for a new drug?

Solution

The mean can be calculated as:

$\mu_Y=E(Y)=\sum\limits_{y=3}^{13} yf(y)=3(0.03)+4(0.05)+\cdots+12(0.03)+13(0.01)=7.9$

That is, the average patent life for a new drug is 7.9 years.
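The full eleven-term sum is easy to verify in code. This is a small sketch with illustrative variable names; the probabilities are parsed as exact fractions so that the total comes out exactly.

```python
from fractions import Fraction

years = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
probs = ["0.03", "0.05", "0.07", "0.10", "0.14", "0.20",
         "0.18", "0.12", "0.07", "0.03", "0.01"]

# Build the p.m.f. as exact fractions, e.g. Fraction("0.03") == 3/100.
pmf = {y: Fraction(p) for y, p in zip(years, probs)}

total_probability = sum(pmf.values())           # should be 1
mean_life = sum(y * p for y, p in pmf.items())  # E(Y)
print(total_probability, float(mean_life))
```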

Example 8-11

Let $X$ follow a hypergeometric distribution in which $n$ objects are selected from $N$ objects with $m$ of the objects being one type, and $N-m$ of the objects being a second type. What is the mean of $X$?

Solution

Recalling the p.m.f. of a hypergeometric distribution and using the definition of the expected value of $X$, we have:

$E(X)=\sum\limits_{x\in S} x \dfrac{\dbinom{m}{x} \dbinom{N-m}{n-x}}{\dbinom{N}{n}}$

You should be getting the idea already that this is going to be messy! So, we're going to work on it in parts. First, note that the $x=0$ term of the summation equals 0, so it contributes nothing to the sum. Second, note that two of the binomial coefficients can be written differently. That is:

$\dbinom{m}{x}=\dfrac{m!}{x!(m-x)!}$

and:

$\dbinom{N}{n}=\dfrac{N!}{n!(N-n)!}=\dfrac{N(N-1)!}{n \cdot (n-1)!(N-n)!}=\dfrac{N}{n} \cdot \dfrac{(N-1)!}{(n-1)!(N-1-(n-1))!}=\dfrac{N}{n} \cdot \dbinom{N-1}{n-1}$

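Therefore, replacing these quantities in our formula for $E(X)$, and dropping the $x=0$ term, we have:

$E(X)=\sum\limits_{x\in S,\, x\geq 1} x \cdot \dfrac{m!}{x!(m-x)!} \cdot \dfrac{\dbinom{N-m}{n-x}}{\dfrac{N}{n} \dbinom{N-1}{n-1}}=\dfrac{mn}{N}\sum\limits_{x\in S,\, x\geq 1} \dfrac{\dbinom{m-1}{x-1} \dbinom{N-m}{n-x}}{\dbinom{N-1}{n-1}}$

The second equality uses $\dfrac{x \cdot m!}{x!(m-x)!}=m \cdot \dfrac{(m-1)!}{(x-1)!(m-x)!}=m\dbinom{m-1}{x-1}$. Because $N-m=(N-1)-(m-1)$ and $n-x=(n-1)-(x-1)$, each term of the remaining summation is the p.m.f., evaluated at $x-1$, of a hypergeometric random variable in which $n-1$ objects are selected from $N-1$ objects with $m-1$ of the objects being one type. Summing a p.m.f. over its entire support gives 1, so the remaining summation equals 1. We've shown that, in general, the mean of a hypergeometric random variable $X$, in which $n$ objects are selected from $N$ objects with $m$ of the objects being one type, is: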

$E(X)=\dfrac{mn}{N}$
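The closed form can be checked against a brute-force sum of $xf(x)$ for specific parameter values. This is a minimal sketch; the function name and the test parameters are arbitrary choices. It relies on `math.comb` returning 0 when the lower index exceeds the upper one, so impossible values of $x$ contribute nothing.

```python
from fractions import Fraction
from math import comb

def hypergeom_mean(N, m, n):
    """Mean of a hypergeometric random variable by direct summation of x * f(x)."""
    total = comb(N, n)
    return sum(
        Fraction(x * comb(m, x) * comb(N - m, n - x), total)
        for x in range(0, n + 1)
    )

for N, m, n in [(20, 7, 5), (10, 3, 4), (52, 13, 5)]:
    # Brute-force mean next to the closed form mn/N.
    print((N, m, n), hypergeom_mean(N, m, n), Fraction(m * n, N))
```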

Example 8-12

Suppose the random variable $X$ follows the uniform distribution on the first $m$ positive integers. That is, suppose the p.m.f. of $X$ is:

$f(x)=\dfrac{1}{m}$ for $x=1, 2, 3, \ldots, m$

What is the mean of $X$?
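Since $1+2+\cdots+m=\dfrac{m(m+1)}{2}$, the definition gives $E(X)=\dfrac{1}{m}\cdot\dfrac{m(m+1)}{2}=\dfrac{m+1}{2}$. The following sketch checks that closed form by direct summation for a few values of $m$; the function name and the chosen values are illustrative.

```python
from fractions import Fraction

def uniform_mean(m):
    """Mean of the uniform distribution on 1, ..., m by direct summation."""
    return sum(Fraction(x, m) for x in range(1, m + 1))

for m in (1, 6, 10, 100):
    # Direct sum next to the closed form (m + 1) / 2.
    print(m, uniform_mean(m), Fraction(m + 1, 2))
```

With $m=6$ this reproduces the fair-die mean of 3.5.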

8.4 - Variance of X

Example 8-13

Consider two probability mass functions. The first:

x	3	4	5
f(x)	0.3	0.4	0.3

And, the second:

y	1	2	6	8
f(y)	0.4	0.1	0.3	0.2

It is a straightforward calculation to show that the mean of $X$ and the mean of $Y$ are the same:

$\mu_X=E(X) = 3(0.3)+4(0.4)+5(0.3)=4$

$\mu_Y=E(Y)=1(0.4)+2(0.1)+6(0.3)+8(0.2)=4$

Let's draw a picture that illustrates the two p.m.f.s and their means.

Again, the pictures illustrate (at least) two things:

The means of $X$ and $Y$ are the fulcrums at which their axes balance without tilting ("a balanced seesaw").
The second p.m.f. exhibits greater variability than the first p.m.f.

That second point suggests that the means of $X$ and $Y$ are not sufficient in summarizing their probability distributions. Hence, the following definition!

Definition. When $u(X)=(X-\mu)^2$, the expectation of $u(X)$:

$E[u(X)]=E[(X-\mu)^2]=\sum\limits_{x\in S} (x-\mu)^2 f(x)$

is called the variance of $X$, and is denoted as $\text{Var}(X)$ or $\sigma^2$ ("sigma-squared"). The variance of $X$ can also be called the second moment of $X$ about the mean $\mu$.

The positive square root of the variance is called the standard deviation of $X$, and is denoted $\sigma$ ("sigma"). That is:

$\sigma=\sqrt{\text{Var}(X)}=\sqrt{\sigma^2}$

Although most students understand that $\mu=E(X)$ is, in some sense, a measure of the middle of the distribution of $X$, it is much more difficult to get a feeling for the meaning of the variance and the standard deviation. The next example (hopefully) illustrates how the variance and standard deviation quantify the spread or dispersion of the values in the support $S$.

Example 8-14

Let's return to the probability mass functions of the previous example. The first:

x	3	4	5
f(x)	0.3	0.4	0.3

And, the second:

y	1	2	6	8
f(y)	0.4	0.1	0.3	0.2

What are the variance and standard deviation of $X$? How do they compare to the variance and standard deviation of $Y$?

Solution

The variance of $X$ is calculated as:

$\sigma^2_X=E[(X-\mu)^2]=(3-4)^2(0.3)+(4-4)^2(0.4)+(5-4)^2(0.3)=0.6$

And, therefore, the standard deviation of $X$ is:

$\sigma_X=\sqrt{0.6}\approx 0.77$

Now, the variance of $Y$ is calculated as:

$\sigma_Y^2=E[(Y-\mu)^2]=(1-4)^2(0.4)+(2-4)^2(0.1)+(6-4)^2(0.3)+(8-4)^2(0.2)=8.4$

And, therefore, the standard deviation of $Y$ is:

$\sigma_Y=\sqrt{8.4}\approx 2.90$

As you can see, the expected variation in the random variable $Y$, as quantified by its variance and standard deviation, is much larger than the expected variation in the random variable $X$. Given the p.m.f.s of the two random variables, this result should not be surprising.

As you might have noticed, the formula for the variance of a discrete random variable can be quite cumbersome to use. Fortunately, there is a slightly easier-to-work-with alternative formula.

Theorem

An easier way to calculate the variance of a random variable $X$ is:

$\sigma^2=\text{Var}(X)=E(X^2)-\mu^2$

Proof

Example 8-15

Use the alternative formula to verify that the variance of the random variable $X$ with the following probability mass function:

x	3	4	5
f(x)	0.3	0.4	0.3

is 0.6, as we calculated earlier.

Solution

First, we need to calculate the expected value of $X^2$:

$E(X^2)=3^2(0.3)+4^2(0.4)+5^2(0.3)=16.6$

Earlier, we determined that $\mu$, the mean of $X$, is 4. Therefore, using the shortcut formula for the variance, we verify that indeed the variance of $X$ is 0.6:

$\sigma^2_X=E(X^2)-\mu^2=16.6-4^2=0.6$
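The agreement between the definition and the shortcut formula can be confirmed for both p.m.f.s from Example 8-13. This is an illustrative sketch; the function names are hypothetical and the probabilities are entered as exact fractions.

```python
import math
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """E[(X - mu)^2] computed term by term."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """E(X^2) - mu^2."""
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2

pmf_x = {3: Fraction(3, 10), 4: Fraction(4, 10), 5: Fraction(3, 10)}
pmf_y = {1: Fraction(4, 10), 2: Fraction(1, 10), 6: Fraction(3, 10), 8: Fraction(2, 10)}

for name, pmf in (("X", pmf_x), ("Y", pmf_y)):
    var = variance_by_definition(pmf)
    # Name, mean, variance, and standard deviation rounded to two places.
    print(name, float(mean(pmf)), float(var), round(math.sqrt(var), 2))
```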

Example 8-16

Suppose the random variable $X$ follows the uniform distribution on the first $m$ positive integers. That is, suppose the p.m.f. of $X$ is:

$f(x)=\dfrac{1}{m}$ for $x=1, 2, 3, \ldots, m$

What is the variance of $X$?

Solution

In the previous section, we determined that the mean of the discrete uniform random variable $X$ is:

$\mu=E(X)=\dfrac{m+1}{2}$

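If we can calculate $E(X^2)$, we can use the shortcut formula to calculate the variance of $X$. Let's do that, using the fact that the sum of the first $m$ squares is $\dfrac{m(m+1)(2m+1)}{6}$:

$E(X^2)=\sum\limits_{x=1}^m x^2 \left(\dfrac{1}{m}\right)=\dfrac{1}{m} \cdot \dfrac{m(m+1)(2m+1)}{6}=\dfrac{(m+1)(2m+1)}{6}$

Therefore, the shortcut formula gives:

$\sigma^2=E(X^2)-\mu^2=\dfrac{(m+1)(2m+1)}{6}-\dfrac{(m+1)^2}{4}=\dfrac{(m+1)\left[2(2m+1)-3(m+1)\right]}{12}=\dfrac{(m+1)(m-1)}{12}=\dfrac{m^2-1}{12}$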
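As a numerical check, the shortcut formula can be evaluated by brute force and compared with the closed form $\dfrac{m^2-1}{12}$, which is the standard result for this distribution. The sketch below uses illustrative names and a handful of arbitrary values of $m$.

```python
from fractions import Fraction

def uniform_variance(m):
    """Variance of the uniform distribution on 1, ..., m via E(X^2) - mu^2."""
    mu = sum(Fraction(x, m) for x in range(1, m + 1))
    second_moment = sum(Fraction(x ** 2, m) for x in range(1, m + 1))
    return second_moment - mu ** 2

for m in (1, 6, 10, 100):
    # Brute-force variance next to the closed form (m^2 - 1) / 12.
    print(m, uniform_variance(m), Fraction(m ** 2 - 1, 12))
```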

The following theorem can be useful in calculating the mean and variance of a random variable $Y$ that is a linear function of a random variable $X$.

Theorem

If the mean and variance of the random variable $X$ are:

$\mu_X$ and $\sigma^2_X$

respectively, then the mean, variance, and standard deviation of the random variable $Y=aX+b$ are:

\begin{array}{lcl} \mu_Y &=& a\mu_X+b\\ \sigma^2_Y &=& a^2 \sigma^2_X\\ \sigma_Y &=& |a|\sigma_X \end{array}

Proof

Example 8-17

The mean temperature in Victoria, B.C. is 50 degrees Fahrenheit with standard deviation 8 degrees Fahrenheit. What is the mean temperature in degrees Celsius? What is the standard deviation in degrees Celsius?

Solution

First, recall that the conversion from Fahrenheit (F) to Celsius (C) is:

$C=\dfrac{5}{9}(F-32)$

Therefore, the mean temperature in degrees Celsius is calculated as:

$\mu_C=E(C)=E\left[\dfrac{5}{9}F-\dfrac{160}{9}\right]= \dfrac{5}{9}E(F)-\dfrac{160}{9}=\dfrac{5}{9}(50)-\dfrac{160}{9}=\dfrac{250-160}{9}=\dfrac{90}{9}=10$

And, the standard deviation in degrees Celsius is calculated as:

$\sigma_C=\left|\dfrac{5}{9}\right|\sigma_F=\dfrac{5}{9}(8)=\dfrac{40}{9}\approx 4.44$
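The linear-transformation theorem is simple enough to wrap in a small helper. This is a sketch with a hypothetical function name; it takes the mean and standard deviation of $X$ along with the constants $a$ and $b$ in $Y=aX+b$.

```python
from fractions import Fraction

def linear_transform_stats(mean_x, sd_x, a, b):
    """Mean, variance, and standard deviation of Y = a*X + b."""
    mean_y = a * mean_x + b
    var_y = a ** 2 * sd_x ** 2
    sd_y = abs(a) * sd_x
    return mean_y, var_y, sd_y

# Fahrenheit to Celsius: C = (5/9) F - 160/9.
a, b = Fraction(5, 9), Fraction(-160, 9)
mean_c, var_c, sd_c = linear_transform_stats(50, 8, a, b)
print(mean_c, var_c, round(float(sd_c), 2))
```

Note that the shift $b$ moves the mean but has no effect on the variance or standard deviation.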

8.5 - Sample Means and Variances

Let's now spend some time clarifying the distinction between a population mean and a sample mean, and between a population variance and a sample variance.

Situation

Suppose we are interested in determining $\mu$, the mean number of hours slept nightly by American college students. Because the population of American college students is so large, we can't possibly record the number of hours slept by each American college student.

How can we determine the value of the population mean if the population is too large to measure it?

We could take a random sample of American college students, calculate the average for the students in the sample, and use that sample mean as an estimate of the population mean. Similarly, we could calculate the sample variance and use it to estimate the population variance $\sigma^2$.

Now, all we need to do is define the sample mean and sample variance!

Sample Mean

The sample mean, denoted $\bar{x}$ and read “x-bar,” is simply the average of the $n$ data points $x_1, x_2, \ldots, x_n$:

$\bar{x}=\dfrac{x_1+x_2+\cdots+x_n}{n}=\dfrac{1}{n} \sum\limits_{i=1}^n x_i$

The sample mean summarizes the "location" or "center" of the data.

Example 8-18

A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. What is the sample mean?

Solution

The sample mean is:

$\bar{x}=\dfrac{7+6+8+4+2+7+6+7+6+5}{10}=5.8$

Sample Variance

The sample variance, denoted $s^2$ and read "s-squared," summarizes the "spread" or "variation" of the data:

$s^2=\dfrac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+\cdots+(x_n-\bar{x})^2}{n-1}=\dfrac{1}{n-1}\sum\limits_{i=1}^n (x_i-\bar{x})^2$

Sample Standard Deviation

The sample standard deviation, denoted $s$, is simply the positive square root of the sample variance. That is:

$s=\sqrt{s^2}$

Example 8-19

A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. What is the sample standard deviation?

Solution

The sample variance is:

$s^2=\dfrac{1}{9}\left[(7-5.8)^2+(6-5.8)^2+\cdots+(5-5.8)^2\right]=\dfrac{1}{9}(27.6)\approx 3.067$

Therefore, the sample standard deviation is:

$s=\sqrt{3.067}\approx 1.75$

Theorem

An easier way to calculate the sample variance is:

$s^2=\dfrac{1}{n-1}\left[\sum\limits_{i=1}^n x^2_i-n{\bar{x}}^2\right]$

Proof

Example 8-20

A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. Use the shortcut formula to calculate the sample standard deviation.

Solution

The sample variance is:

$s^2=\dfrac{1}{9}\left[(7^2+6^2+\cdots+6^2+5^2)-10(5.8)^2\right]\approx 3.067$

Therefore, the sample standard deviation is:

$s=\sqrt{3.067}\approx 1.75$
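Both formulas for the sample variance can be compared against Python's standard `statistics` module, whose `variance` and `stdev` functions use the same $n-1$ divisor. This is a minimal sketch; the variable names are illustrative.

```python
import statistics
from fractions import Fraction

hours = [7, 6, 8, 4, 2, 7, 6, 7, 6, 5]
n = len(hours)

x_bar = Fraction(sum(hours), n)  # sample mean, exactly 29/5 = 5.8

# Definition: sum of squared deviations from the sample mean, divided by n - 1.
var_definition = sum((x - x_bar) ** 2 for x in hours) / (n - 1)

# Shortcut: (sum of squares - n * x_bar^2) / (n - 1).
var_shortcut = (sum(x ** 2 for x in hours) - n * x_bar ** 2) / (n - 1)

print(float(x_bar), round(float(var_definition), 3), round(float(var_shortcut), 3))
print(round(statistics.variance(hours), 3), round(statistics.stdev(hours), 2))
```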

We will get a better feel for what the sample standard deviation tells us later on in our studies. For now, you can roughly think of it as the average distance of the data values $x_1, x_2, \ldots, x_n$ from their sample mean.
