Lesson 8: Mathematical Expectation
Overview
In this lesson, we learn a general definition of mathematical expectation, as well as some specific mathematical expectations, such as the mean and variance.
Objectives
 To get a general understanding of the mathematical expectation of a discrete random variable.
 To learn a formal definition of \(E[u(X)]\), the expected value of a function of a discrete random variable.
 To understand that the expected value of a discrete random variable may not exist.
 To learn and be able to apply the properties of mathematical expectation.
 To learn a formal definition of the mean of a discrete random variable.
 To derive a formula for the mean of a hypergeometric random variable.
 To learn a formal definition of the variance and standard deviation of a discrete random variable.
 To learn and be able to apply a shortcut formula for the variance of a discrete random variable.
 To be able to calculate the mean and variance of a linear function of a discrete random variable.
 To learn a formal definition of the sample mean and sample variance.
 To learn and be able to apply a shortcut formula for the sample variance.
 To understand the steps involved in each of the proofs in the lesson.
 To be able to apply the methods learned in the lesson to new problems.
8.1  A Definition
Example 8-1
Toss a fair, six-sided die many times. In the long run, what would the average (or "mean") of the tosses be? That is, if we have the following, for example:
what is the average of the tosses?
This example lends itself to a couple of notes.
 In reality, one-sixth of the tosses will equal \(x\) only in the long run.
 The mean is a weighted average, that is, an average of the values weighted by their respective individual probabilities.
 The mean is called the expected value of \(X\), denoted \(E(X)\) or by \(\mu\), the Greek letter mu (read "mew").
Let's give a formal definition.
 Mathematical Expectation

If \(f(x)\) is the p.m.f. of the discrete random variable \(X\) with support \(S\), and if the summation:
\(\sum\limits_{x\in S}u(x)f(x)\)
exists (that is, the sum converges absolutely, and hence is finite), then the resulting sum is called the mathematical expectation, or the expected value, of the function \(u(X)\). The expectation is denoted \(E[u(X)]\). That is:
\(E[u(X)]=\sum\limits_{x\in S}u(x)f(x)\)
Example 8-2
What is the average toss of a fair, six-sided die?
Solution
If the random variable \(X\) is the top face of a tossed, fair, six-sided die, then the p.m.f. of \(X\) is:
\(f(x)=\dfrac{1}{6}\)
for \(x=1, 2, 3, 4, 5, \text{and } 6\). Therefore, the average toss, that is, the expected value of \(X\), is:
\(E(X)=1\left(\dfrac{1}{6}\right)+2\left(\dfrac{1}{6}\right)+3\left(\dfrac{1}{6}\right)+4\left(\dfrac{1}{6}\right)+5\left(\dfrac{1}{6}\right)+6\left(\dfrac{1}{6}\right)=3.5\)
Hmm... if we toss a fair, six-sided die once, should we expect the toss to be 3.5? No, of course not! All the expected value tells us is what we would expect the average of a large number of tosses to be in the long run. If we toss a fair, six-sided die a thousand times, say, and calculate the average of the tosses, will the average of the 1000 tosses be exactly 3.5? No, probably not! But, we can certainly expect it to be close to 3.5. It is important to keep in mind that the expected value of \(X\) is a theoretical average, not an actual, realized one!
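The long-run interpretation can be checked with a short simulation (a sketch, not part of the original lesson; the function name and seed are our own):

```python
import random

def average_toss(n_tosses, seed=0):
    """Simulate n_tosses of a fair, six-sided die and return the average."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_tosses)) / n_tosses
```

For a handful of tosses the average wanders, but over, say, 100,000 tosses it settles close to the theoretical mean of 3.5.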
Example 8-3
Hannah's House of Gambling has a roulette wheel containing 38 numbers: zero (0), double zero (00), and the numbers 1, 2, 3, ..., 36. Let \(X\) denote the number on which the ball lands and \(u(X)\) denote the amount of money paid to the gambler, such that:
\begin{array}{lcl} u(X) &=& \$5 \text{ if } X=0\\ u(X) &=& \$10 \text{ if } X=00\\ u(X) &=& \$1 \text{ if } X \text{ is odd}\\ u(X) &=& \$2 \text{ if } X \text{ is even} \end{array}
How much would I have to charge each gambler to play in order to ensure that I made some money?
Solution
Assuming that the ball has an equally likely chance of landing on each number, the p.m.f. of \(X\) is:
\(f(x)=\dfrac{1}{38}\)
for \(x=0, 00, 1, 2, 3, \ldots, 36\). Therefore, the expected value of \(u(X)\) is:
\(E(u(X))=\$5\left(\dfrac{1}{38}\right)+\$10\left(\dfrac{1}{38}\right)+\left[\$1\left(\dfrac{1}{38}\right)\times 18 \right]+\left[\$2\left(\dfrac{1}{38}\right)\times 18 \right]=\dfrac{\$69}{38}\approx \$1.82\)
Note that the 18 that is multiplied by the \$1 and \$2 is because there are 18 odd and 18 even numbers on the wheel. Our calculation tells us that, in the long run, Hannah's House of Gambling would expect to have to pay out \$1.82 for each spin of the roulette wheel. Therefore, in order to ensure that the House made money, the House would have to charge at least \$1.82 per play.
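The expected payout can also be computed exactly with a few lines of code (a sketch using exact rational arithmetic; the function name is our own):

```python
from fractions import Fraction

def expected_payout():
    """Expected payout per spin: $5 for 0, $10 for 00, $1 for each of the
    18 odd numbers, $2 for each of the 18 even numbers, all 38 equally likely."""
    p = Fraction(1, 38)
    return 5 * p + 10 * p + 18 * 1 * p + 18 * 2 * p
```

The exact answer is 69/38 dollars, which rounds to the $1.82 computed above.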
Example 8-4
Imagine a game in which, on any play, a player has a 20% chance of winning \$3 and an 80% chance of losing \$1. The probability mass function of the random variable \(X\), the amount won or lost on a single play is:
x  \$3  \(-\$1\)
f(x)  0.2  0.8
and so the average amount won (actually lost, since it is negative) — in the long run — is:
\(E(X)=(\$3)(0.2)+(-\$1)(0.8)=\$0.60-\$0.80=-\$0.20\)
What does "in the long run" mean? If you play, are you guaranteed to lose no more than 20 cents?
Solution
If you play and lose, you are guaranteed to lose \$1! An expected loss of 20 cents means that if you played the game over and over and over and over .... again, the average of your \$3 winnings and your \$1 losses would be a 20 cent loss. "In the long run" means that you can't draw conclusions about one or two plays, but rather thousands and thousands of plays.
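A simulation makes the point concrete (a sketch; the function name and seed are our own): any single play loses $1 or wins $3, but the average over many plays approaches the 20-cent expected loss.

```python
import random

def average_winnings(n_plays, seed=1):
    """Win $3 with probability 0.2, lose $1 with probability 0.8;
    return the average winnings per play over n_plays."""
    rng = random.Random(seed)
    return sum(3 if rng.random() < 0.2 else -1 for _ in range(n_plays)) / n_plays
```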
Example 8-5
What is the expected value of a discrete random variable \(X\) with the following probability mass function:
\(f(x)=\dfrac{c}{x^2}\)
where \(c\) is a constant and the support is \(x=1, 2, 3, \ldots\)?
Solution
The expected value is calculated as follows:
\(E(X)=\sum\limits_{x=1}^\infty xf(x)=\sum\limits_{x=1}^\infty x\left(\dfrac{c}{x^2}\right)=c\sum\limits_{x=1}^\infty \dfrac{1}{x}\)
The first equal sign arises from the definition of the expected value. The second equal sign just involves replacing the generic p.m.f. notation \(f(x)\) with the given p.m.f. And, the third equal sign is because the constant \(c\) can be pulled through the summation sign, because it does not depend on the value of \(x\).
Now, to finalize our calculation, all we need to do is determine what the summation:
\(\sum\limits_{x=1}^\infty \dfrac{1}{x}\)
equals. Oops! You might recognize this quantity from your calculus studies as the divergent harmonic series, whose sum is infinity. Therefore, as the above definition of expectation suggests, we say in this case that the expected value of \(X\) doesn't exist.
This is the first example where the summation is not absolutely convergent. That is, we cannot get a finite answer here. The expectation for a random variable may not always exist. In this course, we will not encounter nonexistent expectations very often. However, when you encounter more sophisticated distributions in your future studies, you may find that the expectation does not exist.
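You can watch the divergence numerically (a sketch; the function name is our own): the partial sums of the harmonic series grow without bound, roughly like \(\ln n\), so the sum defining \(E(X)\) never settles on a finite value.

```python
def harmonic_partial_sum(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1 / x for x in range(1, n + 1))
```

The partial sums for \(n = 10\), \(1000\), and \(100{,}000\) are roughly 2.9, 7.5, and 12.1, and they keep climbing as \(n\) grows.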
8.2  Properties of Expectation
Example 8-6
Suppose the p.m.f. of the discrete random variable \(X\) is:
x  0  1  2  3 
f(x)  0.2  0.1  0.4  0.3 
What is \(E(2)\)? What is \(E(X)\)? And, what is \(E(2X)\)?
This example leads us to a very helpful theorem.
 If \(c\) is a constant, then \(E(c)=c\)
 If \(c\) is a constant and \(u\) is a function, then:
\(E[cu(X)]=cE[u(X)]\)
Proof
Example 8-7
Let's return to the same discrete random variable \(X\). That is, suppose the p.m.f. of the random variable \(X\) is:
x  0  1  2  3
f(x)  0.2  0.1  0.4  0.3
It can be easily shown that \(E(X^2)=4.4\). What is \(E(2X+3X^2)\)?
This example again leads us to a very helpful theorem.
Let \(c_1\) and \(c_2\) be constants and \(u_1\) and \(u_2\) be functions. Then, when the mathematical expectation \(E\) exists, it satisfies the following property:
\(E[c_1 u_1(X)+c_2 u_2(X)]=c_1E[u_1(X)]+c_2E[u_2(X)]\)
Before we look at the proof, it should be noted that the above property can be extended to more than two terms. That is:
\(E\left[\sum\limits_{i=1}^k c_i u_i(X)\right]=\sum\limits_{i=1}^k c_i E[u_i(X)]\)
Proof
Example 8-8
Suppose the p.m.f. of the discrete random variable \(X\) is:
x  0  1  2  3
f(x)  0.2  0.1  0.4  0.3
In the previous examples, we determined that \(E(X)=1.8\) and \(E(X^2)=4.4\). Knowing that, what are \(E(4X^2)\) and \(E(3X+2X^2)\)?
Using part (b) of the first theorem, we can determine that:
\(E(4X^2)=4E(X^2)=4(4.4)=17.6\)
And using the second theorem, we can determine that:
\(E(3X+2X^2)=3E(X)+2E(X^2)=3(1.8)+2(4.4)=14.2\)
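The two theorems are easy to verify numerically for this p.m.f. (a sketch; the helper name `expect` is our own):

```python
# p.m.f. from the example
pmf = {0: 0.2, 1: 0.1, 2: 0.4, 3: 0.3}

def expect(u, pmf):
    """E[u(X)] = sum of u(x) * f(x) over the support."""
    return sum(u(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x, pmf)                   # E(X), about 1.8
e_x2 = expect(lambda x: x ** 2, pmf)             # E(X^2), about 4.4
lhs = expect(lambda x: 3 * x + 2 * x ** 2, pmf)  # E(3X + 2X^2) directly
rhs = 3 * e_x + 2 * e_x2                         # via linearity of expectation
```

The direct computation and the linearity shortcut agree, at 14.2.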
Example 8-9
Let \(u(X)=(X-c)^2\) where \(c\) is a constant. Suppose \(E[(X-c)^2]\) exists. Find the value of \(c\) that minimizes \(E[(X-c)^2]\).
Note that the expectations \(E(X)\) and \(E[(X-E(X))^2]\) are so important that they deserve special attention.
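For the minimization problem above, here is a sketch of the standard argument: add and subtract \(\mu=E(X)\) inside the square to get:

\(E[(X-c)^2]=E[((X-\mu)+(\mu-c))^2]=E[(X-\mu)^2]+2(\mu-c)E(X-\mu)+(\mu-c)^2\)

Since \(E(X-\mu)=E(X)-\mu=0\), the middle term vanishes, leaving \(E[(X-\mu)^2]+(\mu-c)^2\). The second term is nonnegative and equals zero exactly when \(c=\mu\), so the minimizing constant is the mean \(\mu=E(X)\).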
8.3  Mean of X
In the previous pages, we concerned ourselves with finding the expectation of any general function \(u(X)\) of the discrete random variable \(X\). Here, we'll focus our attention on one particular function, namely:
\(u(X)=X\)
Let's jump right in, and give the expectation in this situation a special name!
 First Moment about the Origin

When the function \(u(X)=X\), the expectation of \(u(X)\), when it exists:
\(E[u(X)]=E(X)=\sum\limits_{x\in S} xf(x) \)
is called the expected value of \(X\), and is denoted \(E(X)\). Or, it is called the mean of \(X\), and is denoted as \(\mu\) (the Greek letter mu, read "mew"). That is, \(\mu=E(X)\). The expected value of \(X\) can also be called the first moment about the origin.
Example 8-10
The maximum patent life for a new drug is 17 years. Subtracting the length of time required by the Food and Drug Administration for testing and approval of the drug provides the actual patent life for the drug — that is, the length of time that the company has to recover research and development costs and to make a profit. The distribution of the lengths of actual patent lives for new drugs is as follows:
Years, y  3  4  5  6  7  8  9  10  11  12  13
f(y)  0.03  0.05  0.07  0.10  0.14  0.20  0.18  0.12  0.07  0.03  0.01
What is the mean patent life for a new drug?
Answer The mean can be calculated as:
\(\mu_Y=E(Y)=\sum\limits_{y=3}^{13} yf(y)=3(0.03)+4(0.05)+\cdots+12(0.03)+13(0.01)=7.9\)
That is, the average patent life for a new drug is 7.9 years.
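The mean patent life is a one-line weighted average in code (a sketch; the variable names are our own):

```python
# Patent-life distribution from the example
patent_pmf = {3: 0.03, 4: 0.05, 5: 0.07, 6: 0.10, 7: 0.14, 8: 0.20,
              9: 0.18, 10: 0.12, 11: 0.07, 12: 0.03, 13: 0.01}

mu = sum(y * p for y, p in patent_pmf.items())  # E(Y): values weighted by probabilities
```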
Example 8-11
Let \(X\) follow a hypergeometric distribution in which \(n\) objects are selected from \(N\) objects, with \(m\) of the objects being of one type and \(N-m\) of the objects being of a second type. What is the mean of \(X\)?
Solution
Recalling the p.m.f. of a hypergeometric distribution and using the definition of the expected value of \(X\), we have:
\(E(X)=\sum\limits_{x\in S} x \dfrac{\dbinom{m}{x} \dbinom{N-m}{n-x}}{\dbinom{N}{n}}\)
You should be getting the idea already that this is going to be messy! So, we're going to work on it in parts. First, note that the first term of the summation equals 0 when \(x=0\). And, note that some of the terms can be written differently:
That is:
\(\dbinom{m}{x}=\dfrac{m!}{x!(m-x)!}=\dfrac{m(m-1)!}{x \cdot (x-1)!(m-x)!}=\dfrac{m}{x} \cdot \dbinom{m-1}{x-1}\)
and:
\(\dbinom{N}{n}=\dfrac{N!}{n!(N-n)!}=\dfrac{N(N-1)!}{n \cdot (n-1)!(N-n)!}=\dfrac{N}{n} \cdot \dfrac{(N-1)!}{(n-1)!(N-1-(n-1))!}=\dfrac{N}{n} \cdot \dbinom{N-1}{n-1}\)
Therefore, replacing these quantities in our formula for \(E(X)\), we have:
We've shown that, in general, the mean of a hypergeometric random variable \(X\), in which \(n\) objects are selected from \(N\) objects with \(m\) of the objects being of one type, is:
\(E(X)=\dfrac{mn}{N}\)
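The formula \(E(X)=mn/N\) can be spot-checked directly against the definition of expected value (a sketch; the function name is our own):

```python
from math import comb

def hypergeom_mean(N, m, n):
    """E(X) computed term by term from the hypergeometric p.m.f."""
    lo, hi = max(0, n - (N - m)), min(n, m)   # support of X
    return sum(x * comb(m, x) * comb(N - m, n - x)
               for x in range(lo, hi + 1)) / comb(N, n)
```

For instance, selecting \(n=5\) from \(N=20\) objects with \(m=8\) of one type gives a mean of \(8 \cdot 5/20 = 2\).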
Example 8-12
Suppose the random variable \(X\) follows the uniform distribution on the first \(m\) positive integers. That is, suppose the p.m.f. of \(X\) is:
\(f(x)=\dfrac{1}{m}\) for \(x=1, 2, 3, \ldots, m\)
What is the mean of \(X\)?
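A sketch of the solution, using the identity \(1+2+\cdots+m=\frac{m(m+1)}{2}\):

\(E(X)=\sum\limits_{x=1}^m x \cdot \dfrac{1}{m}=\dfrac{1}{m}\cdot\dfrac{m(m+1)}{2}=\dfrac{m+1}{2}\)

That is, the mean of the discrete uniform random variable is the midpoint of its support.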
8.4  Variance of X
Example 8-13
Consider two probability mass functions. The first:
x  3  4  5
f(x)  0.3  0.4  0.3
And, the second:
y  1  2  6  8
f(y)  0.4  0.1  0.3  0.2
It is a straightforward calculation to show that the mean of \(X\) and the mean of \(Y\) are the same:
\(\mu_X=E(X) = 3(0.3)+4(0.4)+5(0.3)=4\)
\(\mu_Y=E(Y)=1(0.4)+2(0.1)+6(0.3)+8(0.2)=4\)
Let's draw a picture that illustrates the two p.m.f.s and their means.
Again, the pictures illustrate (at least) two things:
 The means of \(X\) and \(Y\) sit at the fulcrum points at which their axes balance without tilting ("a balanced seesaw").
 The second p.m.f. exhibits greater variability than the first p.m.f.
That second point suggests that the means of \(X\) and \(Y\) are not sufficient in summarizing their probability distributions. Hence, the following definition!
Definition. When \(u(X)=(X-\mu)^2\), the expectation of \(u(X)\):
\(E[u(X)]=E[(X-\mu)^2]=\sum\limits_{x\in S} (x-\mu)^2 f(x)\)
is called the variance of \(X\), and is denoted as \(\text{Var}(X)\) or \(\sigma^2\) ("sigma-squared"). The variance of \(X\) can also be called the second moment of \(X\) about the mean \(\mu\).
The positive square root of the variance is called the standard deviation of \(X\), and is denoted \(\sigma\) ("sigma"). That is:
\(\sigma=\sqrt{Var(X)}=\sqrt{\sigma^2}\)
Although most students understand that \(\mu=E(X)\) is, in some sense, a measure of the middle of the distribution of \(X\), it is much more difficult to get a feeling for the meaning of the variance and the standard deviation. The next example (hopefully) illustrates how the variance and standard deviation quantify the spread or dispersion of the values in the support \(S\).
Example 8-14
Let's return to the probability mass functions of the previous example. The first:
x  3  4  5
f(x)  0.3  0.4  0.3
And, the second:
y  1  2  6  8
f(y)  0.4  0.1  0.3  0.2
What is the variance and standard deviation of \(X\)? How does it compare to the variance and standard deviation of \(Y\)?
Solution
The variance of \(X\) is calculated as:
\(\sigma^2_X=E[(X-\mu)^2]=(3-4)^2(0.3)+(4-4)^2(0.4)+(5-4)^2(0.3)=0.6\)
And, therefore, the standard deviation of \(X\) is:
\(\sigma_X=\sqrt{0.6}=0.77\)
Now, the variance of \(Y\) is calculated as:
\(\sigma_Y^2=E[(Y-\mu)^2]=(1-4)^2(0.4)+(2-4)^2(0.1)+(6-4)^2(0.3)+(8-4)^2(0.2)=8.4\)
And, therefore, the standard deviation of \(Y\) is:
\(\sigma_Y=\sqrt{8.4}=2.9\)
As you can see, the expected variation in the random variable \(Y\), as quantified by its variance and standard deviation, is much larger than the expected variation in the random variable \(X\). Given the p.m.f.s of the two random variables, this result should not be surprising.
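Both variances follow directly from the definition (a sketch; the function and variable names are our own):

```python
def mean_and_variance(pmf):
    """Return (mu, sigma^2) computed from the definition of variance."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

pmf_x = {3: 0.3, 4: 0.4, 5: 0.3}
pmf_y = {1: 0.4, 2: 0.1, 6: 0.3, 8: 0.2}
```

Both distributions have mean 4, but their variances (0.6 versus 8.4) separate them sharply.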
As you might have noticed, the formula for the variance of a discrete random variable can be quite cumbersome to use. Fortunately, there is a slightly easier-to-work-with alternative formula.
An easier way to calculate the variance of a random variable \(X\) is:
\(\sigma^2=\text{Var}(X)=E(X^2)-\mu^2\)
Proof
Example 8-15
Use the alternative formula to verify that the variance of the random variable \(X\) with the following probability mass function:
x  3  4  5
f(x)  0.3  0.4  0.3
is 0.6, as we calculated earlier.
Solution
First, we need to calculate the expected value of \(X^2\):
\(E(X^2)=3^2(0.3)+4^2(0.4)+5^2(0.3)=16.6\)
Earlier, we determined that \(\mu\), the mean of \(X\), is 4. Therefore, using the shortcut formula for the variance, we verify that indeed the variance of \(X\) is 0.6:
\(\sigma^2_X=E(X^2)-\mu^2=16.6-4^2=0.6\)
Example 8-16
Suppose the random variable \(X\) follows the uniform distribution on the first \(m\) positive integers. That is, suppose the p.m.f. of \(X\) is:
\(f(x)=\dfrac{1}{m}\) for \(x=1, 2, 3, \ldots, m\)
What is the variance of \(X\)?
Solution
On the previous page, we determined that the mean of the discrete uniform random variable \(X\) is:
\(\mu=E(X)=\dfrac{m+1}{2}\)
If we can calculate \(E(X^2)\), we can use the shortcut formula to calculate the variance of \(X\). Let's do that:
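The missing step can be filled in with the standard sum-of-squares identity \(1^2+2^2+\cdots+m^2=\frac{m(m+1)(2m+1)}{6}\):

\(E(X^2)=\sum\limits_{x=1}^m x^2 \cdot \dfrac{1}{m}=\dfrac{1}{m}\cdot\dfrac{m(m+1)(2m+1)}{6}=\dfrac{(m+1)(2m+1)}{6}\)

Therefore, by the shortcut formula:

\(\sigma^2=E(X^2)-\mu^2=\dfrac{(m+1)(2m+1)}{6}-\left(\dfrac{m+1}{2}\right)^2=\dfrac{(m+1)(m-1)}{12}=\dfrac{m^2-1}{12}\)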
The following theorem can be useful in calculating the mean and variance of a random variable \(Y\) that is a linear function of a random variable \(X\).
If the mean and variance of the random variable \(X\) are:
\(\mu_X\) and \(\sigma^2_X\)
respectively, then the mean, variance, and standard deviation of the random variable \(Y=aX+b\) are:
\begin{array}{lcl} \mu_Y &=& a\mu_X+b\\ \sigma^2_Y &=& a^2 \sigma^2_X\\ \sigma_Y &=& |a|\sigma_X \end{array}
Proof
Example 8-17
The mean temperature in Victoria, B.C. is 50 degrees Fahrenheit with standard deviation 8 degrees Fahrenheit. What is the mean temperature in degrees Celsius? What is the standard deviation in degrees Celsius?
Solution
First, recall that the conversion from Fahrenheit (F) to Celsius (C) is:
\(C=\dfrac{5}{9}(F-32)\)
Therefore, the mean temperature in degrees Celsius is calculated as:
\(\mu_C=E(C)=E\left[\dfrac{5}{9}F-\dfrac{160}{9}\right]= \dfrac{5}{9}E(F)-\dfrac{160}{9}=\dfrac{5}{9}(50)-\dfrac{160}{9}=\dfrac{250-160}{9}=\dfrac{90}{9}=10\)
And, the standard deviation in degrees Celsius is calculated as:
\(\sigma_C=\dfrac{5}{9}\sigma_F=\dfrac{5}{9}(8)=\dfrac{40}{9}=4.44\)
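The linear-function theorem turns this conversion into two lines of code (a sketch; the function name is our own):

```python
def fahrenheit_to_celsius_stats(mu_f, sigma_f):
    """Apply Y = aX + b with a = 5/9, b = -160/9, i.e. C = (5/9)(F - 32):
    the mean transforms by a*mu + b, the standard deviation by |a|*sigma."""
    a, b = 5 / 9, -160 / 9
    return a * mu_f + b, abs(a) * sigma_f
```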
8.5  Sample Means and Variances
Let's now spend some time clarifying the distinction between a population mean and a sample mean, and between a population variance and a sample variance.
Situation
Suppose we are interested in determining \(\mu\), the mean number of hours slept nightly by American college students. Because the population of American college students is so large, we can't possibly record the number of hours slept by each American college student.
Instead, we could take a random sample of \(n\) American college students, record the number of hours each slept, and use those \(n\) data points to estimate \(\mu\).
Now, all we need to do is define the sample mean and sample variance!
 Sample Mean

The sample mean, denoted \(\bar{x}\) and read "x-bar," is simply the average of the \(n\) data points \(x_1, x_2, \ldots, x_n\):
\(\bar{x}=\dfrac{x_1+x_2+\cdots+x_n}{n}=\dfrac{1}{n} \sum\limits_{i=1}^n x_i\)
The sample mean summarizes the "location" or "center" of the data.
Example 8-18
A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. What is the sample mean?
Solution
The sample mean is:
\(\bar{x}=\dfrac{7+6+8+4+2+7+6+7+6+5}{10}=5.8\)
 Sample Variance

The sample variance, denoted \(s^2\) and read "s-squared," summarizes the "spread" or "variation" of the data:
\(s^2=\dfrac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+\cdots+(x_n-\bar{x})^2}{n-1}=\dfrac{1}{n-1}\sum\limits_{i=1}^n (x_i-\bar{x})^2\)
 Sample Standard Deviation

The sample standard deviation, denoted \(s\), is simply the positive square root of the sample variance. That is:
\(s=\sqrt{s^2}\)
Example 8-19
A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. What is the sample standard deviation?
Solution
The sample variance is:
\(s^2=\dfrac{1}{9}\left[(7-5.8)^2+(6-5.8)^2+\cdots+(5-5.8)^2\right]=\dfrac{1}{9}(27.6)=3.067\)
Therefore, the sample standard deviation is:
\(s=\sqrt{3.067}=1.75\)
An easier way to calculate the sample variance is:
\(s^2=\dfrac{1}{n-1}\left[\sum\limits_{i=1}^n x^2_i-n{\bar{x}}^2\right]\)
Proof
Example 8-20
A random sample of 10 American college students reported sleeping 7, 6, 8, 4, 2, 7, 6, 7, 6, 5 hours, respectively. Use the shortcut formula to find the sample standard deviation.
Solution
The sample variance is:
\(s^2=\dfrac{1}{9}\left[(7^2+6^2+\cdots+6^2+5^2)-10(5.8)^2\right]=3.067\)
Therefore, the sample standard deviation is:
\(s=\sqrt{3.067}=1.75\)
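Both the defining formula and the shortcut can be checked on the sleep data (a sketch; the function and variable names are our own):

```python
def sample_stats(data):
    """Sample mean and sample variance, by the definition and by the shortcut."""
    n = len(data)
    xbar = sum(data) / n
    s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)          # definition
    s2_short = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)  # shortcut
    return xbar, s2_def, s2_short

hours = [7, 6, 8, 4, 2, 7, 6, 7, 6, 5]
```

Both formulas give \(s^2 \approx 3.067\), and hence \(s \approx 1.75\), as computed above.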
We will get a better feel for what the sample standard deviation tells us later on in our studies. For now, you can roughly think of it as the average distance of the data values \(x_1, x_2, \ldots, x_n\) from their sample mean.