Moments: Mean and Variance

Mean

The expectation (mean or the first moment) of a discrete random variable X is defined to be:

\(E(X)=\sum_{x}xf(x)\)

where f(x) is the probability function of X and the sum is taken over all possible values of X. E(X) is also called the mean of X or the average of X, because it represents the long-run average value if the experiment were repeated infinitely many times.

In the trivial example where X takes the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2 respectively, the mean of X is

\(E(X) = 1(.25) + 2(.25) + 5(.5) = 3.25\).

In calculating expectations, it helps to visualize a table with two columns. The first column lists the possible values x of the random variable X, and the second column lists the probabilities f(x) associated with these values:

x     f(x)
1     .25
2     .25
5     .50

To calculate E(X) we merely multiply the two columns together, row by row, and add up the products: 1(.25) + 2(.25) + 5(.5) = 3.25.
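As a quick check, this table calculation can be reproduced in a few lines of Python. The sketch below stores the table as a dictionary mapping each value x to its probability f(x); the name pmf is an illustrative choice, not notation from the text.

```python
# The example distribution as a table: value x -> probability f(x)
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

# E(X): multiply each value by its probability and add up the products
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 3.25
```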

If g(X) is a function of X (e.g. \(g(X) = \log X\), \(g(X) = X^2\), etc.) then g(X) is also a random variable. Its expectation is

\(E(g(X))=\sum_{x}g(x)f(x)\) (4)

Visually, in the table containing x and f(x), we can simply insert a third column for g(x) and add up the products g(x)f(x). In our example, if \(Y = g(X) = X^3\), the table becomes

x     f(x)     g(x) = x^3
1     .25      1^3 = 1
2     .25      2^3 = 8
5     .50      5^3 = 125

and

\(E(Y) = E(X^3) = 1(.25) + 8(.25) + 125(.5) = 64.75\).
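The same table-based recipe extends to any g. Below is a minimal Python sketch with a small helper function (introduced here only for illustration) that evaluates expression (4) for an arbitrary g.

```python
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def expectation(g, pmf):
    # E(g(X)) = sum over x of g(x) * f(x), as in expression (4)
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(lambda x: x**3, pmf))  # 64.75
```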

If Y = g(X) = a + bX where a and b are constants, then Y is said to be a linear function of X, and E(Y) = a + bE(X). An algebraic proof is

\begin{align}
E(Y)&=\sum\limits_y yf(y)\\
&= \sum\limits_x (a+bx)f(x)\\
&= \sum\limits_x af(x)+\sum\limits_x bxf(x)\\
&= a\sum\limits_x f(x)+b\sum\limits_x xf(x)\\
&= a\cdot1+bE(X)\\
\end{align}

That is, if g(X) is linear, then E(g(X)) = g(E(X)). Note, however, that this does not work if the function g is nonlinear. For example, \(E(X^2)\) is not equal to \((E(X))^2\), and \(E(\log X)\) is not equal to \(\log E(X)\). To calculate \(E(X^2)\) or \(E(\log X)\), we need to use expression (4).
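A short numerical check of both claims, using the example distribution; the constants a = 2 and b = 3 below are arbitrary choices for illustration:

```python
import math

pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def expectation(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)          # E(X) = 3.25
a, b = 2.0, 3.0

# Linear g: E(a + bX) equals a + b E(X)
print(expectation(lambda x: a + b * x, pmf))  # 11.75
print(a + b * mean)                           # 11.75

# Nonlinear g: E(X^2) != (E(X))^2 and E(log X) != log E(X)
print(expectation(lambda x: x**2, pmf), mean**2)    # 13.75 vs 10.5625
print(expectation(math.log, pmf), math.log(mean))   # ~0.978 vs ~1.179
```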


Variance

The variance of a discrete random variable, denoted by V(X), is defined to be

\begin{align}
V(X)&= E((X-E(X))^2)\\
&= \sum\limits_x (x-E(X))^2 f(x)\\
\end{align}

That is, V(X) is the average squared distance between X and its mean. Variance is a measure of dispersion, telling us how “spread out” a distribution is. For our simple random variable, the variance is

\(V(X) = (1 − 3.25)^2 (.25) + (2 − 3.25)^2 (.25) + (5 − 3.25)^2 (.50) = 3.1875\).
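The same number can be obtained programmatically by applying the definition directly, a sketch using the dictionary representation from above:

```python
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

mean = sum(x * p for x, p in pmf.items())              # E(X) = 3.25
var = sum((x - mean)**2 * p for x, p in pmf.items())   # average squared distance from the mean
print(var)  # 3.1875
```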

A slightly easier way to calculate the variance is to use the well-known identity

\(V(X) = E(X^2) − (E(X))^2\).

Visually, this method requires a table with three columns: x, f(x), and \(x^2\).

x     f(x)     x^2
1     .25      1^2 = 1
2     .25      2^2 = 4
5     .50      5^2 = 25

First we calculate

\(E(X) = 1(.25) + 2(.25) + 5(.50) = 3.25\) and
\(E(X^2) = 1(.25) + 4(.25) + 25(.50) = 13.75\).

Then

\(V(X) = 13.75 − (3.25)^2 = 3.1875\).
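A quick sketch confirming that the shortcut identity gives the same value as the definition-based calculation:

```python
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

ex = sum(x * p for x, p in pmf.items())       # E(X)   = 3.25
ex2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2) = 13.75
print(ex2 - ex**2)  # 3.1875, matching the definition-based result
```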

It can be shown that if a and b are constants, then

\(V(a + bX) = b^2 V(X)\).

In other words, adding a constant a to a random variable does not change its variance, and multiplying a random variable by a constant b causes the variance to be multiplied by \(b^2\).
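This rule can also be checked numerically on the example distribution; the constants a = 2 and b = 3 below are arbitrary illustrative choices:

```python
pmf = {1: 0.25, 2: 0.25, 5: 0.50}
a, b = 2.0, 3.0

def variance(g, pmf):
    m = sum(g(x) * p for x, p in pmf.items())
    return sum((g(x) - m)**2 * p for x, p in pmf.items())

print(variance(lambda x: a + b * x, pmf))  # 28.6875
print(b**2 * variance(lambda x: x, pmf))   # 28.6875: the constant a drops out
```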

Another common measure of dispersion is the standard deviation, which is merely the positive square root of the variance,

\(SD(X) = \sqrt{V(X)}\).


Mean and Variance of a Sum of Random Variables

Expectation is always additive; that is, if X and Y are any random variables, then

\(E(X + Y) = E(X) + E(Y)\).

If X and Y are independent random variables, then their variances will also add:

\(V(X + Y) = V(X) + V(Y)\) if X, Y independent.
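As a sketch of why independence matters, one can build the joint distribution of X and an independent second variable Y as the product of the marginals and check both additivity rules; pmf_y and the helper functions below are illustrative, not from the text:

```python
pmf_x = {1: 0.25, 2: 0.25, 5: 0.50}
pmf_y = {0: 0.50, 4: 0.50}   # an arbitrary second random variable

# Independence: each joint probability is the product of the marginals
joint = {(x, y): px * py
         for x, px in pmf_x.items()
         for y, py in pmf_y.items()}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((v - m)**2 * p for v, p in pmf.items())

# pmf of the sum X + Y, obtained by collapsing the joint distribution
pmf_sum = {}
for (x, y), p in joint.items():
    pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + p

print(mean(pmf_sum), mean(pmf_x) + mean(pmf_y))  # 5.25 and 5.25
print(var(pmf_sum), var(pmf_x) + var(pmf_y))     # 7.1875 and 7.1875
```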

More generally, if X and Y are any random variables, then

\(V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)\)

where Cov(X, Y) is the covariance between X and Y,

\(Cov(X, Y) = E((X − E(X))(Y − E(Y)))\).

If X and Y are independent (or merely uncorrelated) then Cov(X, Y) = 0. This additive rule for variances extends to three or more random variables; e.g.,

\(V(X + Y + Z) = V(X) + V(Y) + V(Z) + 2Cov(X, Y) + 2Cov(X, Z) + 2Cov(Y, Z)\)

with all covariances equal to zero if X, Y, and Z are mutually uncorrelated.
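To illustrate the role of the covariance term, the sketch below uses a small made-up joint distribution in which X and Y are dependent, and checks that V(X + Y) computed directly equals V(X) + V(Y) + 2Cov(X, Y):

```python
# A made-up joint pmf where X and Y are dependent (not a product of marginals)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def e(g):
    """Expectation of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)     # both 0.5
vx = e(lambda x, y: (x - ex)**2)                  # 0.25
vy = e(lambda x, y: (y - ey)**2)                  # 0.25
cov = e(lambda x, y: (x - ex) * (y - ey))         # 0.15

direct = e(lambda x, y: (x + y - (ex + ey))**2)   # V(X + Y) from the definition
print(direct, vx + vy + 2 * cov)                  # both 0.8 (up to rounding)
```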