Mean
The expectation (mean or the first moment) of a discrete random variable X is defined to be:
\(E(X)=\sum_{x}xf(x)\)
where the sum is taken over all possible values of X. E(X) is also called the mean of X or the average of X, because it represents the long-run average value if the experiment were repeated infinitely many times.
In the trivial example where X takes the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2 respectively, the mean of X is
\(E(X) = 1(.25) + 2(.25) + 5(.5) = 3.25\).
In calculating expectations, it helps to visualize a table with two columns. The first column lists the possible values x of the random variable X, and the second column lists the probabilities f(x) associated with these values:
x      f(x)
1      .25
2      .25
5      .50
To calculate E(X) we merely multiply the two columns together, row by row, and add up the products: 1(.25) + 2(.25) + 5(.5) = 3.25.
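The row-by-row calculation can be sketched in a few lines of Python (a minimal sketch; the values and probabilities are taken from the table above):

```python
# The pairs (x, f(x)) from the table above
values = [1, 2, 5]
probs = [0.25, 0.25, 0.50]

# E(X) = sum of x * f(x), multiplying the two columns row by row
mean = sum(x * p for x, p in zip(values, probs))
print(mean)  # 3.25
```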
If g(X) is a function of X (e.g. g(X) = log X, g(X) = X^{2}, etc.) then g(X) is also a random variable. Its expectation is
\(E(g(X))=\sum_{x}g(x)f(x)\) (4)
Visually, in the table containing x and f(x), we can simply insert a third column for g(x) and add up the products g(x)f(x). In our example, if Y = g(X) = X^{3}, the table becomes
x      f(x)     g(x) = x^{3}
1      .25      1^{3} = 1
2      .25      2^{3} = 8
5      .50      5^{3} = 125
and
\(E( Y ) = E(X^3) = 1(.25) + 8(.25) + 125(.5) = 64.75\).
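The third-column method translates directly to code (a minimal sketch using the same distribution as above):

```python
values = [1, 2, 5]
probs = [0.25, 0.25, 0.50]

# E(g(X)) = sum of g(x) * f(x); here g(x) = x**3
e_y = sum((x ** 3) * p for x, p in zip(values, probs))
print(e_y)  # 64.75
```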
If Y = g(X) = a + bX where a and b are constants, then Y is said to be a linear function of X, and E( Y ) = a + bE(X). An algebraic proof is
\begin{align}
E(Y)&=\sum\limits_y yf(y)\\
&= \sum\limits_x (a+bx)f(x)\\
&= \sum\limits_x af(x)+\sum\limits_x bxf(x)\\
&= a\sum\limits_x f(x)+b\sum\limits_x xf(x)\\
&= a\cdot1+bE(X)\\
&= a+bE(X)
\end{align}
That is, if g(X) is linear, then E(g(X)) = g(E(X)). Note, however, that this does not work if the function g is nonlinear. For example, E(X^{2}) is not equal to (E(X))^{2}, and E(log X) is not equal to log E(X). To calculate E(X^{2}) or E(log X), we need to use expression (4).
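We can verify both claims numerically (a minimal sketch; the constants a and b here are arbitrary choices for illustration):

```python
values = [1, 2, 5]
probs = [0.25, 0.25, 0.50]

def expect(g):
    """E(g(X)) = sum of g(x) f(x) over the support."""
    return sum(g(x) * p for x, p in zip(values, probs))

a, b = 10, 2
# Linear g: E(a + bX) equals a + b E(X)
print(expect(lambda x: a + b * x))   # 16.5
print(a + b * expect(lambda x: x))   # 16.5

# Nonlinear g: E(X^2) differs from (E(X))^2
print(expect(lambda x: x ** 2))      # 13.75
print(expect(lambda x: x) ** 2)      # 10.5625
```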
Variance
The variance of a discrete random variable, denoted by V (X), is defined to be
\begin{align}
V(X)&= E((X-E(X))^2)\\
&= \sum\limits_x (x-E(X))^2 f(x)
\end{align}
That is, V (X) is the average squared distance between X and its mean. Variance is a measure of dispersion, telling us how “spread out” a distribution is. For our simple random variable, the variance is
\(V (X) = (1− 3.25)^2 (.25) + (2 − 3.25)^2 (.25) + (5 − 3.25)^2 (.50) = 3.1875\).
A slightly easier way to calculate the variance is to use the well-known identity
\(V (X) = E(X^2) − (E(X) )^2\).
Visually, this method requires a table with three columns: x, f(x), and x^{2}.
x      f(x)     x^{2}
1      .25      1^{2} = 1
2      .25      2^{2} = 4
5      .50      5^{2} = 25
First we calculate
E(X) = 1(.25) + 2(.25) + 5(.50) = 3.25 and
E(X^{2}) = 1(.25) + 4(.25) + 25(.50) = 13.75. Then
V (X) = 13.75 − (3.25)^2 = 3.1875.
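Both routes to the variance, and the standard deviation defined below, can be checked with a short Python sketch (same distribution as above):

```python
import math

values = [1, 2, 5]
probs = [0.25, 0.25, 0.50]

mean = sum(x * p for x, p in zip(values, probs))       # E(X) = 3.25

# Definition: V(X) = E((X - E(X))^2)
var_def = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Identity: V(X) = E(X^2) - (E(X))^2
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))  # E(X^2) = 13.75
var_id = e_x2 - mean ** 2

print(var_def, var_id)     # 3.1875 3.1875
print(math.sqrt(var_def))  # standard deviation SD(X)
```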
It can be shown that if a and b are constants, then
\(V (a + bX) = b^2V (X)\).
In other words, adding a constant a to a random variable does not change its variance, and multiplying a random variable by a constant b causes the variance to be multiplied by b^{2}.
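A quick numerical check of V(a + bX) = b^2 V(X) on the example distribution (the constants a and b are arbitrary choices for illustration):

```python
values = [1, 2, 5]
probs = [0.25, 0.25, 0.50]

def var(g):
    """V(g(X)) computed directly from the definition."""
    m = sum(g(x) * p for x, p in zip(values, probs))
    return sum((g(x) - m) ** 2 * p for x, p in zip(values, probs))

a, b = 7, 3
print(var(lambda x: a + b * x))   # 28.6875
print(b ** 2 * var(lambda x: x))  # 9 * 3.1875 = 28.6875
```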
Another common measure of dispersion is the standard deviation, which is merely the positive square root of the variance,
\(SD(X) = \sqrt{V(X)}\)
Mean and Variance of a Sum of Random Variables
Expectation is always additive; that is, if X and Y are any random variables, then
\(E(X + Y ) = E(X) + E( Y )\).
If X and Y are independent random variables, then their variances will also add:
\(V (X + Y) = V (X) + V ( Y )\) if X, Y independent.
More generally, if X and Y are any random variables, then
\(V (X + Y) = V (X) + V ( Y ) + 2Cov(X, Y )\)
where Cov(X, Y ) is the covariance between X and Y,
\(Cov(X, Y ) = E( (X− E(X)) ( Y − E( Y )) )\).
If X and Y are independent (or merely uncorrelated) then Cov(X, Y ) = 0. This additive rule for variances extends to three or more random variables; e.g.,
\(V (X + Y + Z) = V (X) + V ( Y ) + V (Z) +2Cov(X, Y ) + 2Cov(X, Z) + 2Cov(Y, Z)\)
with all covariances equal to zero if X, Y , and Z are mutually uncorrelated.
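These rules can be illustrated with a small joint distribution. In this sketch, X is the random variable from the text and Y is a hypothetical independent variable taking 0 and 1 with equal probability, so Cov(X, Y) = 0 and the variances add:

```python
# X as in the text; Y a hypothetical independent 0/1 variable
x_vals, x_probs = [1, 2, 5], [0.25, 0.25, 0.50]
y_vals, y_probs = [0, 1], [0.5, 0.5]

# Joint distribution under independence: f(x, y) = f(x) f(y)
joint = {(x, y): px * py
         for x, px in zip(x_vals, x_probs)
         for y, py in zip(y_vals, y_probs)}

def E(g):
    """E(g(X, Y)) over the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def V(g):
    """V(g(X, Y)) from the definition."""
    m = E(g)
    return E(lambda x, y: (g(x, y) - m) ** 2)

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - ex) * (y - ey))  # 0 here, by independence

print(E(lambda x, y: x + y), ex + ey)      # means always add: 3.75 3.75
print(V(lambda x, y: x + y),
      V(lambda x, y: x) + V(lambda x, y: y) + 2 * cov)  # 3.4375 3.4375
```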