Random Variables

A random variable is the outcome of an experiment (i.e. a random process) expressed as a number. We use capital letters near the end of the alphabet (X, Y, Z, etc.) to denote random variables. Random variables are of two types: discrete and continuous.

Continuous random variables are described by probability density functions (PDF). For example, a normally distributed random variable has a bell-shaped density function (the familiar normal curve):


The probability that X falls between any two particular numbers, say a and b, is given by the area under the density curve f(x) between a and b,

\( P (a \le X \le b) = \int_{a}^{b}f(x)dx\).
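As a concrete check of this formula, the area under a standard normal density between a and b can be computed two ways: from the closed-form CDF (written in terms of the error function) or by direct numerical integration of f(x). A minimal Python sketch, using only the standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a Normal(mu, sigma^2) random variable."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def area_by_integration(a, b, n=10_000):
    """Approximate the integral of f(x) from a to b with the trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    total += sum(normal_pdf(a + i * h) for i in range(1, n))
    return total * h

# P(-1 <= X <= 1) for a standard normal: both routes agree (about 0.6827)
exact = normal_cdf(1) - normal_cdf(-1)
approx = area_by_integration(-1, 1)
print(exact, approx)
```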

The two continuous distributions that we will use most will be the normal and the χ2 (chi-squared) distributions. Areas under the normal and χ2 density functions, needed for calculating p-values, are tabulated and widely available in textbooks. They can also be computed with statistical computer packages.
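In one special case a χ2 tail area has a simple closed form that can be checked without a table: with df = 2, the chi-squared distribution is an Exponential(1/2) distribution, so the upper-tail area is P(X > x) = e^{-x/2}. A standard-library Python sketch (the df = 2 case is chosen only because of this closed form):

```python
import math

def chi2_pvalue_df2(x):
    """Upper-tail area P(X > x) for a chi-squared variable with df = 2.
    With 2 degrees of freedom the density is (1/2)e^{-x/2}, so the
    tail area from x to infinity integrates to e^{-x/2}."""
    return math.exp(-x / 2)

# The tabled 0.05 critical value for df = 2 is about 5.99:
p = chi2_pvalue_df2(5.99)
print(round(p, 4))  # close to 0.05
```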

The Chi-Squared Distribution

The "degrees of freedom" (df) completely specify a chi-squared distribution. Here are the properties of a chi-squared random variable:

  • A χ2 random variable takes values between 0 and ∞.
  • The mean of a chi-squared distribution equals its df.
  • The variance of a chi-squared distribution equals 2df, and the standard deviation is \(\sqrt{2df}\).
  • The shape of the distribution is skewed to the right.
  • As df increases, the mean gets larger and the distribution spreads out more.
  • As df increases, the distribution becomes more bell-shaped, like a normal; that is, as df → ∞, χ2df → Normal.
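The mean and variance properties above can be checked by simulation: a chi-squared variable with df degrees of freedom arises as the sum of df squared standard normals. A standard-library Python sketch (sample size and seed are arbitrary choices):

```python
import random
import statistics

def simulate_chi2(df, n_draws=50_000, seed=1):
    """Draw from a chi-squared distribution with the given df by summing
    df squared standard-normal draws (one definition of chi-squared)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

df = 4
draws = simulate_chi2(df)
m = statistics.fmean(draws)
v = statistics.variance(draws)
print(m, v)  # mean should be near df = 4, variance near 2*df = 8
```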

Here is a plot of different chi-squared distributions.

The plot was created in R; in general I find using R much simpler for creating plots like these. You can view a related image in Section 2.3.1 of Agresti (2007), and check out the Wikipedia page on the chi-squared distribution.

Discrete random variables are described by probability mass functions (PMF), which we will also call “distributions.” For a random variable X, we will write the distribution as f(x) and define it to be:

\( f(x) = P(X = x)\).

In other words, f(x) is the probability that the random variable X takes the specific value x. For example, suppose that X takes the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2 respectively. Then we would say that f(1) = 1/4, f(2) = 1/4, f(5) = 1/2, and f(x) = 0 for any x other than 1, 2, or 5:

\(f(x)= \begin{cases}
0.25 & x=1, 2 \\
0.50 & x=5 \\
0 & \text{otherwise}
\end{cases}\)
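The same example can be written as a small lookup table; values of x outside the support simply get probability zero:

```python
# PMF for the example above: X takes the values 1, 2, and 5.
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def f(x):
    """f(x) = P(X = x); zero for any x outside the support."""
    return pmf.get(x, 0.0)

print(f(1), f(5), f(3))   # 0.25 0.5 0.0
print(sum(pmf.values()))  # the probabilities sum to 1
```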
A graph of f(x) has spikes at the possible values of X, with the height of a spike indicating the probability associated with that particular value:


Note that \(\sum_x f(x) = 1\) if the sum is taken over all values of x having nonzero probability. In other words, the sum of the heights of all the spikes must equal one.

Joint Distributions

Suppose that X1, X2, . . . , Xn are n random variables, and let X be the entire vector

X = (X1, X2, . . . , Xn).

Let x = (x1, x2, . . . , xn) denote a particular value that X can take. The joint distribution of X is

f(x) = P(X = x) = P(X1 = x1, X2 = x2, . . . , Xn = xn).

In particular, suppose that the random variables X1, X2, . . . , Xn are independent and identically distributed (iid). Then X1 = x1, X2 = x2, . . . , Xn = xn are independent events, and the joint distribution is

\(\begin{aligned}
f(x) &= P(X_1=x_1, X_2=x_2, \ldots, X_n=x_n)\\
&= P(X_1=x_1)\,P(X_2=x_2) \cdots P(X_n=x_n)\\
&= f(x_1)f(x_2)\cdots f(x_n)\\
&= \prod\limits^n_{i=1}f(x_i)
\end{aligned}\)

where f(xi) refers to the distribution of Xi.
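Continuing the earlier discrete example (X taking the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2), the joint probability of an iid sample is just the product of the individual probabilities. A minimal sketch:

```python
import math

# Marginal PMF from the discrete example earlier in the section.
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def joint_pmf(xs):
    """f(x) = product of f(x_i) over an iid sample x = (x_1, ..., x_n)."""
    return math.prod(pmf.get(x, 0.0) for x in xs)

print(joint_pmf((1, 5, 2)))  # 0.25 * 0.50 * 0.25 = 0.03125
print(joint_pmf((1, 3)))     # any impossible value zeroes the product
```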