A random variable is the outcome of an experiment (i.e. a random process) expressed as a number. We use capital letters near the end of the alphabet (X, Y, Z, etc.) to denote random variables. Random variables are of two types: discrete and continuous.
Continuous random variables are described by probability density functions (PDFs). For example, a normally distributed random variable has a bell-shaped density function like this:
The probability that X falls between any two particular numbers, say a and b, is given by the area under the density curve f(x) between a and b,
\( P(a \le X \le b) = \int_{a}^{b} f(x)\,dx\).
The two continuous distributions that we will use most are the normal and the χ2 (chi-squared) distributions. Areas under the normal and χ2 density curves, needed for calculating p-values, are tabulated and widely available in textbooks. They can also be computed with statistical software packages.
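As a concrete illustration (not part of the original text), the area under a density curve between a and b is the difference of the CDF at the two endpoints. A minimal Python sketch using SciPy, with illustrative values of a and b:

```python
from scipy.stats import norm, chi2

# P(a <= X <= b) is the area under the density between a and b,
# i.e. F(b) - F(a), where F is the cumulative distribution function.
a, b = -1.96, 1.96
p_normal = norm.cdf(b) - norm.cdf(a)
print(round(p_normal, 3))  # approximately 0.95 for a standard normal

# The same idea works for a chi-squared distribution, e.g. df = 4:
p_chisq = chi2.cdf(9.49, df=4) - chi2.cdf(0, df=4)
print(round(p_chisq, 3))
```

This is exactly what the tabulated areas in textbooks provide; software simply evaluates the CDF directly.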
The Chi-Squared Distribution
The degrees of freedom (df) completely specify a chi-squared distribution. Here are the properties of a chi-squared random variable:
- A χ2 random variable takes values between 0 and ∞
- The mean of a chi-squared distribution equals its df
- The variance of a chi-squared distribution equals 2df, and the standard deviation is \(\sqrt{2\,df}\).
- The shape of the distribution is skewed to the right.
- As the df increase, the mean gets larger and the distribution spreads out more.
- As the df increase, the distribution becomes more bell-shaped, like a normal; that is, as \(df \to \infty\), \(\chi^2_{df} \to\) Normal.
Here is a plot of different chi-squared distributions.
The plot was created in R; in general I find R much simpler for creating plots like these. You can view a related image in Section 2.3.1 of Agresti (2007), and check out the Wikipedia page on the chi-squared distribution.
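The author's R code is not reproduced here. As a rough equivalent in Python (the df values and grid below are illustrative choices, not from the original), the listed mean and variance properties and the density curves can be checked with SciPy:

```python
import numpy as np
from scipy.stats import chi2

# Mean = df and variance = 2*df, as stated in the properties above.
for df in (1, 2, 4, 8):
    mean, var = chi2.stats(df, moments='mv')
    print(df, float(mean), float(var))

# Density values f(x) for several df over a grid of x values;
# plotting these curves (e.g. with matplotlib) reproduces the kind
# of figure described in the text.
x = np.linspace(0.01, 20, 200)
densities = {df: chi2.pdf(x, df) for df in (1, 2, 4, 8)}
```

Each curve is right-skewed, and the curves with larger df are flatter and more symmetric, matching the properties listed above.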
Discrete random variables are described by probability mass functions (PMF), which we will also call “distributions.” For a random variable X, we will write the distribution as f(x) and define it to be:
\( f(x) = P(X = x)\).
In other words, f(x) is the probability that the random variable X takes the specific value x. For example, suppose that X takes the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2 respectively. Then we would say that f(1) = 1/4, f(2) = 1/4, f(5) = 1/2, and f(x) = 0 for any x other than 1, 2, or 5:
\(f(x)= \begin{cases}
.25 & x=1, 2 \\
.50 & x=5 \\
0 & \text{otherwise}
\end{cases}\)
A graph of f(x) has spikes at the possible values of X, with the height of a spike indicating the probability associated with that particular value:
Note that \(\sum_x f(x) = 1\) if the sum is taken over all values of x having nonzero probability. In other words, the sum of the heights of all the spikes must equal one.
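A minimal Python sketch of this particular PMF (the dictionary representation is just one convenient choice, not anything from the original text):

```python
# PMF of X from the example: P(X=1)=1/4, P(X=2)=1/4, P(X=5)=1/2.
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def f(x):
    """Return P(X = x); zero for values X cannot take."""
    return pmf.get(x, 0.0)

print(f(1), f(5), f(3))   # 0.25 0.5 0.0
print(sum(pmf.values()))  # heights of all the spikes sum to 1
```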
Joint Distributions
Suppose that \(X_1, X_2, \ldots, X_n\) are n random variables, and let X be the entire vector
\( X = (X_1, X_2, \ldots, X_n).\)
Let \( x = (x_1, x_2, \ldots, x_n)\) denote a particular value that X can take. The joint distribution of X is
\( f(x) = P(X = x) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n).\)
In particular, suppose that the random variables \(X_1, X_2, \ldots, X_n\) are independent and identically distributed (iid). Then the events \(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\) are independent, and the joint distribution is
\begin{align}
f(x) &= P(X_1=x_1, X_2=x_2, \ldots, X_n=x_n)\\
&= P(X_1=x_1)\, P(X_2=x_2) \cdots P(X_n=x_n)\\
&= f(x_1)f(x_2)\cdots f(x_n)\\
&= \prod\limits^n_{i=1}f(x_i)
\end{align}
where \(f(x_i)\) denotes the common distribution of each \(X_i\).
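Continuing the earlier example (X taking the values 1, 2, and 5 with probabilities 1/4, 1/4, and 1/2), the joint distribution of an iid sample is the product of the marginal probabilities. A small sketch of this product formula:

```python
import math

# Marginal PMF from the earlier example.
pmf = {1: 0.25, 2: 0.25, 5: 0.50}

def joint(xs):
    """P(X1=x1, ..., Xn=xn) for iid Xi: the product of the f(xi)."""
    return math.prod(pmf.get(x, 0.0) for x in xs)

print(joint([1, 5, 5]))  # 0.25 * 0.5 * 0.5 = 0.0625
print(joint([1, 3]))     # 0.0, since 3 is not a possible value of X
```

This product structure is what makes likelihoods for iid samples factor into a product over observations.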