Empirical Distribution Functions

Before we can develop a hypothesis test for whether an empirical distribution function F_n(x) fits a hypothesized distribution function F(x), we had better have a good idea of just what an empirical distribution function F_n(x) is. So, let's start by formally defining it.

Definition. Given an observed random sample \(X_1 , X_2 , \dots , X_n\), an empirical distribution function F_n(x) is the fraction of sample observations less than or equal to the value x. More specifically, if y₁ < y₂ < ... < y_n are the order statistics of the observed random sample, with no two observations being equal, then the empirical distribution function is defined as:

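\[
F_n(x) =
\begin{cases}
0, & x < y_1 \\
\dfrac{k}{n}, & y_k \le x < y_{k+1}, \quad k = 1, 2, \dots, n-1 \\
1, & x \ge y_n
\end{cases}
\]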

That is, for the case in which no two observations are equal, the empirical distribution function is a "step" function that jumps 1/n in height at each observation x_k. For the cases in which two (or more) observations are equal, that is, when there are n_k observations at x_k, the empirical distribution function is a "step" function that jumps n_k/n in height at each observation x_k.
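
The "fraction of observations less than or equal to x" idea can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `edf` and the sample values are made up for the illustration, and counting with `bisect_right` on the sorted sample is just one convenient way to get the number of observations at or below x (tied observations are counted together automatically).

```python
from bisect import bisect_right

def edf(sample, x):
    """Return F_n(x): the fraction of observations in `sample` that are <= x."""
    ordered = sorted(sample)  # the order statistics of the sample
    # bisect_right gives the number of ordered values that are <= x
    return bisect_right(ordered, x) / len(ordered)

# Hypothetical sample with no two observations equal
sample = [3.1, 0.4, 2.2, 5.0]
print(edf(sample, 0.0))  # 0.0 (below the smallest observation)
print(edf(sample, 2.2))  # 0.5 (two of the four observations are <= 2.2)
print(edf(sample, 9.9))  # 1.0 (at or above the largest observation)
```

Evaluating the function between two adjacent observations returns the same value as at the lower one, which is exactly the "step" behavior described above.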

Such a formal definition is all well and good, but it would probably make even more sense if we took a look at a simple example.

Example

A random sample of n = 8 people yields the following (ordered) counts of the number of times they swam in the past month:

0 1 2 2 4 6 6 7

Calculate the empirical distribution function F_n(x).

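As reported, the data are already ordered, so the order statistics are y₁ = 0, y₂ = 1, y₃ = 2, y₄ = 2, y₅ = 4, y₆ = 6, y₇ = 6, and y₈ = 7. Therefore, using the definition of the empirical distribution function, we have:

\[ F_n(x) = 0 \quad \text{for } x < 0 \]

and:

\[ F_n(x) = \frac{1}{8} \quad \text{for } 0 \le x < 1 \]

and:

\[ F_n(x) = \frac{2}{8} \quad \text{for } 1 \le x < 2 \]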

Now, noting that there are two 2s, we need to jump 2/8 at x = 2. We've already accumulated a probability of 2/8 so far. Therefore, we need to add 2/8 to it:

\[ F_n(x) = \frac{2}{8} + \frac{2}{8} = \frac{4}{8} \quad \text{for } 2 \le x < 4 \]

Then, noting that there is only one 4, we need to jump 1/8 at x = 4. That is, adding 1/8 to the 4/8 that we've already accumulated, we get:

\[ F_n(x) = \frac{4}{8} + \frac{1}{8} = \frac{5}{8} \quad \text{for } 4 \le x < 6 \]

Again, noting that there are two 6s, we need to jump 2/8 at x = 6. We've accumulated a probability of 5/8 so far. Adding 2/8 to it, we get:

\[ F_n(x) = \frac{5}{8} + \frac{2}{8} = \frac{7}{8} \quad \text{for } 6 \le x < 7 \]

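And, finally, adding the last jump of 1/8 at x = 7:

\[ F_n(x) = \frac{7}{8} + \frac{1}{8} = \frac{8}{8} = 1 \quad \text{for } x \ge 7 \]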

A plot of the function should then look something like this:

[Figure: step plot of F_n(x) for the swimming data, with jumps at x = 0, 1, 2, 4, 6, and 7]
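
The running totals in this example can be double-checked with a short Python script. It is only a sketch of the bookkeeping: the variable names (`swims`, `counts`, `heights`) are chosen here for illustration, and the tally uses `collections.Counter` so that tied observations (the two 2s and the two 6s) produce a single jump of n_k/n.

```python
from collections import Counter

# Ordered counts of swims in the past month, from the example
swims = [0, 1, 2, 2, 4, 6, 6, 7]
n = len(swims)

# n_k: how many observations sit at each distinct value x_k
counts = Counter(swims)

running = 0    # number of observations accumulated so far
heights = {}   # distinct value x_k -> numerator of F_n(x_k), in eighths
for value in sorted(counts):
    running += counts[value]  # a jump of n_k/n at this value
    heights[value] = running
    print(f"F_n(x) = {running}/{n} starting at x = {value}")
```

The printed numerators are 1, 2, 4, 5, 7, and 8 (out of 8) at x = 0, 1, 2, 4, 6, and 7, matching the step heights worked out by hand above.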