Before we can develop a hypothesis test for whether an empirical distribution function Fn(x) fits a hypothesized distribution function F(x), we had better have a good idea of just what an empirical distribution function Fn(x) is. Therefore, let's start by formally defining it.
Definition. Given an observed random sample x1, x2, ..., xn, the empirical distribution function Fn(x) is the fraction of sample observations less than or equal to the value x. More specifically, if y1 < y2 < ... < yn are the order statistics of the observed random sample, with no two observations being equal, then the empirical distribution function is defined as:

$$F_n(x) = \begin{cases} 0, & x < y_1 \\ k/n, & y_k \le x < y_{k+1}, \quad k = 1, 2, \ldots, n-1 \\ 1, & x \ge y_n \end{cases}$$

That is, for the case in which no two observations are equal, the empirical distribution function is a "step" function that jumps 1/n in height at each observation xk. For the cases in which two (or more) observations are equal, that is, when there are nk observations at xk, the empirical distribution function is a "step" function that jumps nk/n in height at each distinct observed value xk.
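The definition translates almost directly into code. The following is a minimal Python sketch, not a reference implementation; the function name ecdf is our own choice for illustration. It sorts the sample to get the order statistics and then counts how many of them are less than or equal to x.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Return Fn(x): the fraction of observations in `sample` that are <= x.

    `ecdf` is a hypothetical helper name used only for this illustration.
    """
    ordered = sorted(sample)  # the order statistics y1 <= y2 <= ... <= yn
    # bisect_right returns the number of ordered values that are <= x
    return bisect_right(ordered, x) / len(ordered)
```

Because bisect_right counts every copy of a tied value at once, a value observed nk times produces a jump of nk/n, which matches the tied case in the definition.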
Such a formal definition is all well and good, but it would probably make even more sense if we took a look at a simple example.
Example
A random sample of n = 8 people yields the following (ordered) counts of the number of times they swam in the past month:
0 1 2 2 4 6 6 7
Calculate the empirical distribution function Fn(x).
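As reported, the data are ordered, so the order statistics are y1 = 0, y2 = 1, y3 = 2, y4 = 2, y5 = 4, y6 = 6, y7 = 6, and y8 = 7. Therefore, using the definition of the empirical distribution function, we have:

$$F_n(x) = 0 \quad \text{for } x < 0$$

and:

$$F_n(x) = \frac{1}{8} \quad \text{for } 0 \le x < 1$$

and:

$$F_n(x) = \frac{2}{8} \quad \text{for } 1 \le x < 2$$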
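Now, noting that there are two 2s, we need to jump 2/8 at x = 2. We've already accumulated a probability of 2/8 so far. Therefore, we need to add 2/8 to it:

$$F_n(x) = \frac{2}{8} + \frac{2}{8} = \frac{4}{8} \quad \text{for } 2 \le x < 4$$

Then, noting that there is only one 4, we need to jump 1/8 at x = 4. That is, adding 1/8 to the 4/8 that we've already accumulated, we get:

$$F_n(x) = \frac{4}{8} + \frac{1}{8} = \frac{5}{8} \quad \text{for } 4 \le x < 6$$

Again, noting that there are two 6s, we need to jump 2/8 at x = 6. We've accumulated a probability of 5/8 so far. Adding 2/8 to it, we get:

$$F_n(x) = \frac{5}{8} + \frac{2}{8} = \frac{7}{8} \quad \text{for } 6 \le x < 7$$

And, finally, adding the last jump of 1/8 at x = 7:

$$F_n(x) = \frac{7}{8} + \frac{1}{8} = 1 \quad \text{for } x \ge 7$$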
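The running totals in this example can be checked with a short script. This is only a sketch: the helper name ecdf_steps and the variable names are assumptions made for illustration. It tallies how many observations fall at each distinct value and accumulates them as exact fractions. Note that Python's Fraction reduces to lowest terms, so 2/8 prints as 1/4 and 4/8 as 1/2.

```python
from collections import Counter
from fractions import Fraction

swims = [0, 1, 2, 2, 4, 6, 6, 7]  # the sample from the example


def ecdf_steps(sample):
    """Return (value, Fn(value)) pairs at each distinct observed value.

    `ecdf_steps` is a hypothetical helper name used only for this illustration.
    """
    counts = Counter(sample)  # nk: number of observations at each value
    total = len(sample)       # n
    steps = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]  # jump of nk/n at this value
        steps.append((value, Fraction(cumulative, total)))
    return steps


steps = ecdf_steps(swims)
for value, height in steps:
    print(f"Fn(x) = {height} starting at x = {value}")
```

Each pair gives the height that Fn(x) takes from that value up to (but not including) the next distinct observation.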
Plotted, the function is a step function that rises from 0 to 1, with jumps at x = 0, 1, 2, 4, 6, and 7. It should look something like this: