Lesson 21: Run Test and Test for Randomness

Overview

In this lesson, we'll learn how to use what is called the Run Test to test whether the distribution functions F(x) and G(y) of two continuous random variables X and Y, respectively, are equal. That is, we'll use the Run Test to test the null hypothesis:

\(H_0 : F(z)=G(z)\)

against any of the typical alternatives. Note that other than the requirement that the random variables be continuous, no other conditions about the distributions must be met in order for the Run Test to be an appropriate test.

We'll close the lesson by looking at one particular application of the Run Test, namely, testing whether a series of observations is random (as opposed to exhibiting a trend or a cyclic pattern).


21.1 - The Run Test

Let's start from the beginning... it seems reasonable to think that before we can derive a Run Test, we better know what actually constitutes a run.

What is a run?

Let's suppose we have \(n_1\) observations of the random variable X, and \(n_2\) observations of the random variable Y. Suppose we combine the two sets of independent observations into one larger collection of \(n_1 + n_2\) observations, and then arrange the observations in increasing order of magnitude. If we label from which set each of the ordered observations originally came, we might observe something like this:

x y y x x x y x x y y y

where x denotes an observation of the random variable X and y denotes an observation of the random variable Y. (We might observe this, for example, if the X values were 0.1, 0.4, 0.5, 0.6, 0.8, and 0.9, and the Y values were 0.2, 0.3, 0.7, 1.0, 1.1, and 1.2). That is, in this case, the smallest of all of the observations is an X value, the second smallest of all of the observations is a Y value, the third smallest of all of the observations is a Y value, the fourth smallest of all of the observations is an X value, and so on. Now, each group of successive values of X and Y is what we call a run. So, in this example, we have six runs. If we instead observed this ordered arrangement:

x x x y y y y y y x x x

we would have three runs. And, if we instead observed this ordered arrangement:

x x y x y y x y x x y y

we would have eight runs.
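Counting runs by hand is error-prone, so it's worth noting how mechanical the rule is: scan the labeled sequence and start a new run whenever the label changes. A minimal sketch in Python (the function name is my own):

```python
def count_runs(labels):
    """Count maximal blocks of consecutive identical labels."""
    return sum(1 for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1])

# The six-run arrangement built from the X and Y values listed above:
print(count_runs("xyyxxxyxxyyy"))  # 6
```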

Why runs?

The next obvious question is in what way might knowing the number of runs be helpful in testing the null hypothesis of the equality of the distributions F(x) and G(y). Let's investigate that question by taking a look at a few examples. Let's start with the case in which the distributions are equal. In that case, we might observe something like this:

[Figure: when the distributions are equal, the x's and y's intermingle throughout the combined ordering; here, eight runs]

In this particular example, there are eight runs. As you can see, this kind of a picture suggests that when the distributions are equal, the number of runs will likely be large.

Now, let's take a look at one way in which the distribution functions could be unequal. One possibility is that one of the distribution functions is at least as great as the other distribution function at all points z. This situation might look something like this:

[Figure: one distribution function lies everywhere at or above the other; in the combined ordering, the x's all precede the y's, giving just two runs]

In this particular example, there are only two runs. This kind of a situation suggests that when one of the distribution functions is at least as great as the other distribution function, the number of runs will likely be small. Note that this is what the distribution functions might look like if the median of Y was greater than the median of X.

Here's another way in which the distribution functions could be unequal:

[Figure: X and Y have nearly equal medians, but Y has the larger variance; the x's fall in the middle of the combined ordering, flanked by y's on both sides, giving three runs]

In this case, the medians of X and Y are nearly equal, but the variance of Y is much greater than the variance of X. In this particular example, there are only three runs. Again, in general, when we have this type of situation, we would expect the number of runs to be small.

The above three examples give a pretty clear indication that we're onto something with the idea of using runs to test the null hypothesis of the equality of two distribution functions. If the number of runs is smaller than expected, it seems we should reject the null hypothesis, because a small number of runs suggests that there are differences in either the location or the spread of the two distributions. Now, as is usually the case when conducting a hypothesis test, in order to quantify "smaller than expected," we're going to need to know something about the distribution of R, the number of runs.

What is the p.m.f. of R?

Let's let the random variable R denote the number of runs in the combined ordered sample containing \(n_1\) observations of X and \(n_2\) observations of Y. With possible values, such as 2, 3, and so on, R is clearly discrete. Now, if the null hypothesis is true, that is, the distribution functions are equal, all of the possible permutations of the X's and the Y's in the combined sample are equally likely. Therefore, we can use the classical approach to assigning the probability that R equals a particular value r. That is, to find the distribution of R, we can find:

\(P(R=r)=\dfrac{\text{number of arrangements of the } n_1 \text{ x's and } n_2 \text{ y's yielding exactly } r \text{ runs}}{\text{total number of arrangements of the } n_1 \text{ x's and } n_2 \text{ y's}}\)

for all of the possible values in the support of R. (Note that the support depends on the number of observations in the combined sample. We do know, however, that R must be at least 2.) As is usually the case, the denominator is the easy part. If we have \(n_1 + n_2\) positions:

\(\underbrace{\_\_\_ \quad \_\_\_ \quad \cdots \quad \_\_\_}_{n_1+n_2 \text{ positions}}\)

in which we can choose to place the \(n_1\) observed x values, then the total number of ways of arranging the x's and y's is:

\(\binom{n_1+n_2}{n_1}\)

Note that once the x values are placed in their positions, the remaining \(n_2\) positions must contain the y values. We just as easily could place the y values first, and then the remaining \(n_1\) positions must contain the x values.

Now for the numerator. That's the messy part! Before we even begin to attempt to generalize a formula for the quantity in the numerator, it might be best to take a look at a concrete example.

Example 21-1

Suppose we have \(n_1 = 3\) observations of the random variable X and \(n_2 = 3\) observations of the random variable Y. What is the p.m.f. of R, the number of runs?

Answer

Put your seatbelt on. If not your seatbelt, then at least your bookkeeping hat. This is going to get a bit messy at times. So... let's start with the easy part first... the denominator! If we combine the two samples, we have a total of 3+3 = 6 observations to arrange. Therefore, there are:

\(\binom{3+3}{3}=\binom{6}{3}=\dfrac{6!}{3! 3!}=20\)

possible ways of arranging the x's and y's. Now, that's a small enough number that we can actually enumerate all 20 of the possible arrangements. Here they are:

r = 2 (2 arrangements): x x x y y y,   y y y x x x

r = 3 (4 arrangements): x x y y y x,   x y y y x x,   y x x x y y,   y y x x x y

r = 4 (8 arrangements): x x y x y y,   x x y y x y,   x y x x y y,   x y y x x y,   y x x y y x,   y x y y x x,   y y x x y x,   y y x y x x

r = 5 (4 arrangements): x y x y y x,   x y y x y x,   y x x y x y,   y x y x x y

r = 6 (2 arrangements): x y x y x y,   y x y x y x

Now, as the above table suggests, of the 20 possible arrangements, there are two ways of getting R = 2 runs. Therefore:

\(P(R=2)=\frac{2}{20}\)

And, of the 20 possible arrangements, there are four ways of getting R = 3 runs. Therefore:

\(P(R=3)=\frac{4}{20}\)

Piece of cake! What was all this talk about messy bookkeeping? With the snap of a finger, we can determine the entire p.m.f. of R:

r:        2      3      4      5      6

P(R = r): 2/20   4/20   8/20   4/20   2/20

Note that the probabilities add to 1. That's a good sign! But finding the specific p.m.f. of R was not the point of this example. We were going to use this example to try to learn something about finding the p.m.f. of R in general. The only reason why we were able to find this p.m.f. of R with such ease was because we could enumerate all 20 of the possible outcomes. What if we can't do that? That is, what if there were so many possible arrangements that it would be crazy to try to enumerate them all? Well, we would go way back to what we did in Stat 414... we would derive a general counting formula! That's what we're working towards here.
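Before deriving that formula, note that the brute-force enumeration we just did is easy to automate for small samples. Here's a sketch in Python (the function name is my own) that generates every distinguishable arrangement and tallies the runs:

```python
from itertools import permutations

def run_pmf_by_enumeration(n1, n2):
    """p.m.f. of R by listing all distinguishable arrangements of
    n1 x's and n2 y's, each assumed equally likely under H0."""
    counts = {}
    for arr in set(permutations('x' * n1 + 'y' * n2)):
        runs = 1 + sum(arr[i] != arr[i - 1] for i in range(1, len(arr)))
        counts[runs] = counts.get(runs, 0) + 1
    total = sum(counts.values())  # equals C(n1 + n2, n1); here, 20
    return {r: counts[r] / total for r in sorted(counts)}

print(run_pmf_by_enumeration(3, 3))  # {2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2, 6: 0.1}
```

Of course, this approach stalls quickly as the samples grow, which is exactly why a general counting formula is worth deriving.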

Let's take a look at a case in which r is even, \(r = 4\), say. If you think about it, the only way we can get four runs is if the \(n_1 = 3\) values of X are broken up into 2 runs, which can be done in either of two ways:

x | x x        or        x x | x

and the \(n_2 = 3\) values of Y are broken up into 2 runs, which can be done in either of two ways:

y | y y        or        y y | y

Now, it's just a matter of putting the 2 runs of X and the 2 runs of Y together to make a total of 4 runs in the sequence. Well, there are two ways of getting 2 runs of X and two ways of getting 2 runs of Y. Therefore, the multiplication rule tells us that we should expect there to be 2×2 = 4 ways of getting four runs when the samples are combined. There's just one problem with that calculation, namely that there are 2 ways of starting the sequence off. That is, we could start with an X run:

x y x x y y        x y y x x y        x x y x y y        x x y y x y

or we could start with a Y run:

y x y y x x        y x x y y x        y y x y x x        y y x x y x

So, the multiplication rule actually tells us that there are \(2 \times 2 \times 2 = 8\) ways of getting four runs when we have \(n_1 = 3\) values of X and \(n_2 = 3\) values of Y. And, we've just enumerated all eight of them!

Now, let's try to generalize the above process. We started by considering a case in which r was even, specifically, r = 4. Note that if r is even, then \(r = 2k\) for a positive integer k. (If \(r = 4\), for example, then \(k = 2\).) Now, the only way we can get \(r = 4\) runs is if the \(n_1 = 3\) values of X are broken up into \(k = 2\) runs and the \(n_2 = 3\) values of Y are broken up into \(k = 2\) runs. We can form the \(k = 2\) runs of the \(n_1 = 3\) values of X by inserting \(k-1 = 1\) divider into the \(n_1 - 1 = 2\) spaces between the X values:

x _ x _ x

When the divider is here:

x | x x

we create these two runs:

x        and        x x

And, when the divider is here:

x x | x

we create these two runs:

x x        and        x

Note that, in general, there are:

\(\binom{n_1-1}{k-1}\)

ways of inserting k−1 dividers into the \(n_1 − 1\) spaces between the X values, with no more than one divider per space.

Now, we can go through the exact same process for the Y values. It shouldn't be too hard to see that, in general, there are:

\(\binom{n_2-1}{k-1}\)

ways of inserting k−1 dividers into the \(n_2 − 1\) spaces between the Y values, with no more than one divider per space.

Now, if you go back and look, once we broke up the X values into k = 2 runs, and the Y values into k = 2 runs, the next thing we did was to put the two runs of X and the two runs of Y together. The multiplication rule tells us that there are:

\(\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}\)

ways of putting the two sets of runs together to form \(r = 2k\) runs beginning with a run of x's. And, the multiplication rule tells us that there are:

\(\binom{n_2-1}{k-1}\binom{n_1-1}{k-1}\)

ways of putting the two sets of runs together to form \(r = 2k\) runs beginning with a run of y's. Therefore, the total number of ways of getting \(r = 2k\) runs is:

\(\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}+\binom{n_2-1}{k-1}\binom{n_1-1}{k-1}=2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}\)

And, therefore putting the numerator and the denominator together, we get:

\(P(R=2k)=\frac{2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}}{\binom{n_1+n_2}{n_1}}\)

when k is a positive integer.

Whew! Now, we've found the probability of getting r runs when r is even. What happens if r is odd? That is, what happens if \(r = 2k+1\) for a positive integer k? Well, let's consider the case in which \(r = 3\), and therefore \(k = 1\). If you think about it, there are two ways we can get three runs... either we need \(k+1 = 2\) runs of x's and k = 1 run of y's, such as:

x x | y y y | x        or        x | y y y | x x

or we need \(k = 1\) run of x's and \(k+1 = 2\) runs of y's:

y y | x x x | y        or        y | x x x | y y

Perhaps you can see where this is going? In general, we can form \(k+1\) runs of the \(n_1\) values of X by inserting k dividers into the \(n_1−1\) spaces between the X values, with no more than one divider per space, in:

\(\binom{n_1-1}{k}\)

ways. Similarly, we can form k runs of the \(n_2\) values of Y in:

\(\binom{n_2-1}{k-1}\)

ways. The two sets of runs can be placed together to form \(2k+1\) runs in:

\(\binom{n_1-1}{k}\binom{n_2-1}{k-1}\)

ways. Likewise, \(k+1\) runs of the \(n_2\) values of Y, and k runs of the \(n_1\) values of X can be placed together to form:

\(\binom{n_2-1}{k}\binom{n_1-1}{k-1}\)

sets of \(2k+1\) runs. Therefore, the total number of ways of getting \(r = 2k+1\) runs is:

\(\binom{n_1-1}{k}\binom{n_2-1}{k-1}+\binom{n_2-1}{k}\binom{n_1-1}{k-1}\)

And, therefore putting the numerator and the denominator together, we get:

\(P(R=2k+1)=\frac{\binom{n_1-1}{k}\binom{n_2-1}{k-1}+\binom{n_2-1}{k}\binom{n_1-1}{k-1}}{\binom{n_1+n_2}{n_1}} \)

Yikes! Let's put this example to rest!

Let's summarize what we've learned.

Summary. The probability that R, the number of runs, takes on a particular value r is:

\(P(R=2k)=\dfrac{2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}}{\binom{n_1+n_2}{n_1}}\)

if r is even, that is, if \(r = 2k\) for a positive integer k. The probability that R, the number of runs, takes on a particular value r is:

\(P(R=2k+1)=\dfrac{\binom{n_1-1}{k}\binom{n_2-1}{k-1}+\binom{n_2-1}{k}\binom{n_1-1}{k-1}}{\binom{n_1+n_2}{n_1}} \)

if r is odd, that is, if \(r = 2k + 1\) for a positive integer k.
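The two summary formulas translate directly into code. A minimal sketch (the function name is my own; assumes Python 3.8+ for `math.comb`):

```python
from math import comb

def p_runs(r, n1, n2):
    """P(R = r) under H0, from the even- and odd-r formulas above."""
    denom = comb(n1 + n2, n1)
    if r % 2 == 0:       # r = 2k
        k = r // 2
        num = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:                # r = 2k + 1
        k = (r - 1) // 2
        num = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
               + comb(n2 - 1, k) * comb(n1 - 1, k - 1))
    return num / denom

# Reproduces the Example 21-1 p.m.f.:
print([p_runs(r, 3, 3) for r in range(2, 7)])  # [0.1, 0.2, 0.4, 0.2, 0.1]
```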

Now, let's wrap up our development of the Run Test with an example.

Example 21-2

Let X and Y denote the times in hours per weeks that students in two different schools watch television. Let F(x) and G(y) denote the respective distributions. To test the null hypothesis:

\(H_0 : F(z) =G(z)\)

a random sample of eight students was selected from each school, yielding the following results:

[Data table: the eight observed times for each school]

What conclusion should we make about the equality of the two distribution functions?

Answer

The combined ordered sample, with the x values in blue and the y values in red, looks like this:

[Figure: the combined ordered sample of 16 observations, x values in blue and y values in red]

Counting, we see that there are 9 runs. We should reject the null hypothesis if the number of runs is smaller than expected. Therefore, the critical region should be of the form \(r \le c\). In order to determine what we should set the value of c to be, we'll need to know something about the p.m.f. of R. We can use the formulas that we derived above to determine various probabilities. Well, with \(n_1 = 8 , n_2 = 8 , \text{ and } k = 1\), we have:

\(P(R=2)=\dfrac{2\binom{7}{0}\binom{7}{0}}{\binom{16}{8}}=\dfrac{2}{12,870}=0.00016\)

and:

\(P(R=3)=\dfrac{\binom{7}{1}\binom{7}{0}+\binom{7}{1}\binom{7}{0}}{\binom{16}{8}}=\dfrac{14}{12,870}=0.00109\)

(Note that Table I in the back of the textbook can be helpful in evaluating the value of the "binomial coefficients.") Now, with \(n_1 = 8 , n_2 = 8 , \text{ and } k = 2\), we have:

\(P(R=4)=\dfrac{2\binom{7}{1}\binom{7}{1}}{\binom{16}{8}}=\dfrac{98}{12,870}=0.00761\)

and:

\(P(R=5)=\dfrac{\binom{7}{2}\binom{7}{1}+\binom{7}{2}\binom{7}{1}}{\binom{16}{8}}=\dfrac{294}{12,870}=0.02284\)

And, with \(n_1 = 8 , n_2 = 8 , \text{ and } k = 3\), we have:

\(P(R=6)=\dfrac{2\binom{7}{2}\binom{7}{2}}{\binom{16}{8}}=\dfrac{882}{12,870}=0.06853\)

Let's stop to see what we have going for us so far. Well, so far we've learned that:

\(P(R \le 6)=0.00016+0.00109+0.00761+0.02284+0.06853=0.1002\)

Hmmm. That tells us that if we set c = 6, we'd have a 0.1002 probability of committing a Type I error. That seems reasonable! That is, let's decide to reject the null hypothesis of the equality of the two distribution functions if the number of observed runs is \(r \le 6\). Is ours? It's not... we observed 9 runs. Therefore, we fail to reject the null hypothesis at the 0.10 level. There is insufficient evidence at the 0.10 level to conclude that the distribution functions are not equal.
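If you'd rather not push all of those binomial coefficients through by hand, the tail probability can be checked with a few lines of Python (variable names are my own; assumes Python 3.8+ for `math.comb`):

```python
from math import comb

# P(R <= 6) for n1 = n2 = 8, summing the even- and odd-r formulas.
denom = comb(16, 8)                                   # 12,870 arrangements in all
even = lambda k: 2 * comb(7, k - 1) ** 2              # r = 2k (the two binomials match since n1 = n2)
odd = lambda k: 2 * comb(7, k) * comb(7, k - 1)       # r = 2k + 1, again using n1 = n2
tail = even(1) + odd(1) + even(2) + odd(2) + even(3)  # r = 2, 3, 4, 5, 6
print(tail, round(tail / denom, 4))  # 1290 0.1002
```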

A Large-Sample Test

As our work in the previous example illustrates, conducting even a single run test can require quite a bit of calculation. Is there an easier way? Fortunately, yes... provided \(n_1\) and \(n_2\) are large. Typically, we consider the samples to be large if \(n_1\) and \(n_2\) are each at least 10. If the samples are large, then the distribution of R can be approximated by a normal distribution. That is, it can be shown that:

\(Z=\dfrac{R-\mu_R}{\sqrt{Var(R)}}\)

follows an approximate standard normal \(N(0, 1)\) distribution, where the mean of R is:

\(\mu_R=E(R)=\dfrac{2n_1n_2}{n_1+n_2}+1 \)

and the variance of R is:

\( Var(R)=\dfrac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)}\).
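These formulas are easy to package as a helper function; here's a sketch (the function name, the lower-tail p-value, and the optional half-unit continuity correction are my own packaging):

```python
import math

def large_sample_run_test(r, n1, n2, continuity=True):
    """Normal-approximation run test: returns (z, lower-tail p-value)."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = ((r + 0.5 if continuity else r) - mu) / math.sqrt(var)
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z), the lower-tail area
    return z, p
```

A small number of runs pushes z into the left tail, so we reject when the returned p-value falls below the chosen significance level.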

Because a small number of runs is evidence that the distribution functions are unequal, the critical region for testing the null hypothesis:

\(H_0 : F(z) = G(z)\)

is of the form \(z ≤ −z_{\alpha}\), where \(\alpha\) is the desired significance level. Let's take a look at an example.

Example 21-3

A charter bus line has 48-passenger buses and 38-passenger buses. With X and Y denoting the number of miles traveled per day for the 48-passenger and 38-passenger buses, respectively, the bus company is interested in testing the equality of the two distributions:

\(H_0 : F(z) = G(z)\)

The company observed the following data on a random sample of \(n_1 = 10\) buses holding 48 passengers and \(n_2 = 11\) buses holding 38 passengers:

[Data table: daily miles traveled for the 10 larger and 11 smaller buses]

Using the normal approximation to R, conduct the hypothesis test at the 0.05 level.

Answer

The combined ordered sample, with the x values in blue and the y values in red, looks like this:

[Figure: the combined ordered sample of 21 observations, x values in blue and y values in red]

Counting, we see that there are 9 runs. Using the normal approximation to R, with \(n_1 = 10\) and \(n_2 = 11\), we have:

\(\mu_R=\dfrac{2n_1n_2}{n_1+n_2}+1=\dfrac{2(10)(11)}{10+11}+1=11.476\)

and:

\( Var(R)=\dfrac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)} =\dfrac{2(10)(11)[2(10)(11)-10-11]}{(10+11)^2(10+11-1)}=4.9637\)

Now, we observed r = 9 runs. Therefore, the approximate P-value, using a half-unit correction for continuity, is:

\(P \approx P\left[Z \le \dfrac{9.5-11.476}{\sqrt{4.9637}} \right]=P[Z \le -0.89]=0.187\)

We fail to reject the null hypothesis at the 0.05 level, because the P-value is greater than 0.05. There is insufficient evidence at the 0.05 level to conclude that the two distribution functions are unequal.


21.2 - Test for Randomness

A common application of the run test is a test for randomness of observations. Because an interest in the randomness of observations arises quite often in a quality-control setting, that's the application that will be the focus of our attention. For example, suppose a quality control supervisor at a paint manufacturing company suspects that the weights of paint cans on the production line are not varying in a random way, as she would expect. Instead, she suspects that the weights are either trending upward or varying in a cyclic fashion. Every ten minutes, she randomly samples a paint can from the production line and records the weight of the randomly selected can.

Let's let \(x_1, x_2, \dots , x_N\) denote the observed weights, where the subscripts designate the order in which the outcomes were obtained. (Note that the textbook uses the letter k here to denote the number of observations in the sample, an unfortunate choice, I think, because we've already taken k to mean something else in this lesson. At any rate, just to be different, and hopefully clearer, I'll use a capital letter N to denote the total number of observations in the sample.) For the sake of concreteness, let's suppose that N is even. If we calculate the median m of the observed weights, then by definition, the median divides the observed sample in half. That is, half of the weights will be less than the median, and half of the weights will be greater than the median. Now, suppose we replace each observation by L if it falls below the median, and by U if it falls above the median.

  1. If we observe something like this:

    U U U U L U L L L L

    then the production process is showing a trend. In this case, we would not observe as many runs r as we would expect if the process were truly random. In this case, we would reject the null hypothesis of randomness, in favor of the alternative hypothesis of a trend effect, if \(r ≤ c\).

  2. If we instead observed something like this:

    U L U L U L U L U L

    then the production process is showing some kind of cyclic event. In this case, we would observe more runs r than we would expect if the process were truly random. In this case, we would reject the null hypothesis of randomness, in favor of the alternative hypothesis of a cyclic effect, if \(r ≥ c\).

  3. If we aren't sure in which way a process would deviate from randomness, then we should allow for the possibility of either a trend effect or a cyclic effect. In this case, we should reject the null hypothesis of randomness, in favor of the alternative hypothesis of either a trend effect or a cyclic effect, if \(r \le c_1\) or \(r \ge c_2\).
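The labeling-and-counting procedure in the three cases above can be sketched as follows (the function name is my own; observations equal to the median go to the upper group, matching the standard protocol of putting ties in the "upper half"):

```python
def label_and_count_runs(values):
    """Label each observation U (at or above the median) or L (below it),
    keeping time order, then count runs of consecutive identical labels."""
    s = sorted(values)
    n = len(values)
    median = (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]
    labels = ['U' if v >= median else 'L' for v in values]
    runs = sum(1 for i in range(n) if i == 0 or labels[i] != labels[i - 1])
    return labels, runs

# A perfectly alternating (cyclic-looking) series yields the maximum number of runs:
print(label_and_count_runs([1, 5, 2, 6, 3, 7, 4, 8])[1])  # 8
```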

Let's try this idea out on an example!

Example 21-4

A quality control chart has been maintained for the weights of paint cans taken from a conveyor belt at a fixed point in a production line. Sixteen (16) weights obtained today, in order of time, are as follows:

[Data table: the 16 observed weights, in time order]

Use the run test, at approximately a 0.01 level, to determine whether the weights of the paint cans on the conveyor belt deviate from randomness.

Answer

With just a bit of ordering work (or using Minitab or some other statistical software), we can readily determine that the median weight of the sampled paint cans is 67.9. Labeling each observed weight with either a U if the observed weight is greater than the median (67.9), or an L if the observed weight is less than the median (67.9), we get:

[Figure: the 16 weights labeled U (above the median 67.9) or L (below it), in time order]

Counting, we see that we observe 7 runs. Because there is no specific direction given in the possible deviation from randomness, we should test for both trend and cyclic effects. That is, we should reject the null hypothesis if we have either too few or too many runs. That is, our rejection region should be of the form:

\(\{r : r \le c_1 \text{ or } r \ge c_2\}\)

Because we have a sample size of N = 16, we'll use the p.m.f. of R with \(n_1 = 8\) and \(n_2 = 8\). Now, using the formulas with k = 1 and k = 8, we get:

\[P(R=2)=\dfrac{2\binom{7}{0}\binom{7}{0}}{\binom{16}{8}}=\dfrac{2\binom{7}{7}\binom{7}{7}}{\binom{16}{8}} =P(R=16)=\dfrac{2}{12,870} \]

Using the formulas with k = 1 and k = 7, we get:

\[P(R=3)=\dfrac{\binom{7}{1}\binom{7}{0}+\binom{7}{1}\binom{7}{0}}{\binom{16}{8}}=\dfrac{\binom{7}{7}\binom{7}{6}+\binom{7}{7}\binom{7}{6}}{\binom{16}{8}}=P(R=15)=\dfrac{14}{12,870} \]

And, using the formulas with k = 2 and k = 7, we get:

\[P(R=4)=\dfrac{2\binom{7}{1}\binom{7}{1}}{\binom{16}{8}}=\dfrac{2\binom{7}{6}\binom{7}{6}}{\binom{16}{8}} =P(R=14)=\dfrac{98}{12,870} \]

Therefore, the critical region \(\{r : r \le 4 \text{ or } r \ge 14\}\) would yield a test at a significance level of:

\[\alpha = P(R \le 4)+P(R \ge 14)=\dfrac{2+14+98}{12,870}+\dfrac{2+14+98}{12,870}=\dfrac{228}{12,870}=0.018\]

The observed number of runs, r = 7, does not fall in our proposed rejection region. Therefore, we fail to reject the null hypothesis at the 0.018 significance level. There is insufficient evidence at the 0.018 level to conclude that the process has deviated from randomness.

Notes

  1. In both the discussion and the example above, the sample size N was even. What happens if the sample size N is odd? Well, in that case, when we divide the sample up into those below (L) and those above (U) the median, the number of observations in the "upper half" and the "lower half" will differ by one. If that's the case, the standard protocol is to put the extra observation in the "upper half." In that case, \(n_2 = n_1 + 1\).
  2. If the median is equal to a value that is tied with other values, standard protocol again puts the tied values in the "upper half," and the test is performed with \(n_1\) and \(n_2\) being unequal.
