11.1.1 - The Sign Test

Suppose we are interested in testing the population median. The hypotheses are similar to those we have seen before but use the median, \(\eta\), instead of the mean.

If the hypotheses are based on the median, they would look like the following:

Null

\(H_0\colon\eta=\eta_0\)

Alternative

\(H_a\colon\eta>\eta_0\)

\(H_a\colon\eta<\eta_0\)

\(H_a\colon\eta\ne\eta_0\)

For the IRS example, the null and alternative are:

\(H_0\colon \eta=160\) vs \(H_a\colon \eta >160\)

Consider the test statistic, \(S^+\), where

\(S^+=\text{the number of observations greater than 160}\)
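As a minimal sketch of how the sign statistic is counted, the Python snippet below tallies the observations above, below, and equal to a hypothesized median. The function name `sign_counts` and the short `times` list are hypothetical illustrations; they are not the IRS data.

```python
def sign_counts(data, eta0):
    """Count observations above, below, and equal to the hypothesized median eta0."""
    above = sum(1 for x in data if x > eta0)   # this is S+
    below = sum(1 for x in data if x < eta0)
    equal = len(data) - above - below          # ties carry no sign
    return above, below, equal

# Hypothetical times in minutes (made up for illustration only)
times = [95, 120, 160, 171, 188, 240, 310, 102]
s_plus, s_minus, ties = sign_counts(times, 160)
print(s_plus, s_minus, ties)
```

Only the first count, \(S^+\), is used as the test statistic here; the other two counts are returned because software output typically reports all three.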

Under the null hypothesis, each observation falls above 160 with probability 0.5, so \(S^+\) should be about 50% of the observations. More precisely, \(S^+\) has a binomial distribution with parameters \(n\) and \(p=0.5\). Let’s review the conditions and verify that it is a binomial random variable.

The number of trials, \(n\), is fixed and known. Here the number of trials equals the number of observations, which is fixed once the sample is collected.
The outcomes of each trial can be categorized as either a "success" or a "failure", with the probability of success being \(p\). Observations can either be above the hypothesized median (a "success") or below it (a "failure"), with the probability of being above the median being \(p=0.5\).
The probability of "success" remains constant from trial to trial. The probability of being above the median remains the same for each observation.
The trials are independent. Each of the observations is independent of the next, so we are okay here.
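The four conditions above can also be checked empirically. The following is a rough simulation sketch, not part of the original analysis: it assumes a right-skewed (exponential) population whose true median is 160, draws many samples of size 30, and records \(S^+\) for each. The function name `simulate_s_plus`, the choice of distribution, the number of repetitions, and the seed are all illustrative assumptions.

```python
import math
import random

def simulate_s_plus(n, median, reps, seed=0):
    """Draw `reps` samples of size n from a right-skewed distribution whose
    median equals `median`, and return the S+ value from each sample."""
    rng = random.Random(seed)
    rate = math.log(2) / median   # an exponential with this rate has the given median
    return [sum(1 for _ in range(n) if rng.expovariate(rate) > median)
            for _ in range(reps)]

counts = simulate_s_plus(n=30, median=160, reps=20000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

# Binomial(30, 0.5) has mean 15 and variance 7.5; the simulated values should be close.
print(round(mean, 2), round(var, 2))
```

Even though the simulated population is far from Normal, the count above the median should behave like a Binomial(30, 0.5) variable, which is why the sign test does not need a Normality assumption.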

Now, back to our problem. To make a conclusion, we need to find the p-value. It is the probability of seeing what we see or something more extreme given the null hypothesis is true.

In the IRS example, let’s find \(S^+\), or in other words, let's find the number of observations that fall above 160. We find \(S^+=15\).

With \(n=30\) observations, we can use the binomial distribution to find the p-value as \(P(S^+\ge 15)\):

\begin{align} P(S^+\ge15)&=\sum_{i=15}^{30} {30\choose i}(0.5)^{30-i}(0.5)^i\\&=\sum_{i=15}^{30} {30\choose i} (0.5)^{30}\\&\approx 0.5722322 \end{align}
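The binomial sum above can be evaluated directly. Here is a small Python sketch using only the standard library; the function name `sign_test_p_value` and its `alternative` argument are hypothetical, and the two-sided branch uses one common convention (doubling the smaller tail and capping at 1), which may differ from what a given software package does.

```python
from math import comb

def sign_test_p_value(s_plus, n, alternative="greater"):
    """Exact sign-test p-value from the Binomial(n, 0.5) distribution."""
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    upper = sum(pmf[s_plus:])        # P(S+ >= s_plus)
    lower = sum(pmf[:s_plus + 1])    # P(S+ <= s_plus)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Assumed convention: double the smaller tail, capped at 1
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(sign_test_p_value(15, 30), 7))   # 0.5722322, matching the sum above
```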

If we assume the significance level is 5%, then the p-value is greater than 0.05. We fail to reject the null hypothesis and conclude that there is not enough evidence in the data to suggest that the median is above 160 minutes.

This test is called the Sign Test and \(S^+\) is called the sign statistic. The Sign Test is also known as the Binomial Test.

Let's recap what we found. The research question was to see if it took longer than 160 minutes to complete the 1040 form. The measurement was the time in minutes to complete the form. Here is a summary:

Assumption	Test Statistic	p-value	Conclusion
Data are Normally distributed	t statistic \((t^*)\)	0.04991	Reject \(H_0\)
Quantitative Data	Sign statistic \((S^+)\)	0.572	Fail to reject \(H_0\)

Here, the two tests lead to opposite conclusions. Given the shape of the data, which do you think is the valid conclusion?

Minitab®

Minitab: Sign Test

We can use Minitab to conduct the Sign test.

Click Stat > Nonparametrics > 1-Sample sign
Enter your 'variable', 'significance level', and adjust for the alternative.
Click OK.

Example 11-2: Tax Forms (Sign Test)

Conduct the test for the median time for filling out the tax forms using the Sign Test in Minitab. Download the dataset: [irs.txt]

Answer

Conducting the test in Minitab yields the following output.

1-Sample Sign Test

Method

\(\eta\): median of time

Descriptive Statistics

N	Median
30	164

95% Lower bound for \(\eta\)

Lower Bound for \(\eta\)	Achieved Confidence	Position
120.000	89.98%	12
116.085	95.00%	Interpolation
116.000	95.06%	11

Test

Null hypothesis	\(H_0\colon \eta = 160\)
Alternative hypothesis	\(H_1\colon \eta > 160\)

Number < 160	Number = 160	Number > 160	P-Value
15	0	15	0.5722

You can see the p-value and \(S^+\), which is the number greater than 160.

As the Minitab output shows, you can also find a confidence bound or interval for the population median based on the sign statistic. Finding it by hand is a bit tricky. Its interpretation follows the same template as the confidence interval for the population mean.
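To give a feel for where the "Achieved Confidence" values in the output come from, here is a hedged sketch based on a standard order-statistic argument: the \(k\)th smallest observation lies below the median exactly when at least \(k\) of the \(n\) observations do, and that count is Binomial(\(n\), 0.5). The function name `achieved_confidence_lower` is hypothetical, and this is an assumption about how the column arises rather than a description of Minitab's internals. The interpolated row is not reproduced.

```python
from math import comb

def achieved_confidence_lower(n, k):
    """Confidence that the k-th smallest observation lies below the median:
    P(at least k of n observations fall below the median) under Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

for k in (12, 11):
    print(k, f"{100 * achieved_confidence_lower(30, k):.2f}%")
```

For \(n=30\), positions 12 and 11 give 89.98% and 95.06%, which agree with the two non-interpolated rows of the lower-bound table in the output.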

We present the details of the Sign Test because it can be derived from the material we have covered so far in the course. In the next section, we present another test and show how to run it in Minitab, but we leave out the details.