2.3 - Determining Power

We begin this part by defining the power of a hypothesis test. This also provides another way of determining the sample size. The power is the probability of achieving the desired outcome. What is the desired outcome of a hypothesis test? Usually rejecting the null hypothesis. Therefore, power is the probability of rejecting the null hypothesis when in fact the alternative hypothesis is true.

| Decision | \(H_0\) true | \(H_A\) true |
| --- | --- | --- |
| Reject Null Hypothesis | Type I Error - \(\alpha\) | OK |
| Accept Null Hypothesis | OK | Type II Error - \(\beta\) |

Note!

P(Reject \(\mathbf{H_0}\) | \(\mathbf{H_0}\) is true) = \(\alpha\): P(Type I Error)

P(Accept \(\mathbf{H_0}\) | \(\mathbf{H_A}\) is true) = \(\beta\): P(Type II Error)

Therefore the power of the test is P(Reject \(\mathbf{H_0}\) | \(\mathbf{H_A}\) is true) = \(1-\beta\).
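To make these definitions concrete, here is a minimal simulation sketch (assuming Python with numpy and scipy, which are not part of the original text). It generates many datasets under \(H_0\) and confirms that the two-sample t-test rejects at close to the nominal rate \(\alpha\); the values n = 5 and \(\sigma = 12\) anticipate the example below.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, sigma, n_sim = 0.05, 5, 12, 10_000

# Generate both samples under H0 (equal means) and count rejections;
# the rejection rate should be close to alpha = P(Type I Error).
rejections = 0
for _ in range(n_sim):
    x = rng.normal(loc=0, scale=sigma, size=n)
    y = rng.normal(loc=0, scale=sigma, size=n)
    if stats.ttest_ind(x, y).pvalue < alpha:
        rejections += 1

print(f"Estimated Type I error rate: {rejections / n_sim:.3f}")
# Shifting the mean of y to a nonzero value would instead estimate
# the power, 1 - beta, for that particular true difference.
```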

Before any experiment is conducted, you typically want to know how many observations you will need. Consider the blood pressure example, where we measure the efficacy of a blood pressure medication: if the drug is effective, there should be a difference in blood pressure before and after taking the medication. We therefore want to reject the null hypothesis, and thus we want the power (i.e. the probability of rejecting \(\mathbf{H_0}\) when it is false) to be as high as possible.

We will describe an approach to determining power based on a set of operating characteristic curves traditionally used for the t-test. Power depends on the level of the test \(\alpha\), the actual true difference in means, and the sample size n. Figure 2.13 (2.12 in the 7th edition) in the text gives the operating characteristic curves, where \(\beta\) is plotted against \(n^* = 2n - 1\) for an \(\alpha = 0.05\) level test. When you design a study you usually plan for equal sample sizes, since for a fixed total number of observations equal groups give the highest power. We will look at special cases where you might deviate from this, but generally this is the case.

To use the Figure in the text, we first calculate the difference in means measured in numbers of standard deviations, i.e. \(\lvert \mu_1-\mu_2 \rvert / \sigma\). You can think of this as a signal to noise ratio: how large or strong is the signal, \(\lvert \mu_1-\mu_2 \rvert\), relative to the variation in the measurements, \(\sigma\). We are not using the symbols in the text because the two editions define d and \(\delta\) differently. Different software packages or operating characteristic curves may require either \(\lvert \mu_1-\mu_2 \rvert / \sigma\) or \(\lvert \mu_1-\mu_2 \rvert / 2\sigma\) to compute sample sizes or estimate power, so you need to read the documentation carefully. Minitab avoids this ambiguity by asking for \(\lvert \mu_1-\mu_2 \rvert\) and \(\sigma\) separately, which seems like a very sensible solution.


Example calculations

Let's consider an example in the two-sample situation. We will let \(\alpha = 0.05\), \(|\mu_1 - \mu_2| = 8\) (the difference between the two means), \(\sigma = 12\) (the assumed true standard deviation), and n = 5 observations in each group.

In this case, \(\lvert \mu_1-\mu_2 \rvert / \sigma = 8/12 \approx 0.67\), and \(n^* = 2n - 1 = 9\).

Reading from the Figure, we get approximately \(\beta = 0.9\). Therefore the power, the chance of rejecting the null hypothesis, is \(1 - \beta = 1 - 0.9 = 0.1\); we would detect this difference only about ten percent of the time. With such low power we should not even do the experiment!
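If you do not have the curves at hand, the same reading can be checked in software. Here is a minimal sketch assuming Python's statsmodels package (an assumption on our part; any power calculator will do). The exact value will differ somewhat from a visual chart reading.

```python
from statsmodels.stats.power import TTestIndPower

# Two-sample t-test power: effect size d = |mu1 - mu2| / sigma = 8/12,
# n = 5 per group, alpha = 0.05 (two-sided).
power = TTestIndPower().power(effect_size=8 / 12, nobs1=5, alpha=0.05)
print(f"power = {power:.2f}")  # low power, broadly consistent with the
                               # chart reading of about 0.1
```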

If we were willing to design the study to detect only a larger true difference, say \(\lvert \mu_1-\mu_2 \rvert = 18\), then \(\lvert \mu_1-\mu_2 \rvert / \sigma = 18/12 = 1.5\) while \(n^*\) would still equal 9. The Figure then shows \(\beta\) to be about 0.5, so the power, the chance of detecting a difference of 18, is also about 0.5. This is still not very satisfactory, since we have only a 50/50 chance of detecting a true difference of 18 even if it exists.

Finally, we calculate the power to detect this difference of 18 if we use n = 10 observations per group, which gives \(n^* = 19\). For this case \(\beta \approx 0.1\) and thus \(\text{power} = 1 - \beta = 0.9\), or 90%, which is quite satisfactory.
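The two scenarios with a true difference of 18 can be checked the same way (again a sketch assuming statsmodels; the values will differ slightly from readings off the curves).

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 18 / 12  # effect size |mu1 - mu2| / sigma = 1.5

# n = 5 per group: power is roughly 0.5, as read from the OC curves
print(f"n = 5:  power = {analysis.power(effect_size=d, nobs1=5, alpha=0.05):.2f}")

# n = 10 per group: power rises to roughly 0.9
print(f"n = 10: power = {analysis.power(effect_size=d, nobs1=10, alpha=0.05):.2f}")
```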

These calculations can also be done in Minitab, as shown below. Under the menu Stat > Power and Sample Size > 2-Sample t, simply input the sample size n = 10, the difference \(\delta = 18\), and the standard deviation \(\sigma = 12\).

Another way to improve power is to use a more efficient procedure. For example, if we have paired observations we could use a paired t-test. With pairing, we would expect a much smaller sigma, perhaps somewhere around 2 rather than 12, because the subject-to-subject variation is removed from the noise. So our signal to noise ratio would be larger because the noise component is smaller. We do pay a small price in doing this, because our t-test would now have \(n - 1\) degrees of freedom instead of \(2n - 2\).
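As a rough illustration (a sketch again assuming statsmodels, and taking the hypothetical \(\sigma \approx 2\) for the paired differences from the paragraph above), we can solve for the number of pairs needed to reach 90% power.

```python
from statsmodels.stats.power import TTestPower

# Paired design: the standard deviation of the differences is assumed
# to be about 2 (a hypothetical value), so the effect size is 18 / 2 = 9.
n_pairs = TTestPower().solve_power(effect_size=18 / 2, power=0.9, alpha=0.05)
print(f"about {n_pairs:.1f} pairs needed")  # only a handful of pairs,
                                            # versus ~10 per group above
```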

The take-home message here is:

If you can reduce variance or noise, then you can achieve incredible savings in the number of observations you have to collect. The benefit of a good design, therefore, is a lot more power for the same cost, or a much-decreased cost for the same power.

We now show another approach to calculating power, namely using software tools rather than a graph. Let's take a look at how Minitab handles this below.

You can use these dialog boxes to plug in the values that you have assumed and have Minitab calculate either the sample size needed for a specified power, or the power that would result from a given sample size.
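A scripted equivalent of this dialog, sketched here once more with Python's statsmodels, solves for the per-group sample size that achieves a specified power.

```python
from statsmodels.stats.power import TTestIndPower

# Difference delta = 18, sigma = 12 (so d = 1.5), alpha = 0.05,
# target power = 0.9, as in the Minitab dialog described above.
n = TTestIndPower().solve_power(effect_size=18 / 12, power=0.9, alpha=0.05)
print(f"about {n:.1f} observations per group")  # close to the n = 10 used above
```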

Try It!

Use the assumptions above, and confirm the calculations of power for these values.