Lesson 25: Power of a Statistical Test
Overview
Whenever we conduct a hypothesis test, we'd like to make sure that it is a test of high quality. One way of quantifying the quality of a hypothesis test is to ensure that it is a "powerful" test. In this lesson, we'll learn what it means to have a powerful hypothesis test, as well as how we can determine the sample size n necessary to ensure that the hypothesis test we are conducting has high power.
25.1 - Definition of Power
Let's start our discussion of statistical power by recalling two definitions we learned when we were first introduced to hypothesis testing:
- A Type I error occurs if we reject the null hypothesis \(H_0\) (in favor of the alternative hypothesis \(H_A\)) when the null hypothesis \(H_0\) is true. We denote \(\alpha = P(\text{Type I error})\).
- A Type II error occurs if we fail to reject the null hypothesis \(H_0\) when the alternative hypothesis \(H_A\) is true. We denote \(\beta = P(\text{Type II error})\).
You'll certainly need to know these two definitions inside and out, as you'll be thinking about them a lot in this lesson, and at any time in the future when you need to calculate a sample size either for yourself or for someone else.
Example 25-1

The Brinell hardness scale is one of several definitions used in the field of materials science to quantify the hardness of a piece of metal. The Brinell hardness measurement of a certain type of rebar used for reinforcing concrete and masonry structures was assumed to be normally distributed with a standard deviation of 10 kilograms of force per square millimeter. Using a random sample of \(n = 25\) bars, an engineer is interested in testing:
- the null hypothesis \(H_0: \mu = 170\)
- against the alternative hypothesis \(H_A: \mu > 170\)
If the engineer decides to reject the null hypothesis if the sample mean is 172 or greater, that is, if \(\bar{X} \ge 172\), what is the probability that the engineer commits a Type I error?
Answer
In this case, the engineer commits a Type I error if his observed sample mean falls in the rejection region, that is, if it is 172 or greater, when the true (unknown) population mean is indeed 170. Graphically,
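Now, we can calculate the engineer's value of \(\alpha\) by transforming the sample mean, which under the null hypothesis is normally distributed with mean 170 and standard deviation \(\sigma/\sqrt{n} = 10/\sqrt{25} = 2\), to a standard normal random variable \(Z\).
Doing so, we get:
\[\alpha = P(\bar{X} \ge 172 \text{ if } \mu = 170) = P\left(Z \ge \frac{172 - 170}{10/\sqrt{25}}\right) = P(Z \ge 1.00) = 0.1587\]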
So, calculating the engineer's probability of committing a Type I error reduces to making a normal probability calculation. The probability is 0.1587 as illustrated here:
A probability of 0.1587 is a bit high. We'll learn in this lesson how the engineer could reduce his probability of committing a Type I error.
If, unknown to the engineer, the true population mean were \(\mu = 173\), what is the probability that the engineer commits a Type II error?
Answer
In this case, the engineer commits a Type II error if his observed sample mean does not fall in the rejection region, that is, if it is less than 172, when the true (unknown) population mean is 173. Graphically,
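Again, we can calculate the engineer's value of \(\beta\) by transforming the sample mean to a standard normal random variable \(Z\):
\[\beta = P(\bar{X} < 172 \text{ if } \mu = 173) = P\left(Z < \frac{172 - 173}{10/\sqrt{25}}\right) = P(Z < -0.50) = 0.3085\]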
So, calculating the engineer's probability of committing a Type II error again reduces to making a normal probability calculation. The probability is 0.3085 as illustrated here:
A probability of 0.3085 is a bit high. We'll learn in this lesson how the engineer could reduce his probability of committing a Type II error.
If you think about it, considering the probability of committing a Type II error is quite similar to looking at a glass that is half empty. That is, rather than considering the probability that the engineer commits an error, perhaps we could consider the probability that the engineer makes the correct decision. Doing so involves calculating what is called the power of the hypothesis test.
- Power of the Hypothesis Test
- The power of a hypothesis test is the probability of making the correct decision if the alternative hypothesis is true. That is, the power of a hypothesis test is the probability of rejecting the null hypothesis \(H_0\) when the alternative hypothesis \(H_A\) is the hypothesis that is true.
Let's return to our engineer's problem to see if we can instead look at the glass as being half full!
Example 25-1 (continued)
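If, unknown to the engineer, the true population mean were \(\mu = 173\), what is the probability that the engineer makes the correct decision by rejecting the null hypothesis in favor of the alternative hypothesis?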
Answer
In this case, the engineer makes the correct decision if his observed sample mean falls in the rejection region, that is, if it is 172 or greater, when the true (unknown) population mean is 173. Graphically, the power of the engineer's hypothesis test looks like this:
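That makes the power of the engineer's hypothesis test 0.6915 as illustrated here:
\[\text{Power} = P(\bar{X} \ge 172 \text{ if } \mu = 173) = P\left(Z \ge \frac{172 - 173}{10/\sqrt{25}}\right) = P(Z \ge -0.50) = 0.6915\]
which of course could have alternatively been calculated by simply subtracting the probability of committing a Type II error from 1, as shown here:
\[\text{Power} = 1 - \beta = 1 - 0.3085 = 0.6915\]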
At any rate, if the unknown population mean were 173, the engineer's hypothesis test would be at least a bit better than flipping a fair coin, in which he'd have but a 50% chance of choosing the correct hypothesis. In this case, he has a 69.15% chance. He could still do a bit better.
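The three probabilities in the engineer's example can be checked numerically. The following is a minimal sketch using Python's standard library; the variable names are illustrative, and the sample size of 25 is an assumption inferred from the reported probabilities (0.1587 and 0.3085) rather than a value confirmed elsewhere.

```python
from math import sqrt
from statistics import NormalDist

# Values from the engineer's example. n = 25 is an assumption inferred
# from the reported probabilities, since it makes sigma / sqrt(n) = 2.
sigma = 10.0      # population standard deviation
n = 25            # sample size (assumed)
mu_null = 170.0   # mean under the null hypothesis
mu_true = 173.0   # true mean considered under the alternative
cutoff = 172.0    # reject H0 when the sample mean is >= cutoff

std_error = sigma / sqrt(n)

# Sampling distribution of the sample mean under each scenario.
xbar_null = NormalDist(mu_null, std_error)
xbar_true = NormalDist(mu_true, std_error)

alpha = 1 - xbar_null.cdf(cutoff)  # P(reject H0 when mu = 170)
beta = xbar_true.cdf(cutoff)       # P(fail to reject H0 when mu = 173)
power = 1 - beta                   # P(reject H0 when mu = 173)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
# prints: alpha = 0.1587, beta = 0.3085, power = 0.6915
```

Changing `cutoff` shows the trade-off directly: a larger cutoff lowers `alpha` but raises `beta`, and therefore lowers the power.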
In general, for every hypothesis test that we conduct, we'll want to do the following:
- Minimize the probability of committing a Type I error. That is, minimize \(\alpha = P(\text{Type I error})\). Typically, a significance level of \(\alpha \le 0.10\) is desired.
- Maximize the power (at a value of the parameter under the alternative hypothesis that is scientifically meaningful). Typically, we desire power to be 0.80 or greater. Alternatively, we could minimize \(\beta = P(\text{Type II error})\), aiming for a Type II error rate of 0.20 or less.
By the way, in the second point, what exactly does "at a value of the parameter under the alternative hypothesis that is scientifically meaningful" mean? Well, let's suppose that a medical researcher is interested in testing the null hypothesis that the mean total blood cholesterol in a population of patients is 200 mg/dl against the alternative hypothesis that the mean total blood cholesterol is greater than 200 mg/dl. Well, the alternative hypothesis contains an infinite number of possible values of the mean. Under the alternative hypothesis, the mean of the population could be, among other values, 201, 202, or 210. Suppose the medical researcher rejected the null hypothesis, because the mean was 201. Whoopdy-do...would that be a rocking conclusion? No, probably not. On the other hand, suppose the medical researcher rejected the null hypothesis, because the mean was 215. In that case, the mean is substantially different enough from the assumed mean under the null hypothesis, that we'd probably get excited about the result. In summary, in this example, we could probably all agree to consider a mean of 215 to be "scientifically meaningful," whereas we could not do the same for a mean of 201.
Now, of course, all of this talk is a bit of gibberish, because we'd never really know whether the true unknown population mean were 201 or 215; otherwise, we wouldn't have to be going through the process of conducting a hypothesis test about the mean. We can do something though. We can plan our scientific studies so that our hypothesis tests have enough power to reject the null hypothesis in favor of values of the parameter under the alternative hypothesis that are scientifically meaningful.
25.2 - Power Functions
Example 25-2

Let's take a look at another example that involves calculating the power of a hypothesis test.
Let
What is the power of the hypothesis test if the true population mean were
Answer
Setting
because we transform the test statistic
Now, that implies that the power, that is, the probability of rejecting the null hypothesis, when
and illustrated here:
In summary, we have determined that we have (only) a 64.06% chance of rejecting the null hypothesis
What is the power of the hypothesis test if the true population mean were
Answer
Because we are setting
and illustrated here:
In summary, we have determined that we now have a 91.31% chance of rejecting the null hypothesis
What is the power of the hypothesis test if the true population mean were
Answer
Again, because we are setting
and illustrated here:
In summary, we have determined that, in this case, we have a 99.09% chance of rejecting the null hypothesis
Are you growing weary of this? Let's summarize a few things we've learned from engaging in this exercise:
- First and foremost, my instructor can be tedious at times..... errrr, I mean, first and foremost, the power of a hypothesis test depends on the value of the parameter being investigated. In the above example, the power of the hypothesis test depends on the value of the mean \(\mu\).
- As the actual mean \(\mu\) moves further away from the value of the mean under the null hypothesis, the power of the hypothesis test increases.
It's that first point that leads us to what is called the power function of the hypothesis test. If you go back and take a look, you'll see that in each case our calculation of the power involved a step that looks like this:
That is, if we use the standard notation
So, the reality is your instructor could have been a whole lot more tedious by calculating the power for every possible value of
Now, what can we learn from this plot? Well:
- We can see that \(\alpha\) (the probability of a Type I error), \(\beta\) (the probability of a Type II error), and \(K(\mu)\) are all represented on a power function plot, as illustrated here:
- We can see that the probability of a Type I error is \(\alpha = 0.05\), that is, the probability of rejecting the null hypothesis when the null hypothesis is true is 0.05.
- We can see the power of a test \(K(\mu)\), as well as the probability of a Type II error \(\beta(\mu)\), for each possible value of \(\mu\).
- We can see that \(\beta(\mu) = 1 - K(\mu)\) and vice versa, that is, \(K(\mu) = 1 - \beta(\mu)\).
- And we can see graphically that, indeed, as the actual mean \(\mu\) moves further away from the null mean, the power of the hypothesis test increases.
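The points in the list above can be checked numerically by evaluating a power function at several values of the mean. The following is a minimal sketch, assuming an upper-tailed z-test with a known standard deviation; the function name `power_function` and the settings (null mean 100, standard deviation 16, sample size 16) are illustrative assumptions, not values quoted from the example.

```python
from math import sqrt
from statistics import NormalDist

def power_function(mu, mu_null, sigma, n, alpha):
    """K(mu) for the upper-tailed z-test of H0: mu = mu_null vs HA: mu > mu_null."""
    std_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha critical value
    cutoff = mu_null + z_alpha * std_error      # reject H0 when xbar >= cutoff
    # Probability that the sample mean lands in the rejection region
    # when the true mean is mu.
    return 1 - NormalDist(mu, std_error).cdf(cutoff)

# Hypothetical settings, chosen only for illustration.
mu_null, sigma, n, alpha = 100.0, 16.0, 16, 0.05

for mu in (100, 104, 108, 112, 116):
    k = power_function(mu, mu_null, sigma, n, alpha)
    print(f"mu = {mu}: K(mu) = {k:.4f}, beta(mu) = {1 - k:.4f}")
```

At the null mean the function returns the significance level, and the values grow toward 1 as the mean moves further into the alternative, which is the shape a power function plot displays.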
Now, what do you suppose would happen to the power of our hypothesis test if we were to change our willingness to commit a Type I error? Would the power for a given value of
Example 25-2 (continued)

Let
What is the power of the hypothesis test if the true population mean were
Answer
Setting
because:
That means that the probability of rejecting the null hypothesis, when
So, the power when
By the way, we could again alternatively look at the glass as being half-empty. In that case, the probability of a Type II error when
All of this can be seen graphically by plotting the two power functions, one where
This last example illustrates that, providing the sample size
25.3 - Calculating Sample Size
Before we learn how to calculate the sample size that is necessary to achieve a hypothesis test with a certain power, it might behoove us to understand the effect that sample size has on power. Let's investigate by returning to our IQ example.
Example 25-3

Let
What is the power of the hypothesis test when
Answer
Setting
because:
Therefore, the power function \(K(\mu)\), when
Therefore, the probability of rejecting the null hypothesis at the
And, the probability of rejecting the null hypothesis at the
And, the probability of rejecting the null hypothesis at the
In summary, in the various examples throughout this lesson, we have calculated the power of testing
As you can see, our work suggests that for a given value of the mean
As this plot suggests, if we are interested in increasing our chance of rejecting the null hypothesis when the alternative hypothesis is true, we can do so by increasing our sample size
Example 25-4

Let
Answer
As is always the case, we need to start by finding a threshold value
That is, in order for our hypothesis test to be conducted at the
But, that's not the only condition that
This illustration suggests that in order for our hypothesis test to have 0.90 power, the following statement must hold (using our usual
Aha! We have two (asterisked (**)) equations and two unknowns! All we need to do is equate the equations, and solve for
Now that we know we will set
So, in summary, if the agricultural researcher collects data on
Example 25-5

Consider
Answer
In this case, because we are interested in performing a hypothesis test about a population proportion
Again, we start by finding a threshold value
That is, in order for our hypothesis test to be conducted at the
But, again, that's not the only condition that c must meet, because
This illustration suggests that in order for our hypothesis test to have 0.80 power, the following statement must hold:
Again, we have two (asterisked (**)) equations and two unknowns! All we need to do is equate the equations, and solve for
Now that we know we will set
So, in summary, if the pollster collects data on
Incidentally, we can always check our work! Conducting the survey and subsequent hypothesis test as described above, the probability of committing a Type I error is:
and the probability of committing a Type II error is:
just as the pollster had desired.
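The same two-equation approach, followed by a check of the resulting error rates, can be sketched in code. This is a minimal sketch, assuming an upper-tailed test of a proportion based on the normal approximation; the function names and the inputs (null proportion 0.50, alternative 0.55, significance level 0.01) are illustrative assumptions, and only the target power of 0.80 is taken from the discussion above.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportion_sample_size(p_null, p_alt, alpha, power):
    """Return (n, c) for the upper-tailed test of H0: p = p_null vs HA: p > p_null."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = sqrt(p_null * (1 - p_null))
    sd_alt = sqrt(p_alt * (1 - p_alt))
    # Equate the two expressions for the threshold c,
    #   c = p_null + z_alpha * sd_null / sqrt(n)
    #   c = p_alt  - z_beta  * sd_alt  / sqrt(n)
    # solve for n, and round up to a whole observation.
    n = ceil(((z_alpha * sd_null + z_beta * sd_alt) / (p_alt - p_null)) ** 2)
    c = p_null + z_alpha * sd_null / sqrt(n)
    return n, c

def error_rates(n, c, p_null, p_alt):
    """Return (alpha, beta) when H0 is rejected for sample proportions >= c."""
    se_null = sqrt(p_null * (1 - p_null) / n)
    se_alt = sqrt(p_alt * (1 - p_alt) / n)
    alpha = 1 - NormalDist(p_null, se_null).cdf(c)  # reject when H0 is true
    beta = NormalDist(p_alt, se_alt).cdf(c)         # fail to reject when p = p_alt
    return alpha, beta

# Hypothetical inputs, chosen only for illustration.
n, c = proportion_sample_size(0.50, 0.55, alpha=0.01, power=0.80)
alpha, beta = error_rates(n, c, 0.50, 0.55)
print(f"n = {n}, c = {c:.4f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Because `n` is rounded up, the achieved Type II error rate comes out at or slightly below the target, while the Type I error rate matches the chosen level by construction of `c`.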
We've illustrated several sample size calculations. Now, let's summarize the information that goes into a sample size calculation. In order to determine a sample size for a given hypothesis test, you need to specify:
- The desired \(\alpha\) level, that is, your willingness to commit a Type I error.
- The desired power or, equivalently, the desired \(\beta\) level, that is, your willingness to commit a Type II error.
- A meaningful difference from the value of the parameter that is specified in the null hypothesis.
- The standard deviation of the sample statistic or, at least, an estimate of the standard deviation (the "standard error") of the sample statistic.
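To see how the four inputs listed above combine, here is a minimal sketch for a one-sided z-test about a mean. It assumes a normal population with a known standard deviation `sigma` for a single observation (so the standard error is `sigma / sqrt(n)`); the function name `mean_sample_size` and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def mean_sample_size(alpha, power, difference, sigma):
    """Smallest n for a one-sided z-test with the given alpha and power.

    difference is the meaningful departure from the null mean, and
    sigma is the (assumed known) standard deviation of one observation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)       # quantile matching the power
    # Equating the threshold under H0 with the threshold needed for the
    # desired power gives sqrt(n) = (z_alpha + z_beta) * sigma / |difference|.
    return ceil(((z_alpha + z_beta) * sigma / abs(difference)) ** 2)

# Hypothetical inputs, chosen only for illustration.
n = mean_sample_size(alpha=0.05, power=0.90, difference=5.0, sigma=6.0)
print(n)
```

The formula makes the dependencies visible: halving the meaningful difference roughly quadruples the required sample size, and asking for a smaller \(\alpha\) or a higher power increases it as well.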