Lesson 9: Etiologic Studies (3) Cohort Study Design; Sample Size and Power Considerations

In this week's lesson, we will cover the design of a cohort study. We will also review sample size and power considerations as applied to epidemiologic studies.

Let's get started!

Cohort Study Design

A cohort study is useful for estimating the risk of disease, the incidence rate, and/or relative risks. Non-cases may be enrolled from a well-defined population, current exposure status (at \(t_0\)) determined, and the onset of disease observed in the subjects over time. Disease status at \(t_1\) can be compared to exposure status at \(t_0\). The data may be displayed as follows:

Columns: Cases (number); Non-cases (unnecessary if total known); Total exposure (person-time)

Exposed A B \(\text{Total}_{\text{Exposed}}\)
Not-exposed C D \(\text{Total}_{\text{Non-exposed}}\)
Totals \(\text{Total}_{\text{cases}}\) \(\text{Total}_{\text{non-cases}}\) Total

As you have learned, measures of disease frequency and effect or association can be calculated from these data:

  1. Incidence Density (Incidence Rate):

    Among Exposed: \(\dfrac{A}{T_{\text{Exposed}}}\)

    Among Nonexposed: \(\dfrac{C}{T_{\text{Nonexposed}}}\)

  2. Incidence Density Ratio (Risk Ratios; Relative Risk)

    \(\dfrac{(\frac{A}{T_{\text{Exposed}}})}{(\frac{C}{T_{\text{Nonexposed}}})}\)

  3. Attributable Risk

    \((\frac{A}{T_{\text{Exposed}}}) - (\frac{C}{T_{\text{Nonexposed}}})\)
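As a minimal sketch of the three measures above, the following Python snippet computes them from event counts and person-time. The function name `incidence_measures` and the counts used are hypothetical, chosen only for illustration.

```python
def incidence_measures(a, t_exposed, c, t_unexposed):
    """Incidence densities, their ratio, and their difference.

    a, c: number of new cases among exposed / non-exposed subjects.
    t_exposed, t_unexposed: person-time at risk in each group.
    """
    rate_exposed = a / t_exposed
    rate_unexposed = c / t_unexposed
    return {
        "rate_exposed": rate_exposed,
        "rate_unexposed": rate_unexposed,
        "rate_ratio": rate_exposed / rate_unexposed,
        "rate_difference": rate_exposed - rate_unexposed,
    }


# Hypothetical data: 30 cases in 1,000 person-years (exposed),
# 15 cases in 1,500 person-years (non-exposed).
measures = incidence_measures(a=30, t_exposed=1000.0, c=15, t_unexposed=1500.0)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers, the exposed group has about three times the incidence density of the non-exposed group.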

Types of Cohort Studies

The simplest cohort design is prospective, i.e., following a group forward in time, but a cohort study can also be 'retrospective'. In general, the descriptor, 'prospective' or 'retrospective', indicates when the cohort is identified relative to the initiation of the study.

  1. Prospective cohort (concurrent; longitudinal study) - An investigator identifies the study population at the beginning of the study and accompanies the subjects through time. In a prospective study, the investigator begins the study at the same time as the first determination of exposure status of the cohort. When proposing a prospective cohort study, the investigator first identifies the characteristics of the group of people he/she wishes to study. The investigator then determines the present case status of individuals, selecting only non-cases to follow forward in time. Exposure status is determined at the beginning of the study.

    • Problems: loss to follow up; differential nonresponse; loss of funding support; continually improving methods for detecting exposure (leading to greater misclassification than would be expected in current practice)
    • Examples: Framingham Study; Nurses' Health Study; National Health and Nutrition Examination Survey Follow-up Study. These are all studies in which case status was determined at the beginning of the cohort and existing cases were excluded from the study. Exposure was then measured in the remaining subjects, who were followed over a period of time until reaching the study endpoint. A member of the cohort reaches the endpoint either by dying, becoming a case, or reaching the end of the study period. A subject can also be lost to follow-up over the course of the study. The investigator progresses through time with the subjects in a prospective cohort study. Such a study may also be called a longitudinal or a concurrent study, as opposed to a retrospective cohort study.
  2. Retrospective cohort study (historical cohort; non-concurrent prospective cohort) - An investigator accesses a historical roster of all exposed and nonexposed persons and then determines their current case/non-case status. The investigator initiates the study when the disease is already established in the cohort of individuals, long after the original measurement of exposure. Doing a retrospective cohort study requires good data on exposure status for both cases and noncases at a designated earlier timepoint.

    Try it!

    How does a retrospective cohort study differ from a case-control study? Suppose you are investigating the possibility of an environmentally-linked cancer among students at a university. How would the sample selected for a case-control study differ from those included in a retrospective cohort study?

    Both types of studies identify present cases and non-cases.

    The case-control study identifies the cases and then selects appropriate controls. An entire cohort is not used. If you were investigating an environmentally-related cancer among university students with a case-control study, you would identify students within certain years who met the case definition for the cancer. You would select controls from among students who did not have the cancer, matched to the cases on characteristics such as age, gender and graduation year, then determine exposure status for both groups (perhaps proximity of their campus address to the identified toxin) and compare exposures between cases and controls.

    A retrospective cohort study uses the entire cohort: all cases and non-cases within the identified group. A retrospective cohort design might designate the cohort to be students enrolled at the university over a 5-year time span. The present case status of all these students is determined and historical data about their exposure status accessed, in order to assess the relationship between the exposure and being a case of the cancer.

    Potential problems with the retrospective cohort approach include selection bias and misclassification bias because of the retrospective nature of the study. However, the retrospective cohort design can be useful when reliable records are available, such as in occupational studies where levels of environmental exposures are monitored and recorded in a database. Investigators can determine the case status of the entire group at the present time, then use the exposure records to assess the relationship between exposure and disease. To make the terminology even more confusing, these types of designs can also be labeled historical cohort studies or non-concurrent prospective cohort studies.

    Try it!

    Is a panel study the same as a cohort study?

    A panel study is not the same as an epidemiologic cohort study. A cohort study identifies a group of people, measures their exposure status (which might help determine who is in the group), ensures that subjects do not have the stated outcome of interest, and then follows them over time (possibly with measures at different points in time) to see if they develop the outcome. The cohort is selected based upon exposure and absence of the outcome and is followed over time to a specified outcome.

    A panel study is a longitudinal study of a cohort of people with multiple measures over time. They are a cohort because they share something in common (e.g., employment, retirement). There is generally limited sampling with respect to exposure, and there is no assurance that subjects are free of disease (or of a specific outcome) when they enter the study. A disease or outcome of interest is not specified. The panel study is a group of people who share a characteristic and are progressing through time together to undetermined outcomes.

  3. Investigator enters well after the cohort is enrolled, but well before case determination

    Sometimes investigators enter into an ongoing prospective cohort study before the cases are determined. An investigator may pose a new question during an intermediate time period of an ongoing study. The cohort is already determined. Exposure and case status information are available from the beginning of the ongoing study; subjects can be followed forward to collect cases. For example, suppose genotyping is performed in a cohort study. Later, an investigator may decide to use these data and subsequent case status to consider the relationship of a genetic factor to a particular disease. The investigator may wish to follow the subjects several more years to ascertain more cases. This would be a mixture of a prospective and retrospective cohort.

Objectives

Upon completion of this lesson, you should be able to:

  • distinguish between a cohort study, case-control study, case-control nested within a cohort and a case-cohort study and discuss the relative advantages and disadvantages of each design,
  • describe the relationships between sample size, power, variability, effect size, and significance level, and
  • calculate sample size, given the necessary background information.

9.1 - Advanced Cohort Study Design

You have seen combinations of case-control design with cohort design earlier. Recall:

Case-cohort
Selection of cases and a sub-cohort from the source, or original, cohort; i.e., you identify the cases from the existing cohort and select a smaller subset of the cohort to follow over time as a comparison to the cases.
Nested case-control
Within a cohort study, cases and controls are selected for a smaller case-control investigation. For example, case status is determined and blood or tissue samples are collected among the cases and selected controls in a cohort; genotyping (exposure status) is done using the samples for these subjects, not the entire cohort; and the association between disease and the genetic factor is assessed.

Once again, recall the comparative advantages and disadvantages of case-control and cohort studies:

Cohort

  • Calculates incidence rate, risk, and relative risk
  • Potentially stronger for causal investigations
  • Expensive
  • Long-term study
  • Need large sample size
  • Good for rare exposure
  • Good for multiple outcomes
  • Less potential for recall bias
  • More potential for loss to follow-up
  • Possibly generalizable
  • Allows examination of natural course of disease, survival

Case-Control

  • Only an estimate of relative risk
  • Potentially weaker for causal investigation
  • Inexpensive
  • Short-term study
  • Powerful with small sample of cases
  • Good for rare disease
  • Good for multiple exposures
  • More potential for recall bias
  • Less potential for loss to follow-up
  • Probably not generalizable
  • Does not allow examination of natural course of disease, survival

9.2 - Comparison of Cohort to Case/Control Study Designs with Regard to Sample Size

Sample sizes for cohort studies depend upon the rate of the outcome, not the prevalence of exposure. Sample size for case-control studies is dependent upon prevalence of exposure, not the rate of outcome. Because the rate of outcome is usually smaller than the prevalence of the exposure, cohort studies typically require larger sample sizes to have the same power as a case-control study.

The example below is from a study of smoking and coronary heart disease where the background incidence rate was 0.09 events per person-year among the non-exposed group and the prevalence of the risk factor was 0.3.

The sample size requirements to detect a given relative risk with 90% power using two-sided 5% significance tests for cohort and case-control studies are listed below:

Relative Risk Cohort study Case-Control study
1.1 44,398 21,632
1.2 11,568 5,820
1.3 5,346 2,774
1.4 3,122 1,668
1.5 2,070 1,138
2.0 602 376
3.0 188 146

from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999, p. 359

In such a situation, with a relative risk of 1.1, more than twice the number of subjects are required for a cohort study as for a case-control study. In every row of the table, the case-control design requires a smaller sample than does the cohort study to detect the same level of increased risk. This is generally true. There is also a dependence upon the rate of the outcome, but in general, case-control studies involve less sampling.

Furthermore, loss to follow-up is important to consider when designing a cohort study. Any sample size calculation should be inflated to account for the expected drop-outs, with the drop-out rate estimated from your own experience or from the literature. For example, if the drop-out rate is expected to be 5%, multiply n by 1/(1-0.05) and recruit the increased number of subjects.
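The drop-out adjustment can be sketched in a few lines of Python. The helper name `inflate_for_dropout` is hypothetical, and rounding up to a whole subject is an assumption made here, not something the lesson specifies.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Number to recruit so that about n_required subjects remain."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # Multiply n by 1 / (1 - dropout_rate) and round up (assumed rounding).
    return ceil(n_required / (1 - dropout_rate))


# 602 is the cohort sample size listed above for a relative risk of 2.0.
n_to_recruit = inflate_for_dropout(602, 0.05)
print(n_to_recruit)
```

With a 5% expected drop-out rate, the 602 subjects from the table become 634 subjects to recruit.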


9.3 - Example 9-1: Population-based cohort or cross-sectional studies

Example 9-1

Suppose you are interested in the question: "Does one group have a prevalence percentage that is different than other groups?" For example:

Baseline prevalence of smoking in a particular community is 30%. A clean indoor air policy goes into effect. What is the sample size required to detect a decrease in smoking prevalence of at least 2 percentage points? \(\alpha=0.05\); 90% power.

We are interested in testing the following hypothesis:

Null hypothesis:

\(H_0\colon \text{prevalence}_{(Before)}\le \text{prevalence}_{(After)}\)

Alternative hypothesis:

\(H_A\colon \text{prevalence}_{(Before)}- \text{prevalence}_{(After)}=\delta\)

Where \(\delta \gt 0\)

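The sample size for testing a difference in prevalence using a one-sided test, which applies to this example, is:

\(n=\dfrac{1}{d^{2}}\left [ z_{\alpha }\sqrt{\pi_{0}(1-\pi_{0})}+z_{\beta }\sqrt{\pi_{1}(1-\pi_{1})} \right ]^{2}\)

where \(\pi_{0}\) is the prevalence under the null hypothesis, \(\pi_{1}\) is the prevalence under the alternative, \(d=|\pi_{1}-\pi_{0}|\) is the difference to be detected, and \(z_{\alpha }\) and \(z_{\beta }\) are the upper-tail standard normal critical values for the significance level \(\alpha\) and for \(\beta = 1-\text{power}\).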

Replace \(z_{\alpha }\) by \(z_{\alpha/2 }\) for a two-sided test
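The formula can be evaluated directly. The sketch below is one possible Python rendering, not the lesson's own code: the function name `n_single_proportion` and its arguments are hypothetical, and rounding up to the next whole subject is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_single_proportion(pi0, pi1, alpha=0.05, power=0.90, two_sided=False):
    """Sample size for testing a single proportion pi0 against pi1."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)       # critical value for the power
    d = abs(pi1 - pi0)
    bracket = z_alpha * sqrt(pi0 * (1 - pi0)) + z_beta * sqrt(pi1 * (1 - pi1))
    return bracket ** 2 / d ** 2


# Example 9-1: baseline prevalence 0.30, one-sided alpha = 0.05, 90% power.
n_decrease = ceil(n_single_proportion(0.30, 0.28))  # prevalence falls to 0.28
n_increase = ceil(n_single_proportion(0.30, 0.32))  # prevalence rises to 0.32
print(n_decrease, n_increase)
```

Under these assumptions the sketch gives about 4,417 subjects for a drop from 0.30 to 0.28 and 4,567 for a rise from 0.30 to 0.32. The second value agrees with the Table B.8 entry for \(\pi_{0}=0.30\), \(d=0.02\), which suggests the table takes \(\pi_{1}=\pi_{0}+d\).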

Take a moment to look at the table below for sample size requirements for testing the value of a single proportion with a one-sided test. Prevalence can be found along the top of the table and the percentage point difference vertically on the left. How many individuals do we need to include in our study in order to meet the above criteria?

(Tables from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999)

 

Table B.8. Sample size requirements for testing the value of a single proportion

These tables give requirements for a one-sided test directly. For two-sided tests, use the table corresponding to half the required significance level. Note that \(\pi_{0}\) is the hypothesized proportion (under \(H_{0}\)) and \(d\) is the difference to be tested.

(a) 5% significance, 90% power

Columns give \(\pi_{0}\); rows give \(d\).

\(d\) 0.01 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95
0.01 1,178 8,001 13,923 18,130 20,625 21,406 20,475 17,830 13,473 7,400 3,717
0.02 366 2,070 3,534 4,567 5,172 5,349 5,097 4,417 3,308 1,769 833
0.03 192 950 1,593 2,045 2,305 2,376 2,255 1,944 1,443 748 322
0.04 123 551 908 1,158 1,300 1,335 1,262 1,083 795 398 148
0.05 88 362 589 746 834 853 804 686 498 239  
0.06 67 258 414 521 580 591 555 471 338 155  
0.07 54 194 308 385 427 434 405 342 242 104  
0.08 44 152 238 296 327 331 308 258 181 71  
0.09 38 123 190 235 259 261 242 201 139 48  
0.10 32 102 156 191 210 211 195 161 109    
0.15 18 49 72 87 93 92 83 66 40    
0.20 12 30 42 49 52 50 44 33      
0.25 9 20 27 31 33 31 26 18      
0.30 7 14 19 22 22 20 16        
0.35 5 11 14 16 16 14 10        
0.40 4 9 11 12 11 10          
0.45 4 7 8 9 8 6          
0.50 3 6 7 7 6            

 

Try It!

Looking at the table values, what happens to the necessary sample size as:
  1. Prevalence (\(\pi_0\)) increases? Does the sample size increase or decrease?
  2. What happens to the sample size as effect size decreases?
  3. What is the minimal detectable difference if you had funds for 1,500 subjects?

Answers:

  1. The largest sample sizes occur with baseline prevalence at 0.5.
  2. The smaller the effect size, the larger the sample size.
  3. About a 3.6 percentage point decrease in prevalence.
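The third question can also be approached numerically rather than by reading between table rows. The sketch below searches for the smallest difference whose required sample size fits a budget of 1,500 subjects. It assumes a baseline prevalence of 0.30, the one-sided single-proportion formula from this section with \(\pi_{1}=\pi_{0}+d\), and hypothetical function names.

```python
from math import sqrt
from statistics import NormalDist


def required_n(pi0, d, alpha=0.05, power=0.90):
    """One-sided sample size for detecting pi1 = pi0 + d (assumed direction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    pi1 = pi0 + d
    bracket = z_alpha * sqrt(pi0 * (1 - pi0)) + z_beta * sqrt(pi1 * (1 - pi1))
    return bracket ** 2 / d ** 2


def min_detectable_difference(n_available, pi0, alpha=0.05, power=0.90):
    """Smallest d whose required sample size does not exceed n_available."""
    low, high = 1e-4, 1 - pi0 - 1e-4   # search range for d
    for _ in range(60):                # bisection: required_n falls as d grows
        mid = (low + high) / 2
        if required_n(pi0, mid, alpha, power) > n_available:
            low = mid
        else:
            high = mid
    return high


d_min = min_detectable_difference(1500, pi0=0.30)
print(f"{100 * d_min:.2f} percentage points")
```

Under these assumptions the search lands at roughly 3.5 percentage points, close to the value of about 3.6 obtained by interpolating between the 0.03 and 0.04 rows of the table.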

9.4 - Example 9-2: Ratios in a population-based study (relative risks, relative rates or prevalence ratios)

Example 9-2

Suppose the rate of disease in an unexposed population is 10/100 person-years. You hypothesize an exposure has a relative risk of 2.0. How many persons must you enroll assuming half are exposed and half are unexposed to detect this increased risk? \(\alpha=0.05\) and 90% power.

Here are the hypotheses:

Null hypothesis

\(H_0\colon \text{Incidence}_{(Exposed)} \le \text{Incidence}_{(Unexposed)}\)

Alternative hypothesis

\(H_A\colon \text{Incidence}_{(Exposed)} / \text{Incidence}_{(Unexposed)}=\lambda\)

Where:

\(\lambda \gt 0\)
\(\text{Incidence}_{(Exposed)}=p(\text{Disease|Exposed})\)
\(\text{Incidence}_{(Unexposed)}=p(\text{Disease|Not Exposed})\)

and the resulting formula:

\(n=\dfrac{r+1}{r(\lambda -1)^{2}\pi^{2} }\left [ z_{\alpha }\sqrt{(r+1)p_{c}(1-p_{c})}+z_{\beta }\sqrt{\lambda \pi (1-\lambda \pi)+r\pi(1-\pi )} \right ]^{2}\)

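where \(n\) is the total sample size for the two groups combined, \(\pi=\pi_2\) is the proportion in the reference (unexposed) group, \(\lambda\) is the relative risk to be detected, \(r\) is the ratio of the number of exposed subjects to the number of reference subjects, and \(p_c\) is the common proportion over the two groups, which is estimated as: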

\(p_{c}=\dfrac{\pi (r\lambda +1)}{r+1}\)

When r = 1 (equal-sized groups), the formula above reduces to:

\(p_{c}=\dfrac{\pi (\lambda +1)}{2}=\dfrac{\pi_{1}+\pi_{2} }{2}\)
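A short Python sketch of the relative risk formula above may help in checking hand calculations. The function name `n_relative_risk` and the variable `lam` (standing in for \(\lambda\)) are hypothetical, and `statistics.NormalDist` from the standard library is used here for the normal critical values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_relative_risk(pi, lam, r=1.0, alpha=0.05, power=0.90, two_sided=False):
    """Total sample size (both groups) for detecting relative risk lam.

    pi: proportion in the reference group; r: exposed-to-reference size ratio.
    """
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    p_c = pi * (r * lam + 1) / (r + 1)            # common proportion
    term_null = z_alpha * sqrt((r + 1) * p_c * (1 - p_c))
    term_alt = z_beta * sqrt(lam * pi * (1 - lam * pi) + r * pi * (1 - pi))
    return (r + 1) / (r * (lam - 1) ** 2 * pi ** 2) * (term_null + term_alt) ** 2


# Section 9.2 setting: background rate 0.09, two-sided 5% test, 90% power.
n_rr2 = ceil(n_relative_risk(0.09, 2.0, two_sided=True))
n_rr3 = ceil(n_relative_risk(0.09, 3.0, two_sided=True))
# Example 9-2 setting: background rate 0.10, one-sided 5% test, 90% power.
n_example = n_relative_risk(0.10, 2.0)
print(n_rr2, n_rr3, round(n_example))
```

Under these assumptions the sketch returns 602 and 188 for the first two calls, matching the cohort column of the Section 9.2 table for relative risks of 2.0 and 3.0. For the Example 9-2 inputs it gives roughly 433, somewhat below the tabulated value for that cell, so the printed table may reflect a refinement that this sketch does not attempt to reproduce.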

Let's take a look at tabulated results:

 

Table B.9. Sample size requirements (for the two groups combined) for testing the ratio of two proportions (relative risk) with equal numbers in each group

These tables give requirements for a one-sided test directly. For two-sided tests, use the table corresponding to half the required significance level. Note that \(\pi\) is the proportion for the reference group (the denominator) and \(\lambda\) is the relative risk to be tested.

(a) 5% significance, 90% power

Columns give \(\pi\); rows give \(\lambda\).

\(\lambda\) 0.001 0.005 0.010 0.050 0.100 0.150 0.200 0.500 0.900
0.10 23,244 4,636 2,310 488 216 138 100 30 8
0.20 32,090 6,398 3,188 618 298 190 136 40 10
0.30 45,406 9,052 4,508 874 418 268 192 56 14
0.40 66,554 13,268 6,606 1,278 612 390 278 78 18
0.50 102,678 20,466 10,190 1,968 940 598 426 118 26
0.60 171,126 34,104 16,976 3,274 1,562 990 706 192 38
0.70 323,228 64,410 32,058 6,176 2,940 1,862 1,322 352 62
0.80 770,020 153,422 76,348 14,688 6,980 4,412 3,128 814 126
0.90 3,251,102 647,690 322,264 61,924 29,380 18,534 13,110 3,336 450
1.10 3,593,120 715,666 355,984 68,240 32,272 20,282 14,288 3,496 292
1.20 941,030 187,410 93,208 17,846 8,426 5,286 3,716 890
1.30 437,234 87,068 43,298 8,280 3,904 2,444 1,714 402
1.40 256,630 51,098 25,406 4,854 2,284 1,428 1,000 228
1.50 171,082 34,062 16,934 3,232 1,518 948 662 148
1.60 123,556 24,596 12,226 2,330 1,094 680 474 104
1.80 74,842 14,896 7,402 1,408 658 408 284 58
2.00 51,318 10,212 5,074 962 448 278 192
3.00 17,102 3,400 1,688 316 146 88 60
4.00 9,498 1,886 934 174 78 46 30
5.00 6,419 1,272 630 116 52 30
10.00 2,318 458 226 40
20.00 992 194 94

For a relative risk of 2.0 under the conditions above, read the \(\pi=0.100\) column and the \(\lambda=2.00\) row of Table B.9: a total of 448 subjects, or 224 in each group.

Try It!

What happens to the necessary sample size as:
  1. Incidence rate \((\pi)\) increases?
  2. Relative risk \((\lambda)\) decreases?
  3. How would you use this table to determine sample size for 'protective' effects (i.e., nutritional components or medical procedures which prevent a negative outcome), as opposed to an increased risk?
  4. What is the minimal detectable relative risk if you had funds for 1000 subjects?

Answers:

  1. n decreases.
  2. The largest n occurs with \(\lambda\) closest to 1.
  3. Protective effects would be those with \(\lambda \lt 1\).
  4. With a background rate of 10/100 and 1000 subjects, a relative risk of about 1.65 could be detected.

9.5 - Example 9-3: Odds Ratios from a case/control study

Example 9-3

Suppose your study design is an unmatched case-control study with equal numbers of cases and controls.

If 30% of the population is exposed to a risk factor, what is the number of study subjects (assuming an equal number of cases and controls in an unmatched study design) necessary to detect a hypothesized odds ratio of 2.0? Assume 90% power and \(\alpha=0.05\).

Here are the hypotheses being tested:

Null hypothesis

\(H_0\colon \text{incidence}_{1}^* \le \text{incidence}_{2}^*\)

Alternative hypothesis

\(H_A\colon \text{incidence}_{1}^* / \text{incidence}_{2}^*=\lambda^*\)

where:

\(\lambda^*\gt0\)

\(\text{incidence}_1^*=p(\text{Exposed|Case})\)

\(\text{incidence}_2^*=p(\text{Exposed|Control})\)

The resulting sample size formula is:

\(n=\dfrac{(r+1)(1+(\lambda -1)P)^{2}}{rP^{2}(P-1)^{2}(\lambda -1)^{2}}\left [ z_{\alpha}\sqrt{(r+1)p_{c}^{*}(1-p_{c}^{*})} + z_{\beta}\sqrt{\frac{\lambda P(1-P)}{\left [ 1+(\lambda-1)P \right ]^{2}}+rP(1-P)} \right ]^{2}\)

where:

\(p_{c}^{*}=\dfrac{P}{r+1}\left ( \dfrac{r\lambda}{1+(\lambda -1)P}+1 \right )\)
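The case-control calculation can be sketched in Python as well. The function name `n_case_control` and the variable `lam` (for \(\lambda\)) are hypothetical, and the leading denominator is taken to be \(rP^{2}(P-1)^{2}(\lambda-1)^{2}\), which is an assumption about the intended form of the formula.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_case_control(P, lam, r=1.0, alpha=0.05, power=0.90, two_sided=False):
    """Total sample size for an unmatched case-control study.

    P: prevalence of exposure in the population; lam: odds ratio to detect;
    r: ratio of cases to controls.
    """
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    denom = 1 + (lam - 1) * P
    p_c = P / (r + 1) * (r * lam / denom + 1)      # common exposure proportion
    term_null = z_alpha * sqrt((r + 1) * p_c * (1 - p_c))
    term_alt = z_beta * sqrt(lam * P * (1 - P) / denom ** 2 + r * P * (1 - P))
    # Leading factor, with the denominator assumed as stated in the lead-in.
    factor = (r + 1) * denom ** 2 / (r * P ** 2 * (P - 1) ** 2 * (lam - 1) ** 2)
    return factor * (term_null + term_alt) ** 2


# Example 9-3: 30% exposed, odds ratio 2.0, equal numbers of cases and controls.
n_one_sided = ceil(n_case_control(0.30, 2.0))
n_two_sided = ceil(n_case_control(0.30, 2.0, two_sided=True))
print(n_one_sided, n_two_sided)
```

Under these assumptions the sketch gives 306 subjects in total for the one-sided test, in line with Table B.10 at \(P=0.300\), \(\lambda=2.00\), and 376 for the two-sided test, in line with the case-control column of the Section 9.2 table.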

 

Table B.10. Total sample size requirements (for the two groups combined) for unmatched case-control studies with equal numbers of cases and controls

These tables give requirements for a one-sided test directly. For two-sided tests, use the table corresponding to half the required significance level. Note that \(P\) is the prevalence of the risk factor in the entire population and \(\lambda\) is the approximate relative risk (odds ratio) to be tested.

(a) 5% significance, 90% power

Columns give \(P\); rows give \(\lambda\).

\(\lambda\) 0.010 0.050 0.100 0.200 0.300 0.400 0.500 0.700 0.900
0.10 2,318 456 224 108 70 50 40 30 38
0.20 3,206 638 316 158 104 80 66 56 88
0.30 4,546 912 458 232 160 124 106 98 176
0.40 6,676 1,348 684 356 248 200 176 172 330
0.50 10,318 2,098 1,074 566 404 332 296 306 616
0.60 17,220 3,522 1,816 974 706 588 536 576 1,206
0.70 32,570 6,698 3,476 1,890 1,390 1,174 1,088 1,206 2,612
0.80 77,686 16,052 8,382 4,614 3,438 2,944 2,764 3,146 7,012
0.90 328,374 68,156 35,786 19,922 15,020 13,006 12,354 14,400 32,892
1.10 363,666 76,090 40,352 22,918 17,630 15,574 15,096 18,316 43,550
1.20 95,332 20,020 10,664 6,112 4,744 4,228 4,134 5,102 12,340
1.30 44,334 9,342 4,998 2,888 2,260 2,032 2,002 2,510 6,166
1.40 26,044 5,506 2,958 1,722 1,358 1,230 1,222 1,554 3,870
1.50 17,376 3,684 1,986 1,166 926 846 846 1,090 2,748
1.60 12,558 2,672 1,446 854 684 628 632 826 2,106
1.80 7,618 1,630 888 532 432 400 408 546 1,420
2.00 5,230 1,124 616 374 306 288 296 404 1,074
3.00 1,754 386 218 138 120 118 126 184 522
4.00 978 220 126 84 74 76 84 130 380
5.00 664 150 88 60 56 58 66 104 316
10.00 244 60 38 30 30 34 40 70 224
20.00 108 30 20 18 20 24 30 56 190

 

Try it!

What happens to the necessary sample size as:

 

  1. Prevalence of the risk factor (\(P\)) increases?
  2. Odds ratio (\(\lambda\)) decreases?

Answers:

  1. For many values of \(\lambda\), \(P=0.5\) has the smallest sample size requirement.
  2. The largest sample sizes occur with the OR closest to 1; an OR of 1.1 requires a greater n than an OR of 0.9.

We have considered three typical epidemiologic research designs. You might also ask these questions:

Should the number of controls match the number of cases? Should multiple controls be used for each case?

Observe the power curve below:

Power increases but at a decreasing rate as the ratio of controls/cases increases. Little additional power is gained at ratios higher than four controls/cases. There is little benefit to enrolling a greater ratio of controls to cases.

[Figure: power as a function of the ratio of controls to cases]

from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999, p.265

Under what circumstances would it be recommended to enroll a large number of controls compared to cases?

Perhaps the small gain in power is worthwhile if the cost of a Type II error is large and the expense of obtaining controls is minimal, such as selecting controls with covariate information from a computerized database. If you must physically locate and recruit the controls, set up clinic appointments, run diagnostic tests, and enter data, the effort of pursuing a large number of controls quickly offsets any gain. You would use a one-to-one or two-to-one range. The bottom line is there is little additional power beyond a four-to-one ratio.

What if there is a Limited Number of Total Subjects for Case-Control Studies?

Sometimes the total number of subjects is limited (e.g., you have limited funds and the cost associated with each case is equal to the cost associated with a control). This graph illustrates power as related to the ratio of the controls to cases.

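[Figure: power as a function of the ratio of controls to cases when the total number of subjects is fixed]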

from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999, p.358

Try it!

What is the ratio of cases/controls you should study for maximum power?

There is maximum power with a one-to-one ratio of controls to cases. If you are limited in the number of people that can be enrolled in a study, match cases to controls in a one-to-one fashion.

What about Matched Case-Control Studies?

In matched case/control study designs, useful data come from only the discordant pairs of subjects. Useful information does not come from the concordant pairs of subjects. Matching of cases and controls on a confounding factor (e.g., age, sex) may increase the efficiency of a case-control study, especially when the moderator's minimal number of controls are rejected.

The sample size for matched study designs may be greater or less than the sample size required for similar unmatched designs because only the pairs discordant on exposure are included in the analysis. The proportion of discordant pairs must be estimated to derive sample size and power. The power of matched case/control study design for a given sample size may be larger or smaller than the power for an unmatched design.

Formula for sample size calculation for matched case-control study:

\(n=\dfrac{(r+1)(1+(\lambda -1)P)^{2}}{rP^{2}(P-1)^{2}(\lambda -1)^{2}}\left [ z_{\alpha}\sqrt{(r+1)p_{c}^{*}(1-p_{c}^{*})} + z_{\beta}\sqrt{\frac{\lambda P(1-P)}{\left [ 1+(\lambda-1)P \right ]^{2}}+rP(1-P)} \right ]^{2}\)

Where:

\(p_{c}^{*}=\dfrac{P}{r+1}\left ( \dfrac{r\lambda}{1+(\lambda -1)P}+1 \right )\)

P = prevalence of exposure among the population
\(\lambda\) = estimated relative risk
r = ratio of cases to controls


9.6 - Example of a Cohort Study

In Week 3 of this course, you looked at the cohort study by Maurice Zeegers et al., "Alcohol Consumption and Bladder Cancer Risk: Results from the Netherlands Cohort Study," American Journal of Epidemiology, Vol 153, No. 1, pp 38-41. We discussed potential effect modifiers vs. confounders at that time. Let's look at this study again in terms of its design as a cohort study. The study design for the original cohort and selection of the case-cohort is detailed in van den Brandt, P.A. et al. "A large scale prospective cohort study on diet and cancer in the Netherlands." J Clinical Epidemiology (1990), Vol 43, No. 3, 285-295.

Try it!

  1. What evidence is there of the prospective nature of this cohort study?
    Subjects completed a questionnaire on baseline risk factors; 61% also provided toenail clippings (exposure data); follow-up for incident cancer ensues with record linkage to cancer and pathology registries.
  2. Describe the type of cohort study...

    The original cohort came from the general population of 55 to 69-year-old men and women in the Netherlands, sampled from municipal population registries. Individuals with special dietary habits (e.g., vegetarians) were over-sampled. The original cohort consists of 120,852 subjects, who completed the baseline questionnaire that was sent to 340,439 subjects. A sub-cohort of 5000 was randomly selected immediately after the identification of cohort members. A case-cohort approach was used. There was a further random selection of 3500 members from the 5000 for processing questionnaires and toenail specimens, and a further selection for collecting and processing dietary questionnaires. See Figure 1 in van den Brandt et al.

  3. Why didn't the researchers use a nested case-control study?

    A nested case-control design would require waiting for cases to occur before efficiently matching controls to cases. This would cause a delay in processing questionnaires for cases and controls. The case-cohort approach allows data to be processed while cases are still being ascertained. In the case-cohort design, the person-year experience of the whole is estimated by the results of the sub-cohort, while cases are counted among the entire cohort.

A beauty of a well-run cohort study is the multiple outcomes that can be considered. A group well-characterized and followed over a long period of time provides much useful information. For example, the Framingham study has studied 3 generations and added to our understanding of the roles of obesity, HDL lipids, and hypertension in heart disease and stroke as well as contributing an algorithm for predicting CHD risk and identifying 8 genetic loci associated with hypertension. The use of sub-cohorts for specific purposes can minimize cost and the length of a study.


9.7 - Sample Size and Power for Epidemiologic Studies


One reason for performing sample size calculations in the planning phase of a study is to assure confidence in the study results and conclusions. We certainly wish to propose a study that has a chance to be scientifically meaningful.

Are there other implications, beyond a lack of confidence in the results, to an inadequately-powered study? Suppose you are reviewing grants for a funding agency. If insufficient numbers of subjects are to be enrolled for the study to have a reasonable chance of finding a statistically significant difference, should the investigator receive funds from the granting agency? Of course not. The FDA, NIH, NCI, and most other funding agencies are concerned about sample size and power in the studies they support and do not consider funding studies that would waste limited resources.

Money is not the only limited resource. What about potential study subjects? Is it ethical to enroll subjects in a study with a small probability of producing clinically meaningful results, precluding their participation in a more adequately-powered study? What about the horizon of patients not yet treated? Are there ethical implications to conducting a study in which treatment and care actually help prolong life, yet due to inadequate power, the results are unable to alter clinical practice?

Too many subjects are also problematic. If more subjects are recruited than needed, the study is prolonged. Wouldn't it be preferable to quickly disseminate the results if the treatment is worthwhile instead of continuing a study beyond the point where a significant effect is clear? Or, if the treatment proves detrimental to some, how many subjects will it take for the investigator to conclude there is a clear safety issue?

Recognizing that careful consideration of statistical power and the sample size is critical to assuring scientifically meaningful results, protection of human subjects, and good stewardship of fiscal, tissue, physical, and staff resources, let's review how power and sample size are determined.

One-Sided Hypothesis Testing

  • Null hypothesis – \(H_0\colon \text{disease frequency}_1=\text{disease frequency}_2\)
  • Alternative hypothesis – \(H_1\colon\text{disease frequency}_1 \gt \text{disease frequency}_2\)

Power is calculated with regard to a particular set of hypotheses. Often epidemiologic hypotheses compare an observed proportion or rate to a hypothesized value. The above hypotheses are one-sided, i.e. testing whether the proportion is significantly less in group 2 than group 1. An example of two-sided hypotheses would be testing equality of proportions as the null hypothesis; using as the alternative, inequality of proportions.

Possible Outcomes for Tests of Hypotheses

When testing hypotheses, there are two types of error as shown in the table below:

| | Accept \(H_0\) | Reject \(H_0\) |
| \(H_0\) True | Correct Decision | Type I Error (alpha; \(\alpha\)) |
| \(H_0\) False | Type II Error (beta; \(\beta\)) | Correct Decision |

Using the analogy of a trial, we want to make correct decisions: declare the guilty, 'guilty' and the innocent, 'innocent'. We do not wish to declare the innocent 'guilty' or the guilty 'innocent'.

Statistical Power

Power is the probability that the null hypothesis is rejected if a specific alternative hypothesis is true. \(\beta\) represents Type II error, the probability of not rejecting the null hypothesis when the given alternative is true.

\(1-\beta\) = power

The power of a study should be minimally 80% and often, studies are designed to have 90-95% power to detect a particular clinical effect.
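To make the definition concrete, here is a minimal sketch of a power calculation for the one-sided comparison of two proportions described above. It assumes equal group sizes and the usual normal approximation to the difference in proportions; the function name `power_two_proportions` and the example risks (10% vs. 5%) are illustrative assumptions, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a one-sided z-test of H0: p1 = p2 vs. H1: p1 > p2.

    Assumes equal group sizes and a normal approximation (illustrative sketch).
    """
    z = NormalDist()
    p_bar = (p1 + p2) / 2
    # Standard error of the difference under H0 (pooled) and under H1 (unpooled)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    z_alpha = z.inv_cdf(1 - alpha)  # critical value for a one-sided test
    # Probability that the test statistic exceeds the critical value under H1
    return z.cdf((p1 - p2 - z_alpha * se_null) / se_alt)


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, round(power_two_proportions(0.10, 0.05, n), 3))
```

When the two proportions are equal, the function returns \(\alpha\), which is the probability of rejecting a true null hypothesis.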

What factors affect power?

\(\alpha\), \(\beta\), effect size, variability (baseline incidence), n

\(\alpha\) is the level of significance, the probability of a Type I error. This is usually 5% or 1%, meaning the investigator is willing to accept this level of risk of declaring the null hypothesis false when it is actually true.

The effect size is the deviation from the null that the investigator wishes to be able to detect. The effect size should be clinically meaningful. It may be based on the results of prior or pilot studies. For example, a study might be powered to be able to detect a relative risk of 2 or greater.

Sometimes a standardized effect size is given, i.e., the effect size divided by the standard deviation. This is a unitless value. If power is calculated in this manner, the standardized effect size is usually between 0.1 and 0.5, with 0.5 meaning \(H_1\) is 0.5 standard deviations away from \(H_0\).

Variability may be expressed in terms of a standard deviation, or an appropriate measure of variability for the statistic. If the hypotheses are concerned with a population proportion, the value of the proportion and the sample size are used to calculate the variability. The investigator will need an estimate of the variability in order to calculate power. Reasonable estimates may be obtained from historical data, pilot study data, or a literature search.
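As a small worked illustration of how the proportion and the sample size determine variability (the numbers are assumed for the example, not taken from any study), the standard error of a sample proportion \(\hat{p}\) based on n subjects is:

\(SE(\hat{p})=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

With \(\hat{p}=0.10\) and n = 400:

\(SE(\hat{p})=\sqrt{\dfrac{0.10 \times 0.90}{400}}=\sqrt{0.000225}=0.015\)

Quadrupling the sample size to n = 1600 halves the standard error to 0.0075.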

A study may have multiple sources of variation, each accounted for in the analysis. For example, a repeated measures design will need to account for both within-subject and between-subject variability.

The baseline incidence rate is related to the effect size. If it is hypothesized that a rate has increased or decreased, the baseline rate and the effect size must both be known to calculate the power for detecting such a change.

With knowledge of the above factors, the power of a statistical test can be calculated for a given sample size. Alternatively, the required sample size for a given power can be calculated.
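The reverse calculation, sample size for a given power, can be sketched in the same way. The example below assumes a cohort study with equal numbers of exposed and unexposed subjects, a one-sided test, and the normal approximation for comparing two risks. The function name `n_per_group` and the baseline risks are hypothetical choices made for illustration. It also shows why the baseline incidence matters: the same relative risk of 2 requires far more subjects when the baseline risk is low.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(baseline_risk, relative_risk, power=0.80, alpha=0.05):
    """Approximate subjects needed per group to detect a given relative risk.

    One-sided test, equal group sizes, normal approximation (illustrative sketch).
    """
    z = NormalDist()
    p0 = baseline_risk                  # risk among the unexposed
    p1 = relative_risk * baseline_risk  # risk among the exposed under H1
    p_bar = (p0 + p1) / 2
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    for risk in (0.01, 0.05, 0.10):
        print(f"baseline risk {risk:.2f}: n per group = {n_per_group(risk, 2.0)}")
```

Requesting higher power (for example, `power=0.90`) increases the required sample size for the same baseline risk and relative risk.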

Power is directly related to effect size, sample size, and significance level. An increase in any of the effect size, the sample size, or the significance level will produce increased statistical power, all other factors being equal. Power is inversely related to variability. Decreasing variability will increase the power of a study.
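These relationships can be checked numerically. The sketch below uses a simplified setting chosen for illustration: a one-sided two-sample z-test comparing means, with a known common standard deviation and equal group sizes. The function name `power_two_means` and all numeric inputs are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test (known common sd)."""
    z = NormalDist()
    d = delta / sd  # standardized effect size (unitless)
    z_alpha = z.inv_cdf(1 - alpha)
    return z.cdf(d * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    scenarios = {
        "baseline": dict(delta=5, sd=10, n_per_group=50, alpha=0.05),
        "larger effect": dict(delta=7, sd=10, n_per_group=50, alpha=0.05),
        "larger sample": dict(delta=5, sd=10, n_per_group=100, alpha=0.05),
        "larger alpha": dict(delta=5, sd=10, n_per_group=50, alpha=0.10),
        "more variability": dict(delta=5, sd=15, n_per_group=50, alpha=0.05),
    }
    for label, kwargs in scenarios.items():
        print(f"{label}: power = {power_two_means(**kwargs):.3f}")
```

The first three changes raise power relative to the baseline scenario, while the larger standard deviation lowers it.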

If the power of a study is relatively high and a statistically significant effect is not observed, this suggests that the true effect, if any, is smaller than the effect size the study was designed to detect.

Sample Size in Epidemiologic Studies

Epidemiologic studies can be population-based or non-population-based, such as case-control studies.

  1. Population-based studies (cohort or cross-sectional studies)
    • Differences in proportions (e.g., attributable risk)
    • Ratios (e.g., relative risks, relative rates, prevalence ratios)
  2. Case-control studies (e.g., calculating an odds ratio)
    • Unmatched study designs
    • Multiple controls/case
    • Matched study designs
