Lesson 9: Cohort Study Design; Sample Size and Power Considerations for Epidemiologic Studies

In this week's lesson, we will cover the design of a cohort study. We will also review sample size and power considerations as applied to epidemiologic studies.

Let's get started!

Cohort Study Design

A cohort study is useful for estimating the risk of disease, the incidence rate and/or relative risks. Non-cases may be enrolled from a well-defined population, current exposure status (at \(t_0\)) determined, and the onset of disease observed in the subjects over time. Disease status at \(t_1\) can be compared to exposure status at \(t_0\). The data may be displayed as follows:

              Cases                               Non-cases                               Total Exposure
              (Number)                            (Unnecessary if Total known)            (Person*Time)
Exposed       A                                   B                                       \(\text{Total}_{\text{Exposed}}\)
Not-exposed   C                                   D                                       \(\text{Total}_{\text{Non-exposed}}\)
              \(\text{Total}_{\text{cases}}\)     \(\text{Total}_{\text{non-cases}}\)     Total

As you have learned, measures of disease frequency and effect or association can be calculated from these data:

  1. Incidence Density (Incidence Rate):

    Among Exposed: \(\dfrac{A}{T_{\text{Exposed}}}\)

    Among Nonexposed: \(\dfrac{C}{T_{\text{Nonexposed}}}\)

  2. Incidence Density Ratio (Rate Ratio; Relative Risk)

    \(\dfrac{(\frac{A}{T_{\text{Exposed}}})}{(\frac{C}{T_{\text{Nonexposed}}})}\)

  3. Attributable Risk

    \((\frac{A}{T_{\text{Exposed}}}) - (\frac{C}{T_{\text{Nonexposed}}})\)
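The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the counts and person-years are hypothetical, and the names `incidence_density`, `t_exposed`, and `t_nonexposed` are assumptions standing in for \(A\), \(C\), \(T_{\text{Exposed}}\), and \(T_{\text{Nonexposed}}\).

```python
def incidence_density(cases, person_time):
    """Return cases per unit of person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return cases / person_time


# Hypothetical data, for illustration only.
a, t_exposed = 30, 1000.0        # A cases over T_Exposed person-years
c, t_nonexposed = 15, 1500.0     # C cases over T_Nonexposed person-years

id_exposed = incidence_density(a, t_exposed)
id_nonexposed = incidence_density(c, t_nonexposed)

# Incidence density ratio: rate among exposed divided by rate among nonexposed.
idr = id_exposed / id_nonexposed

# Attributable risk: rate among exposed minus rate among nonexposed.
attributable_risk = id_exposed - id_nonexposed

print(f"Incidence density, exposed:    {id_exposed:.3f} per person-year")
print(f"Incidence density, nonexposed: {id_nonexposed:.3f} per person-year")
print(f"Incidence density ratio:       {idr:.2f}")
print(f"Attributable risk:             {attributable_risk:.3f} per person-year")
```

Note that the non-case counts B and D never enter the calculation, which is why the table marks that column as unnecessary when the person-time totals are known.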

Types of Cohort Studies

The simplest cohort design is prospective, i.e., following a group forward in time, but a cohort study can also be 'retrospective'. In general, the descriptor, 'prospective' or 'retrospective', indicates when the cohort is identified relative to the initiation of the study.

  1. Prospective cohort (concurrent; longitudinal study) - An investigator identifies the study population at the beginning of the study and accompanies the subjects through time. In a prospective study, the investigator begins the study at the same time as the first determination of exposure status of the cohort. When proposing a prospective cohort study, the investigator first identifies the characteristics of the group of people he/she wishes to study. The investigator then determines the present case status of individuals, selecting only non-cases to follow forward in time. Exposure status is determined at the beginning of the study.

    • Problems: loss to follow-up; differential nonresponse; loss of funding support; continually improving methods for detecting exposure (leading to greater misclassification than would be expected in current practice)
    • Examples: Framingham Study; Nurses' Health Study; National Health and Nutrition Examination Survey Follow-up Study. In each of these studies, case status was determined at the beginning of the cohort and existing cases were excluded. Exposure was then measured in the remaining subjects, who were followed over a period of time until reaching the study endpoint. A member of the cohort reaches the endpoint by dying, becoming a case, or reaching the end of the study period. A subject can also be lost to follow-up over the course of the study. The investigator progresses through time with the subjects in a prospective cohort study. Such a study may also be called a longitudinal or a concurrent study, as opposed to a retrospective cohort study.
  2. Retrospective cohort study (historical cohort; non-concurrent prospective cohort) - An investigator accesses a historical roster of all exposed and nonexposed persons and then determines their current case/non-case status. The investigator initiates the study when the disease is already established in the cohort of individuals, long after the original measurement of exposure. Doing a retrospective cohort study requires good data on exposure status for both cases and noncases at a designated earlier timepoint.

    Try it!

    How does a retrospective cohort study differ from a case-control study? Suppose you are investigating the possibility of an environmentally-linked cancer among students at a university. How would the sample selected for a case-control study differ from those included in a retrospective cohort study?

    Both types of studies identify present cases and non-cases.

    The case-control study identifies the cases and then selects appropriate controls. An entire cohort is not used. If you were investigating an environmentally-related cancer among university students with a case-control study, you would identify students within certain years who met the case definition for the cancer. You would select controls from among students who were not cases, matched to the cases on characteristics such as age, gender and graduation year. You would then determine the exposure status of both groups (perhaps the proximity of their campus address to the identified toxin) and compare exposures between cases and controls.

    A retrospective cohort study uses the entire cohort: all cases and non-cases within the identified group. A retrospective cohort design might designate the cohort to be students enrolled at the university over a 5-year time span. The present case status of all these students is determined and historical data about their exposure status accessed, in order to assess the relationship between the exposure and being a case of the cancer.

    Potential problems with the retrospective cohort approach include selection bias and misclassification bias because of the retrospective nature of the study. However, the retrospective cohort design can be useful when reliable records are available, such as in occupational studies where levels of environmental exposure are monitored and recorded in a database. Investigators can determine the case status of the entire group at the present time, then use the exposure records to assess the relationship between exposure and disease. To make the terminology even more confusing, these types of designs can also be labeled historical cohort studies or non-concurrent prospective cohort studies.

    Try it!

    Is a panel study the same as a cohort study?

    A panel study is not the same as an epidemiologic cohort study. An epidemiologic cohort study identifies a group of people, measures their exposure status (which might help determine who is in the group), assures that subjects do not have the stated outcome of interest, and then follows them over time (possibly with measures at different points in time) to see if they develop the outcome. The cohort is selected based upon exposure and absence of the outcome and is followed over time to a specified outcome.

    A panel study is a longitudinal study of a cohort of people with multiple measures over time. They are a cohort because they share something in common (e.g., employment, retirement). There is generally limited sampling with respect to exposure, and there is no assurance that subjects are free of disease (or of a specific outcome) when they enter the study. A disease or outcome of interest is not specified. The panel study is a group of people who share a characteristic and who are progressing through time together to undetermined outcomes.

  3. Investigator enters well after the cohort is enrolled, but well before case determination

    Sometimes investigators enter into an ongoing prospective cohort study before the cases are determined. An investigator may pose a new question during an intermediate time period of an ongoing study. The cohort is already determined. Exposure and case status information are available from the beginning of the ongoing study; subjects can be followed forward to collect cases. For example, suppose genotyping is performed in a cohort study. Later, an investigator may decide to use these data and subsequent case status to consider the relationship of a genetic factor to a particular disease. The investigator may wish to follow the subjects several more years to ascertain more cases. This would be a mixture of a prospective and retrospective cohort.
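Whichever of the three designs is used, each subject contributes follow-up time only until becoming a case, dying, being lost to follow-up, or reaching the end of the study period. The sketch below shows one way the case counts and person-time totals for the data table could be tallied under that assumption. The `Subject` record, the `summarize` helper, and the six follow-up records are hypothetical and are not taken from any of the studies named above.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    exposed: bool          # exposure status determined at t_0
    years_followed: float  # time from t_0 until the subject left follow-up
    became_case: bool      # True only if follow-up ended because of disease onset


def summarize(subjects):
    """Tally cases and person-years separately for exposed and nonexposed."""
    totals = {
        True: {"cases": 0, "person_years": 0.0},
        False: {"cases": 0, "person_years": 0.0},
    }
    for s in subjects:
        group = totals[s.exposed]
        group["person_years"] += s.years_followed
        if s.became_case:
            group["cases"] += 1
    return totals


# Hypothetical cohort followed for at most 5 years.
cohort = [
    Subject(exposed=True, years_followed=2.5, became_case=True),    # became a case
    Subject(exposed=True, years_followed=5.0, became_case=False),   # reached end of study
    Subject(exposed=True, years_followed=1.5, became_case=False),   # lost to follow-up
    Subject(exposed=False, years_followed=5.0, became_case=False),  # reached end of study
    Subject(exposed=False, years_followed=4.0, became_case=True),   # became a case
    Subject(exposed=False, years_followed=3.0, became_case=False),  # died of another cause
]

totals = summarize(cohort)
for exposed, label in ((True, "Exposed"), (False, "Not-exposed")):
    t = totals[exposed]
    print(f"{label}: {t['cases']} case(s) over {t['person_years']} person-years")
```

Subjects who are lost or who die still add their observed time to the denominator, so the person-time totals reflect everyone who was at risk, not only those who completed the study.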

Objectives

Upon completion of this lesson, you should be able to:

  • distinguish between a cohort study, case-control study, case-control nested within a cohort and a case-cohort study and discuss the relative advantages and disadvantages of each design,
  • describe the relationships between sample size, power, variability, effect size, and significance level, and
  • calculate sample size, given the necessary background information.