Lesson 6 - Cohort Studies

Lesson 6 Objectives

Upon completion of this lesson, you should be able to:

  • Distinguish between prospective and retrospective cohort studies
  • Identify advantages and disadvantages of large continuously running prospective cohort studies
  • Calculate cumulative incidence, person-time, and incidence rates
  • Compare cohort and case-control studies
  • Describe combination studies including nested case-control and case-cohort designs

 


6.1 - Rationale & Design

Cohort studies are useful for estimating disease risk, incidence rates, and/or relative risks. Non-cases may be enrolled from a well-defined population, their current exposure status (at t0) determined, and their disease onset observed over time. Disease status at t1 can be compared to exposure status at t0.

There are two main types of cohort studies: prospective and retrospective.  In general, the descriptor, 'prospective' or 'retrospective', indicates when the cohort is identified relative to the initiation of the study.
Sometimes investigators enter into an ongoing prospective cohort study before the cases are determined. An investigator may pose a new question during an intermediate time period of an ongoing study. The cohort is already determined.

Exposure and case status information is available from the beginning of the ongoing study; subjects can be followed forward to collect cases. For example, suppose genotyping is performed in a cohort study. Later, an investigator may decide to use these data and subsequent case status to consider the relationship of a genetic factor to a particular disease. The investigator may wish to follow the subjects for several more years to ascertain more cases. This would be a mixture of a prospective and retrospective cohort.

Prospective Cohort Design (concurrent; longitudinal study)

In a prospective cohort study, the investigators identify the study population at the beginning of the study and accompany the subjects through time. When proposing a prospective cohort study, the investigator first identifies the characteristics of the group of people they wish to study. The investigator then determines the present case status of individuals, selecting only non-cases to follow forward in time. Exposure status is determined at the beginning of the study. A member of the cohort reaches the endpoint by dying, becoming a case, or reaching the end of the study period. A subject can also be lost to follow-up over the course of the study.

Challenges:

  • loss to follow up;
  • differential nonresponse;
  • loss of funding support;
  • continually improving methods for detecting exposure (so exposure measured at baseline carries greater misclassification than would be expected in current practice)

An advantage of a well-run cohort study is the multiple outcomes that can be considered. A group well-characterized and followed over a long period of time provides much useful information. For example, the Framingham study has studied 3 generations and added to our understanding of the roles of obesity, HDL lipids, and hypertension in heart disease and stroke as well as contributing an algorithm for predicting CHD risk and identifying 8 genetic loci associated with hypertension. The use of sub-cohorts for specific purposes can minimize cost and the length of a study.

Retrospective Cohort Study (Historical cohort; Non-concurrent Prospective Cohort)

An investigator accesses a historical roster of all exposed and non-exposed persons and then determines their current case/non-case status. The investigator initiates the study when the disease is already established in the cohort of individuals, long after the original measurement of exposure. Doing a retrospective cohort study requires good data on exposure status for both cases and non-cases at a designated earlier time point.

  Stop and Think!

How does a retrospective cohort study differ from a case-control study?

Both types of studies identify present cases and non-cases.

The case-control study identifies the cases and then selects appropriate controls. An entire cohort is not used. If you were investigating an environmentally-related cancer among university students with a case-control study, you would identify students within certain years who met the case definition for the cancer. You would select controls from among students who were not cases but who matched the cases on characteristics such as age, gender, and graduation year. You would then determine exposure status (perhaps the proximity of each campus address to the identified toxin) and compare exposures between cases and non-cases.

A retrospective cohort study uses the entire cohort: all cases and non-cases within the identified group. A retrospective cohort design might designate the cohort to be students enrolled at the university over a 5-year time span. The present case status of all these students is determined and historical data about their exposure status are accessed, in order to assess the relationship between the exposure and being a case of the cancer.

Examples of Cohort Studies

Examples of large prospective cohort studies that have been ongoing for an extended period of time are provided in this section.
Research is still being conducted with these established cohorts.

The Framingham study began in 1948 by recruiting an Original Cohort of 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. Since that time the Study has added an Offspring Cohort in 1971, the Omni Cohort in 1994, a Third Generation Cohort in 2002, a New Offspring Spouse Cohort in 2003, and a Second Generation Omni Cohort in 2003.
The Nurses' Health Study (NHS) and the Nurses' Health Study II (NHS II) are among the largest prospective investigations into the risk factors for major chronic diseases in women. The primary motivation for the study was to investigate the potential long-term consequences of oral contraceptives, which were being prescribed to hundreds of millions of women. Married registered nurses, aged 30 to 55 in 1976, who lived in the 11 most populous states, and whose nursing boards agreed to supply NHS with their members' names and addresses, were eligible to be enrolled in the cohort if they responded to the NHS baseline questionnaire. In addition to reading about the history of the studies, watch a series of short videos about the studies on the Nurses' Health Study 3 YouTube page.
The NHANES I Epidemiologic Follow-up Study (NHEFS) is a national longitudinal study that was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the Public Health Service. The NHEFS was designed to investigate the relationships between clinical, nutritional, and behavioral factors assessed in the first National Health and Nutrition Examination Survey (NHANES I) and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. The NHEFS cohort includes all persons 25-74 years of age who completed a medical examination at NHANES I in 1971-75 (n = 14,407). It consists of a series of follow-up studies, four of which have been conducted to date. The first wave of data collection was conducted for all members of the NHEFS cohort from 1982 through 1984. It included tracing the cohort; conducting personal interviews with subjects or their proxies; measuring pulse rate, weight, and blood pressure of surviving participants; collecting hospital and nursing home records of overnight stays; and collecting death certificates of decedents.

6.2 - Analysis

Cohort studies often aim to estimate disease occurrence by cumulative incidence or incidence rates.

An important component of calculating the incidence rate is the calculation of person-time. Each person in the study contributes the time from study enrollment until the first of the following: becoming a case, dropping out of the study, or study completion.

The incidence rate is the number of persons who newly experience the outcome during a specified period of time divided by the sum of the time that each member of the population is at risk.
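The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed analysis method: the `Subject` record, its field names, and the five-person cohort are hypothetical, and time is assumed to be measured in years.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    enrolled: float     # time of study enrollment (years); hypothetical field
    exited: float       # time of becoming a case, dropping out, or study end
    became_case: bool   # True if follow-up ended because the subject became a case


def person_time(subjects):
    """Sum of the time each subject spent at risk under observation."""
    return sum(s.exited - s.enrolled for s in subjects)


def incidence_rate(subjects):
    """New cases divided by total person-time at risk."""
    cases = sum(1 for s in subjects if s.became_case)
    return cases / person_time(subjects)


# Made-up cohort for illustration only
cohort = [
    Subject(0.0, 5.0, False),  # followed to the end of the study
    Subject(0.0, 2.0, True),   # became a case at year 2
    Subject(0.0, 3.0, False),  # dropped out at year 3
    Subject(1.0, 5.0, False),  # enrolled one year late
    Subject(0.0, 4.0, True),   # became a case at year 4
]

total_pt = person_time(cohort)   # 5 + 2 + 3 + 4 + 4 = 18 person-years
rate = incidence_rate(cohort)    # 2 cases / 18 person-years
print(f"{total_pt} person-years, rate = {rate:.3f} per person-year")
```

Note that the subject who dropped out still contributes three years to the denominator, which is the reason person-time is used rather than a simple head count.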

Since all people in the cohort study are non-cases at the start, if there is a specific exposure of interest, the relative risk of becoming a case can be calculated for the exposed versus non-exposed.  More details regarding analysis methods will be seen later in the course. 
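As a rough sketch of that comparison, the relative risk can be computed as the ratio of cumulative incidence in the exposed group to cumulative incidence in the non-exposed group. The function names and the counts below are hypothetical, and the sketch assumes everyone is followed for the same period with no loss to follow-up.

```python
def cumulative_incidence(new_cases, at_risk):
    """Proportion of an initially disease-free group that becomes a case."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return new_cases / at_risk


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the non-exposed."""
    risk_exposed = cumulative_incidence(cases_exposed, n_exposed)
    risk_unexposed = cumulative_incidence(cases_unexposed, n_unexposed)
    return risk_exposed / risk_unexposed


# Made-up counts: 30 of 1,000 exposed and 10 of 1,000 non-exposed become cases
rr = relative_risk(30, 1000, 10, 1000)
print(f"relative risk is about {rr:.2f}")
```

A relative risk near 1 would suggest no association between exposure and outcome; values above 1 indicate higher risk among the exposed.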


6.3 - Comparing & Combining Case-Control and Cohort Studies

Comparison of Cohort and Case-control Studies

Comparative Term | Cohort Study | Case-Control Study
Estimates | Can calculate incidence rate, risk, and relative risk | Only estimates odds ratio
Causality | Potentially greater strength for causal investigations | Potentially weaker causal investigation
Cost | Expensive | Less expensive
Time to complete | Long-term study | Short-term study
Sample size | A large sample size often required, especially for rare outcomes | Can be powered with a small sample of cases
Efficient designs | Efficient design for rare exposure & multiple outcomes | Efficient design for rare diseases & multiple exposures
Recall bias | Less potential for recall bias | More potential for recall bias
Loss to follow-up | More potential for loss to follow-up | Less potential for loss to follow-up
The natural course of the disease | Yes | No

Nested Case-Control Study Design

This is a case-control study within a cohort study. At the beginning of the cohort study (t0), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time t1. The control group is selected from the risk set (cohort members who do not meet the case definition at t1). Typically, the nested case-control study includes less than 20% of the parent cohort.
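The control selection described above can be sketched as follows. This is one common variant (sampling controls from the members still at risk at each case's event time) and is an assumption here, not necessarily the procedure used in any particular study. The dictionary keys `id`, `exit_time`, and `is_case`, the function name, and the toy cohort are all hypothetical.

```python
import random


def sample_nested_case_control(cohort, controls_per_case, seed=0):
    """For each case, draw controls at random from that case's risk set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    matched_sets = []
    for case in (m for m in cohort if m["is_case"]):
        t = case["exit_time"]
        # Risk set: members still under follow-up and not yet a case at time t.
        # Members who become cases later are still eligible as controls here.
        risk_set = [m for m in cohort
                    if m["id"] != case["id"] and m["exit_time"] > t]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)
        matched_sets.append({"case": case["id"],
                             "controls": [c["id"] for c in controls]})
    return matched_sets


# Made-up parent cohort: exit_time is when follow-up ended for each member
cohort = [
    {"id": 1, "exit_time": 2.0, "is_case": True},
    {"id": 2, "exit_time": 5.0, "is_case": False},
    {"id": 3, "exit_time": 5.0, "is_case": False},
    {"id": 4, "exit_time": 1.0, "is_case": False},  # lost before any case occurred
    {"id": 5, "exit_time": 4.0, "is_case": True},
    {"id": 6, "exit_time": 5.0, "is_case": False},
]

sets = sample_nested_case_control(cohort, controls_per_case=2)
for s in sets:
    print(s)
```

Only the sampled cases and controls then need the expensive exposure assays, which is where the efficiency of the design comes from.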

Advantages of nested case-control

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at t0)
  • Reduces selection bias – cases and controls sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with an investigator blind to case status

Disadvantages

  • Reduces power relative to the parent cohort because of the smaller sample size; the loss is roughly 1/(c+1), where c = number of controls per case
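One way to read the 1/(c+1) figure is as the approximate loss in statistical efficiency relative to analyzing the full cohort, which leaves a relative efficiency of about c/(c+1). That interpretation is a commonly cited approximation (it holds best when the exposure effect is small) and is an assumption here rather than something the lesson derives. The function name is hypothetical.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of a nested case-control study with c controls
    per case, relative to the full cohort: c / (c + 1)."""
    c = controls_per_case
    if c < 1:
        raise ValueError("need at least one control per case")
    return c / (c + 1)


for c in (1, 2, 4, 10):
    eff = relative_efficiency(c)
    print(f"c = {c:2d}: efficiency about {eff:.2f}, loss about {1 - eff:.2f}")
```

Under this approximation the gain from each additional control shrinks quickly, which is why designs with more than four or five controls per case are uncommon.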

Nested case-control studies can be matched, not matched, or counter-matched. Matching cases to controls according to baseline measurements of one or several confounding variables is done to control for the effect of confounding variables.

A counter-matched study, in contrast, matches cases to controls who have a different baseline risk factor exposure level. The counter-matched study design is used to specifically assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification) of the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure levels in order to maximize variation in risk exposures among the controls. For example, in a study of the risk for bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.

Example

Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In a cohort study of risk factors for suicide, Agerbo et al. (2002) identified 496 young people who had died by suicide during 1981-97 in Denmark and matched them for sex, age, and time to 24,800 controls. Each case was matched to a representative random subsample of 50 people born the same year.

Case-Cohort Study Design

A case-cohort study is similar to a nested case-control study in that the cases and non-cases are within a parent cohort; cases and non-cases are identified at time t1, after baseline. In a case-cohort study, the cohort members were assessed for risk factors at any time prior to t1. Non-cases are randomly selected from the parent cohort, forming a subcohort. No matching is performed.
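A minimal sketch of assembling a case-cohort sample is given below. It follows the usual textbook formulation, in which the subcohort is a simple random sample of the whole parent cohort drawn without regard to later case status, and all cases arising outside the subcohort are then added; that formulation is an assumption and differs slightly from the wording above. The function name, dictionary keys, and toy cohort are hypothetical.

```python
import random


def build_case_cohort(cohort, subcohort_fraction, seed=0):
    """Return (subcohort, extra_cases): a random subcohort of the parent
    cohort plus every case that was not drawn into the subcohort."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_sub = round(len(cohort) * subcohort_fraction)
    subcohort = rng.sample(cohort, n_sub)  # no matching is performed
    sub_ids = {m["id"] for m in subcohort}
    extra_cases = [m for m in cohort
                   if m["is_case"] and m["id"] not in sub_ids]
    return subcohort, extra_cases


# Made-up parent cohort of 100 members, 10 of whom become cases by t1
cohort = [{"id": i, "is_case": i % 10 == 0} for i in range(100)]

subcohort, extra_cases = build_case_cohort(cohort, subcohort_fraction=0.2)
print(len(subcohort), "subcohort members;", len(extra_cases), "additional cases")
```

Because the subcohort is not tied to any one outcome, the same sample can serve as the comparison group for several different diseases.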

Advantages of Case-Cohort Study:

Similar to nested case-control study design:

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at t0)
  • Reduces selection bias – cases and non-cases sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with an investigator blind to case status

Other advantages, as compared to nested case-control study design:

  • The subcohort can be used to study multiple outcomes
  • Risk can be measured at any time up to t1  (e.g. elapsed time from a variable event, such as menopause, or birth)
  • The subcohort can be used to calculate person-time at risk

Disadvantages of Case-Cohort Study:

As compared to the nested case-control study design: increased potential for information bias, because the subcohort may have been established after t0 and exposure information may have been collected at different times (e.g. potential for sample deterioration).

 


6.4 - Lesson 6 Summary

Cohort studies are the second main type of observational study design in epidemiology.  This design is desirable when there are many potential outcomes you want to investigate, when the goal is a direct measure of incidence or risk, and for rare exposures.  Cohort studies can be prospective, where the investigators assemble the cohort and then follow them to observe outcomes, or retrospective when the investigators identify the cohort based on past exposures and evaluate outcomes that have already occurred. Prospective cohort studies are less vulnerable to bias and can evaluate the temporal relationship between exposure and outcome.  Retrospective studies are good for diseases with long induction and latent periods.  Some weaknesses of cohort studies are that they are inefficient for rare outcomes, can be very expensive and time-consuming, and retrospective cohort studies can be more prone to bias. There are many examples of large and small cohort studies, some of which have been going on for decades, and there are still many questions that can be answered from these studies! 

