6.1 - Rationale & Design

Cohort studies are useful for estimating disease risk, incidence rates, and/or relative risks. Non-cases may be enrolled from a well-defined population, their current exposure status (at t0) determined, and their disease onset observed over time. Disease status at t1 can be compared to exposure status at t0.
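As a minimal sketch of the comparison described above, the risk in each exposure group is the number of new cases by t1 divided by the number of non-cases enrolled at t0, and the relative risk is the ratio of the two risks. The function names and the counts below are hypothetical, chosen only for illustration.

```python
def cumulative_risk(new_cases, enrolled_non_cases):
    """Proportion of subjects disease-free at t0 who are cases by t1."""
    if enrolled_non_cases <= 0:
        raise ValueError("enrolled_non_cases must be positive")
    return new_cases / enrolled_non_cases


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = cumulative_risk(cases_exposed, n_exposed)
    risk_unexposed = cumulative_risk(cases_unexposed, n_unexposed)
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


# Hypothetical cohort: exposure classified at t0, disease status read at t1.
risk_exp = cumulative_risk(30, 1000)      # 30 new cases among 1,000 exposed
risk_unexp = cumulative_risk(10, 1000)    # 10 new cases among 1,000 unexposed
rr = relative_risk(30, 1000, 10, 1000)    # exposed risk is about 3 times the unexposed risk
print(f"risk exposed={risk_exp:.3f}, risk unexposed={risk_unexp:.3f}, RR={rr:.2f}")
```

A relative risk near 1 would suggest no association between the exposure and the disease in this hypothetical cohort.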

There are two main types of cohort studies: prospective and retrospective.  In general, the descriptor, 'prospective' or 'retrospective', indicates when the cohort is identified relative to the initiation of the study.
Sometimes investigators enter into an ongoing prospective cohort study before the cases are determined. An investigator may pose a new question during an intermediate time period of an ongoing study. The cohort is already determined.

Exposure and case status information is available from the beginning of the ongoing study; subjects can be followed forward to collect cases. For example, suppose genotyping is performed in a cohort study. Later, an investigator may decide to use these data and subsequent case status to consider the relationship of a genetic factor to a particular disease. The investigator may wish to follow the subjects for several more years to ascertain more cases. This would be a mixture of a prospective and retrospective cohort.

Prospective Cohort Design (concurrent; longitudinal study)

In a prospective cohort study, the investigators identify the study population at the beginning of the study and accompany the subjects through time. When proposing a prospective cohort study, the investigator first identifies the characteristics of the group of people they wish to study. The investigator then determines the present case status of individuals, selecting only non-cases to follow forward in time. Exposure status is determined at the beginning of the study. A member of the cohort reaches the endpoint by dying, becoming a case, or reaching the end of the study period. A subject can also be lost to follow-up over the course of the study.
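The endpoints listed above determine how much follow-up time each subject contributes. The sketch below computes an incidence rate as new cases per person-year, assuming each subject contributes time from enrollment until the first of: becoming a case, dying, being lost to follow-up, or reaching the end of the study. The `Subject` class, the exit-reason labels, and the follow-up times are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    years_followed: float  # time from enrollment until follow-up ended
    exit_reason: str       # "case", "died", "lost", or "study_end"


def incidence_rate(subjects):
    """New cases divided by total person-years of follow-up."""
    person_years = sum(s.years_followed for s in subjects)
    if person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    new_cases = sum(1 for s in subjects if s.exit_reason == "case")
    return new_cases / person_years


# Hypothetical cohort of five non-cases enrolled at the start of the study.
cohort = [
    Subject(5.0, "study_end"),  # disease-free for the whole study period
    Subject(2.0, "case"),       # became a case after 2 years
    Subject(3.0, "lost"),       # lost to follow-up after 3 years
    Subject(4.0, "died"),       # died of another cause after 4 years
    Subject(1.0, "case"),       # became a case after 1 year
]

rate = incidence_rate(cohort)   # 2 cases over 15 person-years
print(f"{rate * 1000:.1f} cases per 1,000 person-years")
```

Subjects who are lost or who die still contribute the time they were observed, which is why person-time rather than a simple head count is used as the denominator in this sketch.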


Potential problems in a prospective cohort study include:

  • loss to follow-up;
  • differential nonresponse;
  • loss of funding support;
  • continually improving methods for detecting exposure (leading to greater misclassification than would be expected in current practice).

An advantage of a well-run cohort study is the multiple outcomes that can be considered. A group well-characterized and followed over a long period of time provides much useful information. For example, the Framingham study has studied 3 generations and added to our understanding of the roles of obesity, HDL lipids, and hypertension in heart disease and stroke as well as contributing an algorithm for predicting CHD risk and identifying 8 genetic loci associated with hypertension. The use of sub-cohorts for specific purposes can minimize cost and the length of a study.

Retrospective Cohort Study (Historical cohort; Non-concurrent Prospective Cohort)

An investigator accesses a historical roster of all exposed and non-exposed persons and then determines their current case/non-case status. The investigator initiates the study when the disease is already established in the cohort of individuals, long after the original measurement of exposure. Doing a retrospective cohort study requires good data on exposure status for both cases and non-cases at a designated earlier time point.

  Stop and Think!

How does a retrospective cohort study differ from a case-control study?

Both types of studies identify present cases and non-cases.

The case-control study identifies the cases and then selects appropriate controls. An entire cohort is not used. If you were investigating an environmentally-related cancer among university students with a case-control study, you would identify students within certain years who met the case definition for the cancer. You would select controls from among students who did not have the cancer, matched to the cases on characteristics such as age, gender, and graduation year. You would then determine the exposure status of each case and control (perhaps the proximity of their campus address to the identified toxin) and compare exposures between cases and controls.

A retrospective cohort study uses the entire cohort: all cases and non-cases within the identified group. A retrospective cohort design might designate the cohort to be students enrolled at the university over a 5-year time span. The present case status of all these students is determined and historical data about their exposure status are accessed in order to assess the relationship between the exposure and being a case of the cancer.
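One practical consequence of this difference can be sketched with numbers. Because a retrospective cohort study counts every exposed and unexposed student, risks and a relative risk can be computed directly. A case-control study samples only some of the non-cases, so the group sizes no longer reflect the population at risk and the usual summary is an odds ratio instead. All counts below are hypothetical, and the sampled controls are assumed to be exposed in roughly the same proportion as the non-cases in the full cohort.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a full cohort.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 layout (cross-product ratio)."""
    return (a * d) / (b * c)


# Hypothetical retrospective cohort: all 10,000 enrolled students are classified.
# 2,000 exposed (40 cases) and 8,000 unexposed (40 cases).
rr_cohort = relative_risk(a=40, b=1960, c=40, d=7960)

# Hypothetical case-control study drawn from the same students:
# all 80 cases, plus 160 controls sampled from the non-cases
# (32 of the sampled controls were exposed, 128 were not).
or_case_control = odds_ratio(a=40, b=32, c=40, d=128)

print(f"cohort RR = {rr_cohort:.2f}, case-control OR = {or_case_control:.2f}")
```

In this illustration the disease is rare, so the odds ratio from the sampled controls lands close to the relative risk from the full cohort; that closeness is a property of the chosen numbers, not a general guarantee.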

Examples of Cohort Studies

Examples of large prospective cohort studies that have been ongoing for an extended period of time are provided in this section.
Research is still being conducted with these established cohorts.

The Framingham study began in 1948 by recruiting an Original Cohort of 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. Since that time the Study has added an Offspring Cohort in 1971, the Omni Cohort in 1994, a Third Generation Cohort in 2002, a New Offspring Spouse Cohort in 2003, and a Second Generation Omni Cohort in 2003.
The Nurses' Health Study (NHS) and the Nurses' Health Study II (NHS II) are among the largest prospective investigations into the risk factors for major chronic diseases in women. The primary motivation for the study was to investigate the potential long-term consequences of oral contraceptives, which were being prescribed to hundreds of millions of women. Married registered nurses, aged 30 to 55 in 1976, who lived in the 11 most populous states, and whose nursing boards agreed to supply NHS with their members' names and addresses, were eligible to be enrolled in the cohort if they responded to the NHS baseline questionnaire. In addition to reading about the history of the studies, watch a series of short videos about the studies on the Nurses’ Health Study 3 YouTube page.
The NHANES I Epidemiologic Follow-up Study (NHEFS) is a national longitudinal study that was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the Public Health Service. The NHEFS was designed to investigate the relationships between clinical, nutritional, and behavioral factors assessed in the first National Health and Nutrition Examination Survey (NHANES I) and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. The NHEFS cohort includes all persons 25-74 years of age who completed a medical examination at NHANES I in 1971-75 (n = 14,407). The NHEFS consists of a series of follow-up studies, four of which have been conducted to date. The first wave of data collection was conducted for all members of the NHEFS cohort from 1982 through 1984. It included tracing the cohort; conducting personal interviews with subjects or their proxies; measuring pulse rate, weight, and blood pressure of surviving participants; collecting hospital and nursing home records of overnight stays; and collecting death certificates of decedents.