Unit 1: Descriptive Epidemiology
Unit 1 Overview
Epidemiology affects our lives on a daily basis: from the way we make decisions related to our health and well-being on a personal level to the way health policy decisions are made by the government, public health agencies, and medical organizations. Over the years, epidemiology has helped identify disease outbreaks, provided surveillance on the state of public health, and established the association of many risk factors with adverse health outcomes.
This unit starts with an introduction to epidemiology and examples of its accomplishments over the years. Next, it presents sources of public health surveillance data and describes how they can be used to make health policy decisions and to identify areas where further research, and possibly interventions, are needed. Finally, it covers the standard measures of disease occurrence and frequency, and ways to use these measurements to compare populations.
Lesson 1 - Introduction to Epidemiology
Lesson 1 Objectives
- Distinguish between epidemiology and clinical epidemiology
- Apply the terminology of the Epidemiologic Triad to an infectious disease
- Explore selected events in the history of epidemiology and population health
- State five objectives of epidemiologic research
- Compare Epidemiologic Study Designs in the Demonstration of Causality
- Differentiate between different types of populations
1.1 - Defining Terms
Major Definitions for the Study of Epidemiology
- Epidemiology
- The study of the distribution of disease and determinants of health-related states or events in specified human populations and the application of this study to the control of human health problems. (JM Last. Dictionary of Epidemiology. 2nd edition)
- Clinical Epidemiology
- The science of making predictions about individual patients by counting clinical events in similar patients, using strong scientific methods for studies of groups of patients to ensure that the predictions are accurate. (Fletcher, Fletcher, Wagner. Clinical Epidemiology. 1996)
What is the difference between these two views of epidemiology?
In the clinical setting, epidemiologic methods are used to predict a health outcome for an individual based on scientific studies of groups of similar patients. Clinical epidemiology is integral to evidence-based medicine. Epidemiology itself is the study of disease in a population to determine the frequency and distribution of the disease as well as its risk factors. Although epidemiology is defined in terms of human populations, epidemiologic principles can be extended to other problems, such as colony collapse disorder in honeybees or herd health on a dairy farm.
General Dichotomies in Epidemiological Studies
When designing epidemiologic studies, choices must be made about the role of the investigator, the purpose of the study, the hypothesis regarding exposure, and the unit of analysis. Here are some examples:
Role of investigator:
- Observational - The investigator does not manipulate the exposure of participants to risk factors. Most epidemiological studies are observational.
- Experimental - According to the study design, the investigator manipulates the exposure of participants to some factor. Clinical trials and intervention studies are examples of such experiments. If the study participants themselves act to change their exposure to an influence, a natural experiment may occur. For example, a study of persons who have migrated from one environment to another could constitute a natural experiment.
Purpose of the study:
- Descriptive - describes the distribution of disease by time, place, and person; used to generate hypotheses of disease causation or for health planning
- Analytic - measures and tests the association between a hypothesized risk factor and a disease
Hypothesized Effect of Exposure:
- Harmful - exposure increases the risk or presence of disease
- Beneficial - exposure reduces the risk or presence of disease
Unit of Analysis:
- Individual - the individual (e.g., person, animal) is the unit of analysis; there is potential to ignore the impact of the community or group effect on individual risk
- Community - the community (e.g., county, hospital) is the unit of analysis. Such studies are prone to the ecological fallacy: without individual-level data, the assumption that each individual resembles the group average may not hold.
Data for a typical epidemiologic study may be summarized in a 2x2 epidemiologic table comparing the numbers of cases (those with the disease or condition) to non-cases in terms of their exposure to a risk factor or beneficial agent.
Exposure | Case | Non-Case | Total |
---|---|---|---|
Exposed | A | B | T_exposed |
Not Exposed | C | D | T_non-exposed |
Total | T_cases | T_non-cases | |
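To make the layout of the 2x2 table concrete, here is a minimal Python sketch. The class name `TwoByTwo` and the counts are hypothetical, chosen only for illustration. It computes the marginal totals from cells A, B, C, and D, plus the two measures of association named later in this lesson (odds ratio and relative risk), using their usual textbook definitions.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a 2x2 epidemiologic table (hypothetical helper)."""
    a: int  # exposed cases
    b: int  # exposed non-cases
    c: int  # non-exposed cases
    d: int  # non-exposed non-cases

    def totals(self) -> dict:
        # Row and column totals, matching the margins of the table.
        return {
            "exposed": self.a + self.b,
            "non_exposed": self.c + self.d,
            "cases": self.a + self.c,
            "non_cases": self.b + self.d,
            "all": self.a + self.b + self.c + self.d,
        }

    def risk_ratio(self) -> float:
        # Proportion of cases among the exposed divided by
        # the proportion of cases among the non-exposed.
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        # Cross-product ratio: (A * D) / (B * C).
        return (self.a * self.d) / (self.b * self.c)


# Made-up counts, for illustration only.
table = TwoByTwo(a=30, b=70, c=10, d=90)
print(table.totals())
print(round(table.risk_ratio(), 2))
print(round(table.odds_ratio(), 2))
```

Which of these measures is appropriate depends on the study design, which is taken up later in the course.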
1.2 - History of Epidemiology
Selected History of Epidemiology and Population Health
Follow the links in the list below, and explore selected events in the history of epidemiology and population health.
1800s
- 1849-54 → John Snow formed and tested the hypothesis on the origin of cholera in London - one of the first studies in analytic epidemiology
1900s
- 1910s → Flu pandemic
- 1920 → Goldberger published a descriptive field study showing the dietary origin of pellagra
- 1940s → Fluoride supplements were added to public water supplies in randomized community trials
- 1949 → Initiation of the Framingham study of risk factors for cardiovascular disease
- 1950 → Epidemiological studies link cigarette smoking and lung cancer, demonstrating the power of case-control study design
- 1954 → Field trial of the Salk polio vaccine - the largest formal human experiment
- 1959 → Mantel and Haenszel develop a statistical procedure for stratified analysis of case-control studies
- 1960 → MacMahon published the first epidemiologic text with a systematic focus on study design
- 1964 → US Surgeon General's Report on Smoking and Health establishes criteria for evaluation of causality
- 1970s → Large community-based trials were implemented, such as Stanford Three Communities; worldwide eradication of smallpox
- 1980s → Chronic disease, injury, and occupational epidemiology; HIV epidemic
- 1990s → Behavioral risk factor epidemiology; prevention of adverse health outcomes through policies and regulations; national programs in breast and cervical cancer prevention; tobacco epidemiology; emerging infectious diseases; criticism of epidemiology for being inconsequential ('small' risk ratios); standardization of surveillance methods; mad cow disease (BSE) in England and Europe; variant Creutzfeldt-Jakob disease; aging of the US population; disaster epidemiology
2000s
- 2000s → Genetic and molecular epidemiology; health disparities; racialism; HIPAA in the USA; West Nile Virus
- 2002 → bioterrorism; anthrax and smallpox threat and vaccinations
- 2003 → SARS, quarantines and public health law; and worldwide epidemiology; BSE in Canada
- 2004 → SARS recurrence; BSE in the USA; the flu epidemic
- 2009-2010 → H1N1 pandemic
- 2020 → COVID-19 pandemic
1.3 - Objectives, Causality, Models
Objectives of Epidemiology
The objectives of epidemiology include the ability to:
- identify the etiology or cause of disease
- determine the extent of disease
- study the progression of the disease
- evaluate preventive and therapeutic measures for a disease or condition
- develop public health policy
Causality in Epidemiology
One objective of epidemiology is to identify the cause of a disease, with a desire to prevent or modify the severity of the condition. Consider the table below. Would you agree that this table accurately portrays the true causes of death in the U.S. population? Why or why not?
Cause | Estimated No.* | Percentage of Total Deaths |
---|---|---|
Tobacco | 400 000 | 19 |
Diet/ Activity Patterns | 300 000 | 14 |
Alcohol | 100 000 | 5 |
Microbial Agents | 90 000 | 4 |
Toxic Agents | 60 000 | 3 |
Firearms | 35 000 | 2 |
Sexual Behavior | 30 000 | 1 |
Motor Vehicles | 25 000 | 1 |
Illicit Use of Drugs | 20 000 | <1 |
Total | 1 060 000 | 50 |
*Composite approximation drawn from studies that use different approaches to derive estimates, ranging from actual counts (e.g., firearms) to population attributable risk calculations (e.g., tobacco). Numbers over 100 000 are rounded to the nearest 100 000; over 50 000, to the nearest 10 000; below 50 000, to the nearest 5000.
Table 1: Estimated numbers by 'Cause' of Death (from McGinnis JM, Foege WH. JAMA. 1993;270(18):2207-2212).
As you may have noticed, the causes of death in Table 1 are all related to modifiable factors. The percentages do not total 100, but if these results are accurate, a large percentage of deaths can be postponed. The opportunity to prevent or ameliorate disease is an exciting component of epidemiologic study.
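The arithmetic behind the "Total" row can be checked with a short Python sketch. The counts are copied from the table. The implied number of all deaths is only a back-calculation from rounded figures, so treat it as an approximation rather than a reported statistic.

```python
# Estimated deaths by cause, copied from the table above.
estimated_deaths = {
    "Tobacco": 400_000,
    "Diet/Activity Patterns": 300_000,
    "Alcohol": 100_000,
    "Microbial Agents": 90_000,
    "Toxic Agents": 60_000,
    "Firearms": 35_000,
    "Sexual Behavior": 30_000,
    "Motor Vehicles": 25_000,
    "Illicit Use of Drugs": 20_000,
}

# Sum of the listed causes; this should match the "Total" row.
listed_total = sum(estimated_deaths.values())

# The "Total" row says these causes account for 50 percent of all deaths,
# so the implied number of all deaths is roughly twice the listed total.
share_of_all_deaths = 0.50
implied_all_deaths = listed_total / share_of_all_deaths

print(listed_total)
print(implied_all_deaths)

# Each cause as a percentage of the implied total (compare with the table).
for cause, n in estimated_deaths.items():
    print(f"{cause}: {100 * n / implied_all_deaths:.1f}%")
```

The recomputed percentages agree with the table only up to rounding, which is expected given how the estimates were assembled.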
Epidemiologists follow pre-determined procedures in deciding whether to attribute a particular factor as a cause of a disease or condition. In the late 19th century, a German microbiologist, Robert Koch, devised a scheme for deciding whether or not a particular microbe caused a disease.
Infectious Disease Model
Koch's Postulates
One organism leads to one disease. (one-to-one)
- A specific organism must always be observed in association with the disease. (regular presence)
- The organism must be isolated from an infected host and grown in pure culture in the laboratory. (exclusive presence)
- When organisms from the pure culture are inoculated into a susceptible host organism, they must cause the disease. (sufficient cause)
- The infectious organism must be re-isolated from the diseased organism and grown in pure culture.
Do you see any problem with applying Koch's postulates to determine the cause of all diseases?
Consider asthma or lung cancer: can one micro-organism be isolated as causing the development of these conditions?
Modern Epidemiology
Modern epidemiology accommodates multiple exposures contributing to increased risk for one disease (many-to-one) and situations where one risk factor contributes to multiple diseases (one-to-many).
Considerations When Assessing the Possible Causal Role of a Risk Factor
There are many factors to assess when considering whether a potential risk factor causes a disease or condition:
- How strong is the association? (odds ratio, relative risk)
- Is there a dose-response relationship?
- If exposure ceases, what happens? Does the condition change?
- Can the findings be replicated?
- Is there biological plausibility?
- Are there alternative explanations?
- How specific is the association?
- Is this consistent with other knowledge?
- Is there a statistical association? If so, is the association:
  - spurious (due to chance or bias)?
  - non-causal or causal?
- Is a temporal relationship observed?
- Was the study design adequate?
Epidemiologic Triad
A traditional model of infectious disease causation, known as the Epidemiologic Triad, is depicted in Figure 2. The triad consists of an external agent, a host, and an environment in which the host and agent are brought together, causing the disease to occur in the host. A vector, an organism that transmits infection by conveying the pathogen from one host to another without causing the disease itself, could be part of the infectious process.
A classic example of a vector is the Anopheles mosquito. As the mosquito ingests blood from an infected host, it picks up the Plasmodium parasite. The parasite is harmless to the mosquito. However, after being stored in the salivary glands and then injected into the next human upon which the mosquito feeds, it can cause malaria in that person. Thus, the Anopheles mosquito serves as a vector for malaria. Another familiar example is ticks of the genus Ixodes, which can be vectors for Lyme disease.
In the traditional epidemiologic triad model, transmission occurs when the agent leaves its reservoir or host through a portal of exit and is conveyed by a mode of transmission to enter through an appropriate portal of entry to infect a susceptible host. Transmission may be direct (direct contact host-to-host, or droplet spread from one host to another) or indirect (the transfer of an infectious agent from a reservoir to a susceptible host by suspended air particles, inanimate objects (vehicles or fomites), or animate intermediaries (vectors)).
Can the epidemiologic triad be applied to a disease that is not infectious? Consider a smoking-related disease (Figure 3). If smoking (or more specifically, a carcinogen in the smoke of the cigarette) causes the disease, those who manufacture, sell and distribute cigarettes are vectors, bringing the disease-causing agent to the susceptible host. Diagramming the epidemiologic triad also indicates potential interventions to reduce disease in the population. In this example, clean indoor air legislation, advertising potential harm from smoking, or establishing workplace smoking cessation programs could change the environment and reduce the exposure of the host to the agent. Conversely, increased advertising from cigarette manufacturers or increased numbers of vendors would increase the exposure of the host to the agent.
Thus, the traditional model of disease transmission can be useful to identify areas of potential intervention to reduce disease prevalence, whether infectious or non-infectious.
1.4 - Epidemiologic Hypotheses, Designs, and Populations
Hypotheses
An epidemiologic hypothesis is a testable statement of a putative relationship between exposure and disease. The hypothesis should be:
- Clear
- Testable or resolvable
- Explicit about the relationship between exposure and disease
- Limited in scope
- Not inconsistent with known facts
- Supported by literature, theory, references
Designs
Hierarchy of Epidemiologic Study Designs in the Demonstration of Causality/Prevention
The design of a study contributes to the strength of its findings. Below are the types of studies, in order of increasing strength for testing the relevant hypothesis. We will study some of these designs later in this course.
Causation Hypothesis
- Case Study (describing one person with the condition, a case)
- Case Series (series of cases)
- Ecological Study (analysis of group statistics, for example, comparing rates of disease between two countries)
- Cross-Sectional Study (assessing individuals at one time, such as a survey)
- Case-Control Study (studying those with the condition vs. those without)
- Cohort Study (following subjects over time to study the initiation and progression of a condition)
Populations
Often, it is not feasible to conduct a study where we collect data from all affected individuals. Thus, we need to select a subset of those individuals for our study. The following defines populations and samples.
- Target Population
- Population to which inferences from the study are to be made.
A target population may be defined by geography, demography, health status, or some other factor.
- Study Population
- Population from which study subjects are selected
The sampling frame is the actual list that will be used to select the sample (e.g., a list of hospital admissions, household addresses, or people with a certain disease or outcome).
Careful consideration should go into identifying the sampling frame for a study. If the available frame does not cover most of the study population, bias can occur.
- Sample
- Subjects that provide data to the study
Data from the study participants are used to make estimates and draw conclusions about the population
The method for selecting the sample can greatly influence the study results. In Lesson 2, we will learn more about methods to select samples.
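As a rough illustration of how a sample relates to a sampling frame, the Python sketch below draws a simple random sample without replacement. The frame of admission IDs, the sample size, and the seed are all hypothetical. Simple random sampling is used here only because it is the easiest scheme to show, not because it is the design a given study would use.

```python
import random

# Hypothetical sampling frame: a list of 1,000 hospital admission IDs.
sampling_frame = [f"ADM-{i:04d}" for i in range(1, 1001)]

# A fixed seed makes the draw reproducible for this illustration.
rng = random.Random(507)

# Simple random sample without replacement: every admission on the
# frame has the same chance of being selected.
sample = rng.sample(sampling_frame, k=50)

print(len(sample))   # number of subjects selected
print(sample[:5])    # first few selected IDs
```

Anyone who is in the target population but missing from `sampling_frame` can never be selected, which is how an incomplete frame introduces bias.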
1.5 - Lesson 1 Summary
Lesson 1 Summary
This lesson laid the groundwork for the study of epidemiology by introducing key terms, providing an overview of the history of epidemiology, introducing the concept of causality, and presenting examples of hypotheses that can be evaluated using epidemiologic principles.
In Lesson 2, we’ll build upon these fundamentals to explore how epidemiology is used to inform public health practice.
Lesson 2 - Public Health Surveillance
Lesson 2 Objectives:
- State at least 5 uses of disease surveillance information.
- Explore some public sources of disease surveillance data.
- Compare and contrast 5 health surveys conducted in the US with regard to the target population, sampling strategy, and purpose
- Identify advantages and disadvantages of surveys
- Differentiate between sampling strategies in order to select an appropriate sampling scheme for a survey
2.1 - Public Health Surveillance
Surveillance: Information for Action
The Centers for Disease Control and Prevention have defined surveillance as follows:
Disease surveillance is the basic process by which epidemiologists answer questions about who, where, and when.
Who is getting the disease? Are there differences in the rates of disease by age? sex? race?
Where is the disease happening? Are there geographic areas with particularly high rates? extremely low rates?
Is the occurrence of the disease changing over time? Is the disease becoming more frequent? less frequent?
~Alexander D. Langmuir NEJM 1963;268;182-191.
Disease surveillance information is useful for:
- Estimating the magnitude of a problem
- Determining the geographic distribution of illness
- Portraying the natural history of a disease
- Detecting epidemics or defining a problem
- Generating hypotheses, stimulating research
- Evaluating control measures
- Monitoring changes in infectious agents
- Detecting changes in health practices
- Facilitating planning
Evaluation of Surveillance Systems
A disease surveillance system should be simple, flexible, and acceptable to the population. For example, to detect hunting-related shooting injuries, the requirements for a hunter to report an episode should not be onerous or many shooting injuries will go unrecorded. The surveillance system should also be representative of the population and provide a timely alarm. Like a smoke detector without a power source, a surveillance system that is not able to recognize a disease outbreak quickly and accurately is not very useful.
2.2 - Sources of Public Health Surveillance Data
Sources of public health surveillance data can include:
- notifiable diseases,
- vital records (e.g. National Infant Mortality Surveillance, birth, death records),
- registry and survey data,
- administrative databases (such as Medicare or a prescription database), and
- some laboratory records.
Below are some websites to explore available sources of public health data:
Integrated Surveillance Information Systems/National Electronic Disease Surveillance System
In the U.S., this has been developed to standardize health reporting and link laboratory, hospital, and managed care data.
Pennsylvania’s Department of Health Vital Records
Enterprise Data Dissemination Informatics Exchange (EDDIE)
This is an interactive web tool for disseminating health statistics, where you can create customized data tables, charts, and maps for various health-related data.
CDC WONDER
This organization furthers the CDC's mission of health promotion and disease prevention by speeding and simplifying access to public health information for state and local health departments, the Public Health Service, and the academic public health community.
SEER
Surveillance, Epidemiology, and End Results Program of the National Cancer Institute
The SRTR Database
The Scientific Registry of Transplant Recipients
U.S. Fire Administration (USFA)
The USFA collects data from a variety of sources to provide information and analyses on the status and scope of the fire problem in the United States.
Health Surveys
In the US, governmental agencies conduct surveys for various purposes at regular intervals. Investigate these surveys by following the links below.
Health Survey | Target Population | Mode/Sampling Strategy/Size | Health Issues; Example of a Disease/Outcome and an Exposure |
---|---|---|---|
BRFSS | Non-institutionalized adult residents of the 50 states, the District of Columbia, and selected territories | Mode: telephone (cell and landline) survey. Sampling strategy: random-digit dial with post-stratification weighting. Size: 400,000+ each year | Health-related risk behaviors and events, chronic health conditions, use of preventive services, emerging health issues (e.g., vaccine shortage, influenza-like illnesses). Example: in 2012, adults aged 50+ who had a blood stool test within the last two years to screen for colorectal cancer: Yes 38.8% (CI 31.7-45.9, n=208); No 61.2% (CI 54.1-68.3, n=308). http://apps.nccd.cdc.gov/brfsssmart/MMSARiskChart.asp?yr=2012&MMSA=281&cat=CC&qkey=8521&grp=0 |
The Youth Risk Behavior Surveillance System (YRBSS) monitors six categories of health-related behaviors that contribute to the leading causes of death and disability among youth and adults, including—
- Behaviors that contribute to unintentional injuries and violence
- Sexual behaviors related to unintended pregnancy and sexually transmitted diseases, including HIV infection
- Alcohol and other drug use
- Tobacco use
- Unhealthy dietary behaviors
- Inadequate physical activity
YRBSS also measures the prevalence of obesity and asthma and other health-related behaviors plus sexual identity and sex of sexual contacts.
YRBSS is a system of surveys. It includes 1) a national school-based survey conducted by CDC and 2) state, territorial, tribal, and local surveys conducted by state, territorial, and local education and health agencies and tribal governments.
Health Survey | Target Population | Mode/Sampling Strategy/Size | Health Issues; Example of a Disease/Outcome and an Exposure |
---|---|---|---|
YRBSS | Students in public and private high schools (9th-12th grade) at the national, state, and local levels in the U.S. | Mode: anonymous, school-based questionnaire survey, administered in odd-numbered years. Sampling strategy: multi-stage cluster design. Size: in 2013, 13,000 youth in 42 states, 21 large urban schools, some tribal governments | Behaviors that contribute to unintentional injuries and violence; sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection; alcohol and other drug use; tobacco use; unhealthy dietary behaviors; inadequate physical activity; obesity; and asthma. Example: among U.S. high school students surveyed in 2013, 46.8% had ever had sexual intercourse; 34.0% had had sexual intercourse during the previous 3 months and, of these, 40.9% did not use a condom the last time they had sex; 15.0% had had sex with four or more people during their life. http://www.cdc.gov/healthyyouth/sexualbehaviors/index.htm |
- This website provides an Interactive Data Query System for accessing data from the NHIS: https://www.cdc.gov/nchs/nhis/shs.htm
Health Survey | Target Population | Mode/Sampling Strategy/Size | Health Issues; Example of a Disease/Outcome and an Exposure |
---|---|---|---|
NHIS | Noninstitutionalized civilian adult population residing in the United States | Mode: cross-sectional household interview survey. Sampling strategy: stratified multistage household sample, with oversampling of elderly and minorities. Size: 30,000-40,000 households per year (75,000-100,000 persons) | Monitor trends in illness and disability; track progress toward achieving national health objectives; support epidemiologic and policy analysis of such timely issues as characterizing those with various health problems, determining barriers to accessing and using appropriate health care, and evaluating Federal health programs. Example: in 2012, the NHIS showed that 18% of U.S. adults were current smokers and 21% were former smokers. http://www.cdc.gov/nchs/data/series/sr_10/sr10_260.pdf |
The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.
The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.
Findings from this survey will be used to determine the prevalence of major diseases and risk factors for diseases. Information will be used to assess nutritional status and its association with health promotion and disease prevention. NHANES findings are also the basis for national standards for such measurements as height, weight, and blood pressure. Data from this survey will be used in epidemiological studies and health sciences research, which help develop sound public health policy, direct and design health programs and services, and expand the health knowledge for the Nation.
Health Survey | Target Population | Mode/Sampling Strategy/Size | Health Issues; Example of a Disease/Outcome and an Exposure |
---|---|---|---|
NHANES | Non-institutionalized adults and children in the U.S. | Mode: face-to-face interviews and clinical examinations. Sampling strategy: multistage area probability sampling. Size: 5,000 persons each year | Determine the prevalence of major diseases and risk factors for diseases; assess nutritional status and its association with health promotion and disease prevention; establish the basis for national standards for such measurements as height, weight, and blood pressure. Example: during 2007-2012, 46.2% of adults aged 40-79 with lung obstruction currently smoked cigarettes. About 41% with mild and 55% with moderate or worse obstruction were current smokers. http://www.cdc.gov/nchs/data/databriefs/db181.html |
Health Survey | Target Population | Mode/Sampling Strategy/Size | Health Issues; Example of a Disease/Outcome and an Exposure |
---|---|---|---|
CHIS | Non-institutionalized civilian population of California (adults, teenagers, and children) | Mode: telephone (cell phone and landline) survey. Sampling strategy: random-digit dial with a supplemental surname list frame for Korean and Vietnamese populations. Size: 42,000-50,000 | From asthma, diabetes, and obesity to immigrant health and health insurance coverage, CHIS covers dozens of essential health topics. CHIS is conducted by the UCLA Center for Health Policy Research in collaboration with the California Department of Public Health and the Department of Health Care Services. Example: projections from the California Simulation of Insurance Markets (CalSIM) model indicate that up to half of Californians remaining uninsured will be undocumented immigrants who are not eligible under the Affordable Care Act (ACA). Most others lacking insurance will be eligible for Medi-Cal or subsidized insurance through Covered California but remain unenrolled due to difficulties with the enrollment process, inability to afford coverage, concerns about negative immigration-related consequences for themselves or their family members, or other barriers. Almost three-fourths of the remaining uninsured will be Latino, almost one-third will reside in Los Angeles County, and about 70 percent will be exempt from paying a tax penalty for lacking coverage. http://healthpolicy.ucla.edu/chis/research/Pages/default.aspx |
2.3 - Survey & Sampling Design
Even though this is not a course on survey design, surveys are a major source of public health data. As we saw earlier in the course, it is often not feasible to take measurements on the entire target population, so we must select a sample from which to gather data. This section introduces some advantages and disadvantages of using surveys and approaches to drawing a sample for an epidemiologic survey.
Survey Studies
An epidemiologic survey consists of a simultaneous assessment of the health outcome and exposures as well as potential confounders and effect modifiers. A survey given at a single time point can be part of a cross-sectional study. Some epidemiologists may call it a prevalence study. The survey results provide a 'snapshot' of a population. Surveys are a useful tool for gauging the health of a population or monitoring the effectiveness of a preventative intervention or provision of emergency relief.
While a survey may provide a relatively quick and inexpensive method for assessing the health of a population, there are both pros and cons, as noted below:
Advantages
- Inexpensive
- Relatively quick
- Can help establish or clarify a hypothesis
Disadvantages
- Exposure may not have preceded disease or outcome. This limits the assessment of causality. For example, a survey may ask about the current behavior of smoking and a diagnosis of asthma. While the results may show an association between smoking and asthma, we may not be able to accurately determine which came first.
- Disease and health outcomes with a long duration can be over-represented. Less severe outcomes may be under-represented because they may not have been diagnosed at the time of the survey.
- Surveys are subject to information bias (e.g. from inaccurate recall or misdiagnosis) and selection bias (e.g. those without a telephone cannot be selected for a random digit dial survey)
Survey Questions and Administration
Survey questions are carefully structured in order to reduce bias. Care should be given to the wording and order of questions. Using a standard questionnaire increases the reliability and validity of the results. A reliable survey has internal consistency and produces results that are replicable. The subject would answer the question in the same way if asked again. Valid questions are those which accurately assess the specific concept that is being measured.
The process of administering a survey should be standardized to reduce the potential for bias. The respondent should be informed of the purpose of the research and freely consent to participate. A survey with a low response rate is likely to have some bias.
STAT 507 is a course in epidemiologic research methods, so we will not delve into the strengths and weaknesses of the various methods for evaluating the reliability and validity of a survey instrument, as might be presented in a psychometrics course. You should, however, recognize the need to consider this type of analysis when selecting a survey instrument.
Sampling Designs
These methods of sampling can be applied to survey studies, as well as other observational and interventional studies.
Each of these approaches is useful, but to what population can the results be generalized?
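Several of the surveys described earlier name stratified or multistage designs. As a hedged sketch of one such idea, the Python code below draws a stratified sample with proportional allocation: the frame is split into strata and the same sampling fraction is applied within each. The frame, the age-group strata, the 10% fraction, and the function name `stratified_sample` are all assumptions made for illustration. Real surveys such as the NHIS use far more elaborate multistage designs and weighting.

```python
import random
from collections import defaultdict

# Hypothetical frame of (person_id, age_group) pairs:
# 600 people aged 18-49 and 400 people aged 50+.
frame = [(i, "18-49") for i in range(600)] + [(i, "50+") for i in range(600, 1000)]


def stratified_sample(frame, fraction, rng):
    """Draw the same fraction from every stratum (proportional allocation)."""
    # Group the frame by stratum label.
    strata = defaultdict(list)
    for person_id, stratum in frame:
        strata[stratum].append(person_id)

    # Simple random sample without replacement inside each stratum.
    selected = {}
    for stratum, members in strata.items():
        k = round(len(members) * fraction)
        selected[stratum] = rng.sample(members, k)
    return selected


rng = random.Random(2)  # fixed seed so the illustration is reproducible
sample = stratified_sample(frame, fraction=0.10, rng=rng)

for stratum, ids in sample.items():
    print(stratum, len(ids))
```

Because every stratum is guaranteed a share of the sample, results generalize to the population covered by the frame, but only if each stratum's frame is itself complete.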
2.4 - Lesson 2 Summary
Lesson 2 Summary
Public health surveillance is important for the health of any nation. In order to decide how to allocate resources, it is vital to know who is being affected, where those people live, and the timeliness of the issue. There are many sources of public health data that can be used to achieve these goals, including vital records, mandatory reporting, registries, and health surveys. Surveys are used to gather information that is not routinely or systematically collected. Since we often cannot gather data on the entire population of interest, we need to select a sample, and different methods for sampling were outlined in this lesson.
Lesson 3 - Measurements of Disease Occurrence and Frequency
Lesson 3 Objectives
- Select and use measures of disease frequency
- Define and calculate point prevalence, period prevalence, cumulative incidence, and incidence density rate
- Describe a potential outbreak with regard to person, place, and time.
- Construct and interpret an epi-curve to describe the course of an outbreak
3.1 - Disease Occurrence
Outcomes
Typical outcomes for an epidemiologic study (sometimes referred to as the 'D's of Epidemiology) are as follows:
Outcomes of Epidemiology:
- Death
- Disease/Illness - Physical signs, laboratory abnormalities
- Discomfort - Symptoms (e.g., pain, nausea, dyspnea, itching, tinnitus)
- Disability - Impaired ability to do usual activities
- Dissatisfaction - Emotional reaction (e.g., sadness, anger)
- Destitution - Poverty, unemployment
Calculations
In order to describe and compare measures of disease occurrence, these are the types of calculations most often used:
Count
Definition: the number of individuals who meet the case definition
9188 cases of invasive colorectal cancer in Pennsylvania in 2005 (PA Cancer Registry data)
Proportion
Definition: A/(A+B); a fraction in which the numerator (A) includes only individuals who meet the case definition and the denominator (A+B) totals the numbers of individuals who meet the case definition (A) plus those in the study population who do not meet the case definition and are at risk (B).
30% of persons over 50 years of age have been screened for colon cancer
Ratio
Definition: A/B; a special fraction in which the numerator includes only individuals who meet one criterion (e.g. the case definition, A) and the denominator includes only individuals in the study population who meet another criterion (e.g. do not meet the case definition but are at risk, B). A ratio is not dependent upon time. If the ratio compares the number of individuals with the outcome to the number without the outcome, the ratio is the odds. A ratio as a measure of disease frequency is used infrequently, in special situations, and should not be confused with an odds ratio or risk ratio.
- 1 case of colon cancer for every 1 case of breast cancer.
- 2 female cases of major depression to 1 male case of major depression.
Rate
Definition: a fraction in which the numerator includes only individuals who meet the case definition and the denominator includes individuals in the study population who do or do not meet the case definition but could meet the case definition (at-risk) and the total time at risk they contribute (person-time). Person-time is defined as the sum of time that each at-risk individual contributes to the study. If the study period is 2 years, person-time is as follows for certain groups:
- For participants who develop the disease
- time they spend on study before they developed the disease (< 2 years)
- These participants count in both the numerator and the denominator
- For participants who drop out before the 2-year period is over
- time they spend on study before they dropped out (< 2 years)
- These participants count only in the denominator
- For participants who do not develop the disease (in the 2 year window)
- 2 years
- These participants count only in the denominator
The sum of all these times would be the denominator.
A rate of 0.1 cases per person-year indicates that, on average, for every 10 person-years contributed (i.e., 10 people each followed for 1 year, or 2 people each followed for 5 years, etc.), 1 new case of the health outcome will develop.
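To make the person-time bookkeeping concrete, here is a minimal Python sketch. The five participants and their follow-up times are hypothetical, invented only for illustration; the 2-year study period matches the description above.

```python
# Hypothetical 2-year study: (years of follow-up contributed, developed disease?)
# These values are made up for illustration and do not come from a real study.
participants = [
    (0.5, True),   # developed the disease after 0.5 years
    (1.5, True),   # developed the disease after 1.5 years
    (1.0, False),  # dropped out after 1 year, disease-free
    (2.0, False),  # followed the full 2 years, disease-free
    (2.0, False),  # followed the full 2 years, disease-free
]

# Numerator: only participants who developed the disease.
new_cases = sum(1 for _, diseased in participants if diseased)

# Denominator: time at risk contributed by everyone, cases included.
person_years = sum(years for years, _ in participants)

incidence_rate = new_cases / person_years
print(f"{new_cases} cases / {person_years} person-years "
      f"= {incidence_rate:.3f} cases per person-year")
```

Note that every participant adds time to the denominator, but only those who develop the disease add to the numerator.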
Risk
Definition: A measure of the probability of an unaffected individual developing a specified health outcome over a given period of time. Risk is calculated by dividing the number of new cases by the total number of individuals at risk during the specified time period.
A 5-year risk of 0.10 indicates that an individual at risk has a 10% chance of developing the given health outcome in a 5-year period
3.2 - Disease Frequency: Incidence vs. Prevalence
The two main ways by which the frequency of disease is measured are incidence and prevalence. These can be distinguished by differences in the time of disease onset.
- Incidence
- counts new cases of the disease (or outcome)
- Prevalence
- counts new and existing cases of the disease (or outcome)
Incidence
Incidence quantifies the development of disease. Incidence can be estimated using data from a disease registry or a cohort study. There is an implicit assumption of a period of time, such as new cases within a month (or a year).
A summary incidence rate can estimate the risk (e.g., probability of disease in an individual) if the risk is constant across the summarized groups.
As defined, incidence is a count of new cases. However, it is often expressed as a proportion of those at risk. The denominator includes all persons at risk for the disease or condition, i.e. disease-free or condition-free individuals in the population at the start of the time period. Persons in the denominator, those at-risk, should be able to appear in the numerator. Obviously, the denominator would not include persons who already have the disease or condition. Incidence can also be expressed in terms of person-time at risk.
Rates are usually expressed per 100, 1,000, or 100,000 persons. In a strict application, "rate" should only be used when the denominator is an estimate of the total person-time at risk. (You will find the term "rate" used inconsistently in epidemiologic reports. It is better to seek the source of the numbers than to rely on the nomenclature.)
Two Common Measures of Incidence
- Cumulative Incidence
- The cumulative incidence consists of the number of persons who newly experience the disease or studied outcome during a specified period of time divided by the total population at risk. This calculation assumes all persons in the denominator contribute an equal amount of time to the measure.
- Incidence Density Rate
- Incidence density rate (also known as incidence rate; person-time rate) is the number of persons who newly experience the outcome during a specified period of time divided by the sum of the time that each member of the population is at-risk.
Prevalence
Since prevalence counts both new and existing cases, the duration of the disease affects the prevalence. Diseases with a long duration will be more prevalent than those with a shorter duration. Chronic, non-fatal conditions are more prevalent than conditions with high mortality. The prevalence of disease is directly related to the duration of the disease. Prevalence is not an apt descriptor of an acute condition.
Similar to incidence, persons included in the denominator must have the potential for being in the numerator, i.e. at-risk for the disease or condition. Prevalence is often expressed after multiplication by 100 (%), 1000, or 100,000.
The prevalence pool is the subset of the population with the condition of interest. The prevalence pool is not generally useful for hypothesis-driven epidemiologic research because these are not new cases, but can be useful in tracking the natural history of the disease, evaluating effects of treatments, or disease burden.
For most etiologic research, incidence is the more appropriate measure. Studying the incidence of a rare condition, however, poses a challenge. Given a small number of new cases, it can be preferable to estimate prevalence instead of incidence in these situations. For example, a birth defect rate reported as the number of cases per live births is a prevalence measure. Similarly, an autopsy rate is a prevalence measure.
Two common measures of prevalence
The difference is whether the estimate is made over a period of time or at one specific time as illustrated below:
- Point prevalence
- Prevalence of condition of interest at a specific time.
Number of existing cases on a specific date / Number in the defined population on this date
Point prevalence ranges from 0 to 100%.
Point prevalence can be estimated from a cross-sectional survey or disease registry data by calculating the percentage with a particular disease or condition on a particular date.
E.g. What percentage had a particular type of flu on 1/17/2009?
- Period prevalence
- Prevalence of outcome of interest during a specified period of time.
Less frequently used.
Number of cases that occurred in a specified period of time/ Number in the defined population during this period
Period prevalence generally ranges from 0 to 100 %. (Theoretically, period prevalence can exceed 100% if you allow individuals who had the disease more than once to be counted for each case of the disease within the reporting period.)
E.g. What percentage of the population had an episode of flu between October and May within the most recent flu season?
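The difference between the two measures can be sketched in a few lines of Python. The population size, the episode dates, and the function names `point_prevalence` and `period_prevalence` are all hypothetical, chosen only to mirror the flu examples above; the sketch also assumes each person has at most one episode.

```python
from datetime import date

# Hypothetical defined population of 10 people; 4 of them had one illness episode.
# Each episode is (onset date, recovery date). All values are invented.
population_size = 10
episodes = [
    (date(2009, 1, 10), date(2009, 1, 20)),
    (date(2009, 1, 15), date(2009, 1, 25)),
    (date(2008, 11, 2), date(2008, 11, 9)),
    (date(2009, 3, 1), date(2009, 3, 8)),
]

def point_prevalence(on_date):
    """Share of the population with an ongoing episode on a single date."""
    existing = sum(1 for onset, recovery in episodes if onset <= on_date <= recovery)
    return existing / population_size

def period_prevalence(start, end):
    """Share of the population with an episode overlapping [start, end]."""
    cases = sum(1 for onset, recovery in episodes if onset <= end and recovery >= start)
    return cases / population_size

# Two people were ill on 1/17/2009; four were ill at some point in the season.
print(point_prevalence(date(2009, 1, 17)))
print(period_prevalence(date(2008, 10, 1), date(2009, 5, 31)))
```

The same episodes give a point prevalence of 20% on 1/17/2009 but a period prevalence of 40% for October through May, which is why the time frame must always be reported alongside the estimate.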
3.3 - Outbreak Investigation
Investigating a Potential Outbreak
In this course, we have often assumed that investigators have knowledge of a potentially harmful exposure coincidentally with or prior to observing the disease or illness. In other situations, the first indication of harmful exposure is a report of a potential outbreak of disease or illness. Increased numbers of cases of disease or illness may necessitate an outbreak investigation. Questions to be answered in an outbreak investigation include the following:
Are there an unusual number of adverse health outcomes in this community?
If so, how many? Is the number increasing, decreasing, or stable?
What type of exposure may have caused the increase?
What is the anticipated future course and spread of this outbreak?
When an increase in the number of cases of a disease is reported, a speedy response is critical. At the same time, it is also of utmost importance to end up with an answer that will appropriately protect public health and safety. A systematic approach to outbreak investigation helps assure timely and accurate answers:
- Prepare for fieldwork
- Establish the existence of an outbreak
- Verify the diagnosis
- Define and identify cases
- Measure the frequency of adverse outcomes and describe the data in terms of time, place, and person
- Develop hypotheses
- Evaluate hypotheses
- Refine hypotheses and carry out additional studies
- Implement control and prevention measures
- Communicate findings
Orient in Terms of Time, Place, and Person
Characterizing by time: Constructing an Epi-Curve
An epidemic curve, frequently referred to as an 'epi-curve', is used to examine and characterize the occurrence of a possible outbreak. By constructing and examining an accurate epi-curve, an investigator can consider questions such as:
Is there an outbreak? If so, when did the outbreak begin?
Has the outbreak peaked? If so, when was the peak?
What might be the source of the exposure? Is there one source or multiple sources for exposure of cases? Is person-to-person transmission occurring?
Have the attempts to control the outbreak coincided with a decrease in the occurrence of the disease?
An epi-curve is a histogram with the number of cases of the adverse health outcome on the y-axis (ordinate) and dates of onset of the outcome on the x-axis (abscissa). Dates of onset may be grouped by days, weeks, or months, depending on the nature of the potential outbreak. A typical time period used is 1/4 to 1/3 the incubation period for the disease. If the incubation or lag time from exposure to outcome is unknown, it is valuable to experiment with different lengths of time.
A typical epi-curve is a simple chart with one series of data, the onset of cases. In other situations, several layers of data are displayed on the curve. For example, the investigator may want to examine the date of onset in more than one location (e.g. 2 or more cities, states or countries) or in different groups of people (e.g. stratified by age or race).
Another variation of the epi-curve is stacking the bars in order to show different characteristics of the cases. For example, you may decide to separate confirmed cases from suspect cases, using stacked bars to assess whether an outbreak is truly occurring.
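The counting step behind an epi-curve can be sketched as follows. The onset dates are hypothetical and the helper name `epi_curve_counts` is ours; the sketch prints a plain-text histogram, though in practice a plotting library would normally draw the bars. Changing `bin_days` is one way to experiment with different interval lengths when the incubation period is unknown.

```python
from collections import Counter
from datetime import date

# Hypothetical onset dates for 10 cases (invented for illustration).
onset_dates = [
    date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 2), date(2024, 3, 3),
    date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 4), date(2024, 3, 4),
    date(2024, 3, 5), date(2024, 3, 8),
]

def epi_curve_counts(dates, bin_days=1):
    """Count cases per interval of `bin_days`, starting at the earliest onset."""
    start = min(dates)
    counts = Counter((d - start).days // bin_days for d in dates)
    n_bins = max(counts) + 1
    # Intervals with no cases are kept as zeros so gaps stay visible.
    return [counts.get(i, 0) for i in range(n_bins)]

daily = epi_curve_counts(onset_dates, bin_days=1)
for day, n in enumerate(daily):
    print(f"day {day}: {'#' * n}")   # one '#' per case
```

Keeping the empty intervals matters: the isolated late case on day 7 only stands out as a possible outlier because days 5 and 6 are shown with zero cases.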
Interpreting an Epi-Curve
The following shows the outbreak of COVID-19 cases in Pennsylvania:
The first consideration is the overall shape of the curve which is determined by the pattern of the outbreak (common source or person-to-person transmission). The shape also indicates the period of time over which susceptible people are exposed and the minimum, average and maximum incubation periods for the disease.
If the duration of exposure is prolonged, the epidemic is called a "continuous common source epidemic," and the epidemic curve will have a plateau instead of a peak. Person-to-person spread (a "propagated" epidemic) should have a series of progressively taller peaks one incubation period apart.
Cases that stand apart ("outliers") provide valuable information. An early case can represent a background (unrelated) case, a source of the epidemic, or a person who was exposed earlier than others. Similarly, late cases may be unrelated to the outbreak, may have especially long incubation periods, may indicate exposure later than most of the people affected, or may be secondary cases (people who become ill after being exposed to someone who was part of the initial outbreak). Examine any outliers that are part of the outbreak carefully because they may point directly to the source. For example, a prep chef could be the first case of strep in an epidemic among party-goers eating food prepared by this person.
In a point-source epidemic of a disease with a known incubation period, the epidemic curve can also identify the likely period of exposure.
Characterizing by place
A simple technique for looking at geographic patterns is to plot on a 'spot map' the locations where the affected people live, work, or may have been exposed. A map of cases in a community may show clusters or patterns that reflect water supplies, wind currents, or proximity to a restaurant or grocery store. A classic example is John Snow's detection of the Broad St. water pump as the source of a cholera epidemic. On a spot map of a hospital, nursing home, or another residential facility, clustering may indicate either a primary source or person-to-person spread. The scattering of cases throughout a facility is more consistent with a common source such as a dining hall.
If the size of the overall population varies among the areas being compared, the spot map with the number of cases can be misleading. Indicating the proportion affected or the attack rate for each area would be a better approach.
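A short Python sketch shows why raw counts can mislead. The two areas and their numbers are hypothetical, invented only to illustrate the point.

```python
# Hypothetical counts per area (invented): cases and population at risk.
areas = {
    "Area A": {"cases": 30, "population": 3000},
    "Area B": {"cases": 12, "population": 400},
}

# Attack rate = cases / population at risk in the same area.
attack_rates = {name: a["cases"] / a["population"] for name, a in areas.items()}

for name, rate in attack_rates.items():
    print(f"{name}: {areas[name]['cases']} cases, attack rate {rate:.1%}")
```

Area A would show more dots on a spot map, yet the proportion affected in Area B is three times as high.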
Characterizing by person
Define the populations at risk for the disease by characterizing an outbreak by personal characteristics such as age, race, sex, medical status, etc, and/or by exposures (e.g., occupation, leisure activities, use of medications, tobacco, drugs, etc.). Age and sex are characteristics often strongly related to exposure and risk; thus these factors are often assessed first. Other factors to be assessed are those possibly related to susceptibility to the disease and to opportunities for exposure to the disease being investigated and in the setting of the outbreak.
3.4 - Lesson 3 Summary
Lesson 3 Summary
Lesson 3 was a big lesson! It introduced the main calculations for disease occurrence and frequency, including counts, proportions, ratios, rates, and risks. An important component of epidemiologic measures is the concept of time, which is incorporated into rates and risks. The two main measures of disease frequency are incidence (new cases) and prevalence (new + existing cases), both of which are important for understanding the landscape of public health issues. Finally, we learned about outbreak investigations, which are needed when a new public health concern arises, again focusing on the who, when, and where to best understand the issue.
Lesson 4 - Comparing Groups In Terms of Disease Occurrence and Frequency
Lesson 4 Objectives
- Organize disease frequency data into a 2 x 2 epidemiological table.
- Calculate and describe Risk Ratios and Odds Ratios to compare groups.
- Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
- Recognize situations in which direct or indirect standardization should be considered.
- Given the required data, standardize a rate with a direct and indirect method.
4.1 - Example Research Hypotheses & Measurement Calculations
Research Hypotheses
Suppose our goal is to compare two populations with regard to disease or exposure-disease frequency. We wish to use precise valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, incidence density, etc. The two populations must be distinct in location or time or exposure status. We'd like to apply statistical tests to these measures to see if any difference is likely to have occurred by chance.
There are many examples of hypotheses that are comparative in nature, such as:
- High salt intake increases the incidence of heart disease.
→Compare the incidence of heart disease among persons with high salt intake with those who have a low salt intake.
- The prevalence of screening mammograms in managed care organization A and managed care organization B are not equal.
→Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.
- The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.
→Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive less than quarterly foot examinations.
- A high intake of Vitamin C reduces the prevalence of colds.
→Compare the prevalence of colds for persons with a high intake of Vitamin C compared to those with a low intake.
- Dizziness is associated more frequently with therapeutic agent A than with agent B.
→Compare the incidence of dizziness for patients receiving A with those receiving B.
- The administration of pre-surgical antibiotics decreases the rate of wound infections.
→Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.
Data Organization: 2x2 table
Consider this cohort study, An Association between Air Pollution and Mortality in Six U.S. cities, which investigated the relationship between air pollution and mortality in six US cities. The researchers were interested in the exposure of air pollution and the outcome of mortality.
Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with or causing a certain outcome. Often in epidemiological studies, the outcome of interest is a certain disease. Those who develop the disease are often referred to as cases, while those that do not are referred to as non-cases.
Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:
Category | Cases (Number) | Non-Cases (Number) | Total Exposure (Number) |
---|---|---|---|
Exposed | A | B | TotalExposed |
Not Exposed | C | D | TotalNotExposed |
Total | TotalCases | TotalNon-Cases | Total |
For the air pollution cohort study, the following tables can be constructed.
Category | Dead | Alive* | Total |
---|---|---|---|
High Pollution (Ohio) | 291 | 1060 | 1351 |
Low Pollution (Wisconsin) | 232 | 1399 | 1631 |
Total | 523 | 2459 | 2982 |
*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See...
Category | Dead | Alive* | Person-Years |
---|---|---|---|
High Pollution (Ohio) | 291 | | 17914 |
Low Pollution (Wisconsin) | 232 | | 21618 |
Total | 523 | | 39532 |
*For the incidence rate, the number of non-diseased (i.e., alive) participants is not needed. Instead, we need the person-years contributed by all participants at risk.
Measures of Disease Frequency
Disease Prevalence [by Exposure Status]
- For Exposed: A/(A+B)
- In our pollution example, this would be 291/1351 = 0.215. Thus, 21.5% of the participants from the high pollution city (i.e., the exposed group) died.
- For Not Exposed: C/(C+D)
- In our pollution example, this would be 232/1631 = 0.142. Thus, 14.2% of the participants from the low pollution city (i.e., the non-exposed group) died.
Exposure Prevalence [by Disease Status]
- For Cases: A/(A+C)
- In our pollution example, this would be 291/523= 0.556. Thus, 55.6% of the participants who died were from the high pollution city.
- For Non-cases: B/(B+D)
- In our pollution example, this would be 1060/2459= 0.431. Thus, 43.1% of the participants who did not die were from the high pollution city.
Odds of Disease [by Exposure Status]
- For Exposed: A/B
- In our pollution example, this would be 291/1060. Thus, the odds of dying in the high pollution city are 291:1060 - which can be simplified to 1:3.64. (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)
- For Non-Exposed: C/D
- In our pollution example, this would be 232/1399. Thus, the odds of dying in the low pollution city are 232:1399 - which can be simplified to 1:6.03.
Odds of Exposure [by Disease Status]
- For Cases: A/C
- In our pollution example, this would be 291/232= 1.25:1. Thus, the odds of being from the high pollution city are 1.25:1 for those who died.
- For Non-cases: B/D
- In our pollution example, this would be 1060/1399= 0.76:1. Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
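The eight measures above all come from the same four cells, which a short Python sketch makes explicit. The cell counts are those of the air pollution table; the dictionary name `measures` and its labels are ours.

```python
# Cell counts from the air pollution 2x2 table above.
A, B = 291, 1060   # exposed (high pollution): dead, alive
C, D = 232, 1399   # not exposed (low pollution): dead, alive

measures = {
    "disease prevalence, exposed":     A / (A + B),
    "disease prevalence, not exposed": C / (C + D),
    "exposure prevalence, cases":      A / (A + C),
    "exposure prevalence, non-cases":  B / (B + D),
    "odds of disease, exposed":        A / B,
    "odds of disease, not exposed":    C / D,
    "odds of exposure, cases":         A / C,
    "odds of exposure, non-cases":     B / D,
}

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The odds of disease print as decimals (for example 291/1060), which is the same quantity the text expresses in the 1:3.64 form.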
There are two ways to compare measures between groups: ratios and differences. The next few sections will outline both methods and show examples using the air pollution study. Also, note that for these examples, estimates for the cumulative incidences and incidence rates are similar, but that is not always the case. In this study, person-years were similar across groups, resulting in similar estimates.
4.2 - Using Ratios to Compare Two Populations
A ratio may be used to convey the strength of an effect or association between two population groups, or the relative 'risk' of the study (e.g., exposed) group compared to a comparison (e.g., unexposed) group. A ratio is not dependent on the prevalence of exposure among the study population.
- Ratio
- \(\dfrac{\text { Disease Frequency }(\text { Population } A)}{\text { Disease Frequency }(\text { Population } B)}\)
A ratio can be reported with upper and lower bounds (a confidence interval). We will learn some formulas for these calculations in a later lesson. A ratio of exactly 1 means the disease frequency is the same in both groups; when there is no significant difference between groups, the confidence interval for the ratio will include 1.
Ratio Calculations
Risk Ratios
Cumulative incidence ratio
- More generally can be thought of as:
- Ratio of Disease Incidence= [A/(A+B)] / [C/(C+D)]
- In our pollution example, this would be [291/1351] / [232/1631] = 1.51. Thus, participants from the high-pollution city are 1.51 times as likely to die as those from the low-pollution city. This makes sense since we saw that the cumulative incidence of death was about 21.5% in the high-pollution city and 14.2% in the low-pollution city.
Incidence rate ratio
- Follows the same general formula, but instead of comparing incidences, we are comparing incidence rates. So first, we need to calculate the incidence rate in each city.
- High pollution city incidence rate of death = 291 deaths / 17914 person-years, which is 16.24 deaths per 1000 person-years
- Low pollution city incidence rate of death = 232 deaths / 21618 person-years, which is 10.73 deaths per 1000 person-years
- In our pollution example, this would be (16.24/1000 person-years) / (10.73/1000 person-years) = 1.51
Odds Ratio
- Exposure Odds Ratio [A/C] / [B/D] = [A*D] / [B*C]
- Disease Odds Ratio [A/B] / [C/D] = [A*D] / [B*C]
- Both simplify to the same OR
- In our pollution example, this would be (291*1399) / (1060*232) = 1.66. Thus, participants from the high-pollution city have 1.66 times the odds of dying compared with those from the low-pollution city.
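The three ratio calculations can be checked with a minimal Python sketch. The counts and person-years are those shown in the air pollution tables; the variable names are ours.

```python
# Counts and person-years from the air pollution tables above.
A, B, py_exposed = 291, 1060, 17914      # high pollution: dead, alive, person-years
C, D, py_unexposed = 232, 1399, 21618    # low pollution: dead, alive, person-years

risk_ratio = (A / (A + B)) / (C / (C + D))           # cumulative incidence ratio
rate_ratio = (A / py_exposed) / (C / py_unexposed)   # incidence rate ratio
odds_ratio = (A * D) / (B * C)                       # cross-product ratio

print(f"risk ratio {risk_ratio:.2f}, "
      f"rate ratio {rate_ratio:.2f}, "
      f"odds ratio {odds_ratio:.2f}")
```

The odds ratio is further from 1 than the risk ratio here because death is not a rare outcome in this cohort; the two measures only approximate each other when the outcome is uncommon.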
4.3 - Using Differences to Compare Two Populations
Alternatively, differences can be calculated between the estimates for the two groups. The difference can be reported with a confidence interval that includes upper and lower bounds. If the confidence interval includes 0, this indicates that there is no significant difference between the groups. If the interval does not include 0, there is an increased risk for one population compared to the other (or conversely, a decreased risk). The difference can convey an excess or decreased risk among the exposed group due to exposure, possibly an excess or decreased risk that would be removed if the exposure ends, a potential reduction in risk for exposed individuals, or the absolute risk of the exposure.
Differences: Disease Frequency(Population A) - Disease Frequency(Population B)
Difference Calculations
Risk Difference
Cumulative incidence difference
- More generally can be thought of as:
- Difference of Disease Incidence= [A/(A+B)] - [C/(C+D)]
- In our pollution example, this would be [291/1351] - [232/1631] = 21.5% - 14.2% = 7.3%. Thus, the risk of death among participants from the high-pollution city is 7.3 percentage points higher than among participants from the low-pollution city.
Incidence rate difference
- In our pollution example, this would be (16.24/1000 person-years) - (10.73/1000 person-years) = 5.51/1000 person-years. Thus, there are 5.51 excess deaths per 1000 person-years among those in the high pollution city. Alternatively, the number of deaths could be reduced by 5.51 per 1000 person-years if the pollution level in the high-pollution city were reduced to that of the low-pollution city.
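Both differences can be reproduced with a minimal Python sketch, again using the counts and person-years from the air pollution tables; the variable names are ours.

```python
# Counts and person-years from the air pollution tables above.
A, B, py_exposed = 291, 1060, 17914      # high pollution: dead, alive, person-years
C, D, py_unexposed = 232, 1399, 21618    # low pollution: dead, alive, person-years

# Absolute comparisons: subtract instead of divide.
risk_difference = A / (A + B) - C / (C + D)
rate_difference = A / py_exposed - C / py_unexposed

print(f"risk difference: {risk_difference:.1%}")
print(f"rate difference: {rate_difference * 1000:.2f} per 1000 person-years")
```

Unlike a ratio, a difference keeps its units: the risk difference is in percentage points, and the rate difference is in deaths per unit of person-time.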
Attributable Proportion among the Total Population (APt)
(Also known as population attributable risk (PAR))
The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.
General Formula
- \(\mathrm{AP}_{\mathrm{t}}=\dfrac{\text { Risk(study population)-Risk(unexposed group) }}{\text { Risk(study population) }}\)
Incidence rate APt
- First, we need to calculate the incidence rate in the entire study population (all six cities, not only the two shown in the tables above). This would be the sum of all the deaths (1430) divided by the sum of all the person-years (111076) = 12.87 deaths per 1000 person-years.
- In our example, this would be [(12.87/1000 person-years - 10.73/1000 person-years)] / (12.87/1000 person-years) = 0.166. Thus 16.6% of the deaths in the population are attributable to the high pollution levels, and thus would be eliminated if the pollution levels were reduced.
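The same calculation as a minimal Python sketch, using the totals quoted above and the low-pollution city as the unexposed group; the variable names are ours.

```python
# Totals quoted in the text for the entire study population.
total_deaths, total_person_years = 1430, 111076
# Unexposed (low pollution) group from the air pollution table.
unexposed_deaths, unexposed_person_years = 232, 21618

rate_total = total_deaths / total_person_years
rate_unexposed = unexposed_deaths / unexposed_person_years

# AP_t = (rate in study population - rate in unexposed) / rate in study population
ap_t = (rate_total - rate_unexposed) / rate_total

print(f"overall rate:   {rate_total * 1000:.2f} per 1000 person-years")
print(f"unexposed rate: {rate_unexposed * 1000:.2f} per 1000 person-years")
print(f"AP_t: {ap_t:.3f}")
```

Because the overall rate sits in both the numerator and the denominator, the result depends on how much of the study population is exposed, which is what makes this measure useful for policy questions.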
4.4 - Standardization
When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.
Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at-risk), which you will also recognize as the cumulative incidence. If the distribution of a modifier of disease frequency (such as age) differs between two populations, however, a comparison of the crude rates can be misleading, because the difference in that distribution can mask or exaggerate the true difference in rates.
- A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables. (e.g., x cases / y population at-risk, adjusted to remove the effect of potential confounder [e.g., age]).
- Age-specific rates (i.e., x cases in a specific age group / population at-risk in the same age group) are also useful in summarizing the health status of a population.
Types of Standardization
There are two different approaches to standardizing a rate:
Direct standardization
Direct standardization, more commonly used, creates a summary disease rate for a population that would be expected if the study population had a population distribution identical to that of an arbitrarily chosen standard population. A reference population is used as the standard population. The standardized rate is the sum of weighted group-specific rates, with weights derived from the standard population. The weights sum to 1.0. A standardized rate is essentially a weighted average of age-specific rates.
\(I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}\)
where \(I_{i}\) is a group-specific rate and \(\sum W_{i}=1\)
The necessary data for direct standardization is the group-specific disease rates for the study population and the population distribution from the standard population.
Stop and Think!
Do you understand how direct adjustment is a weighted average of age-specific rates?
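One way to see the weighted average is to compute it. In this minimal Python sketch the age groups, the age-specific rates, and the standard population weights are all hypothetical, and the function name `directly_standardized_rate` is ours.

```python
# Hypothetical age-specific rates (per 1000) for a study population, and the
# age distribution of a hypothetical standard population. All values invented.
study_rates_per_1000 = {"0-39": 1.0, "40-64": 5.0, "65+": 20.0}
standard_weights = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}   # proportions sum to 1.0

def directly_standardized_rate(rates, weights):
    """Weighted average of group-specific rates, with weights W_i taken
    from the standard population: sum(W_i * I_i) / sum(W_i)."""
    total_weight = sum(weights.values())
    return sum(weights[group] * rates[group] for group in rates) / total_weight

adjusted = directly_standardized_rate(study_rates_per_1000, standard_weights)
print(f"age-adjusted rate: {adjusted:.1f} per 1000")
```

Swapping in a different set of weights, for instance an older standard population, changes the adjusted rate even though the age-specific rates are untouched. This is why adjusted rates are only comparable when they share a standard population.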
Indirect standardization
Indirect standardization also produces a weighted average, through the production of a summary disease rate for the study population which would be expected if the disease experience of the study population were identical to that of a standard population. The standard population is arbitrarily chosen, but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available. If these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate. Observed and expected rates are typically compared as a standardized ratio. Indirect adjustment is often used in occupational health to calculate standardized mortality ratios, which are obtained by dividing the observed death rate by the expected death rate.
The data required for indirect standardization are the crude rate for the study population, the population distribution for the study population, and the group-specific rates for the standard population.
Example
Consider the below data for which the researcher could not obtain the gender-specific rates.
From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.
If we also know that:
- Group 1 male 60%, female 40%
- Group 2 male 80%, female 20%
What is the expected crude rate for group 1?
It is (2.2 × 0.6) + (0.9 × 0.4) = 1.68 per 1000. The expected crude rate is 1.68 per 1000.
Stop and Think!
Come up with an answer to this question by yourself before reading the solution below.
What is the expected crude rate for group 2?
Answer: the expected crude rate is (2.2 × 0.8) + (0.9 × 0.2) = 1.94 per 1000.
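The expected rates for both groups can be reproduced with a short Python sketch. The gender-specific rates and group distributions come from the example above. The observed crude rate for Group 2 is hypothetical (it is not given in the example) and is included only to show how a standardized ratio would be formed; the function name `expected_rate` is ours.

```python
# Gender-specific rates (per 1000) in the standard population, from the example.
standard_rates = {"male": 2.2, "female": 0.9}

# Gender distribution of each study group, from the example.
group_distributions = {
    "Group 1": {"male": 0.6, "female": 0.4},
    "Group 2": {"male": 0.8, "female": 0.2},
}

def expected_rate(distribution, rates):
    """Rate expected if the group experienced the standard population's rates."""
    return sum(distribution[g] * rates[g] for g in rates)

expected = {name: expected_rate(dist, standard_rates)
            for name, dist in group_distributions.items()}
for name, rate in expected.items():
    print(f"{name}: expected crude rate {rate:.2f} per 1000")

# Hypothetical observed crude rate for Group 2 (NOT from the source example).
observed_group2 = 2.5
standardized_ratio = observed_group2 / expected["Group 2"]
print(f"standardized ratio for Group 2: {standardized_ratio:.2f}")
```

A standardized ratio above 1 would mean the group experienced more events than expected from the standard population's rates, given its own gender distribution.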
Try it! Comparison of Direct and Indirect Standardization Methods
Try the following true/false questions to test your knowledge.
- Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.
Answer: False
- Age-adjusted rates can be compared regardless of the standard population used.
Answer: False
Direct standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates.
- Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.
Answer: True
- Indirect adjustment is preferred if there are only a few cases across all age groups.
Answer: True
- If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.
Answer: True
- To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.
Answer: True
- To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.
Answer: False
The appropriate statistics are the age-adjusted rates with their confidence intervals, or the relative risk (rate ratio) adjusted for age.
4.5 - Lesson 4 Summary
Lesson 4 Summary
Describing the state of public health is an important component of epidemiology, which was presented in Lesson 3. The next step of comparing outcomes between groups was introduced in Lesson 4. We saw some examples of situations when the goal is to compare groups, and learned the two main ways to do so: absolute (differences) and relative (ratios). An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.