Unit 1: Descriptive Epidemiology

Unit 1 Overview

Epidemiology affects our lives on a daily basis: from the way we make decisions related to our health and well-being on a personal level to the way health policy decisions are made by the government, public health agencies, and medical organizations. Over the years, epidemiology has helped identify disease outbreaks, provided surveillance on the state of public health, and established the association of many risk factors with adverse health outcomes.    

This unit starts with an introduction to epidemiology and examples of its accomplishments over the years. Next, it presents sources of public health surveillance data and describes how they can be used to make health policy decisions and to identify areas where further research, and possibly interventions, are needed. Finally, it ends with the standard measures of disease occurrence and frequency, and ways to use these measurements to compare populations.


Lesson 1 - Introduction to Epidemiology

Lesson 1 Objectives

Upon completion of this lesson, you should be able to:

  • Distinguish between epidemiology and clinical epidemiology
  • Apply the terminology of the Epidemiologic Triad to an infectious disease
  • Explore selected events in the history of epidemiology and population health
  • State five objectives of epidemiologic research
  • Compare Epidemiologic Study Designs in the Demonstration of Causality
  • Differentiate between different types of populations

1.1 - Defining Terms

Major Definitions for the Study of Epidemiology

Epidemiology
The study of the distribution of disease and determinants of health-related states or events in specified human populations and the application of this study to the control of human health problems. (JM Last. Dictionary of Epidemiology. 2nd edition)
Clinical Epidemiology
The science of making predictions about individual patients by counting clinical events in similar patients, using strong scientific methods for studies of groups of patients to ensure that the predictions are accurate. (Fletcher, Fletcher, Wagner. Clinical Epidemiology. 1996)

What is the difference between these two views of epidemiology?

In the clinical setting, epidemiologic methods are used to predict a health outcome for an individual based on scientific studies of groups of similar patients. Clinical epidemiology is integral to evidence-based medicine. Epidemiology itself is the study of disease in a population to determine the frequency and distribution of the disease as well as risk factors for the disease. Although epidemiology is defined with respect to human populations, epidemiologic principles can be extended to study other problems, such as colony collapse disorder in honeybees or improving herd health for a dairy farm.

General Dichotomies in Epidemiological Studies

When designing epidemiologic studies, choices must be made about the role of the investigator, the purpose of the study, the hypothesis regarding exposure, and the unit of analysis. Here are some examples:

Role of investigator:

  • Observational - The investigator does not manipulate the exposure of participants to risk factors. Most epidemiological studies are observational.
  • Experimental - According to the study design, the investigator manipulates the exposure of participants to some factor. Clinical trials and intervention studies are examples of such experiments. If the study participants themselves act to change their exposure to an influence, a natural experiment may occur. For example, a study of persons who have migrated from one environment to another could constitute a natural experiment.

Purpose of the study:

  • Descriptive - describes the distribution of disease by time, place, and person; used to generate hypotheses of disease causation or for health planning
  • Analytic - measures and tests the association between a hypothesized risk factor and a disease

Hypothesized Effect of Exposure:

  • Harmful - exposure increases the risk or presence of disease
  • Beneficial - exposure reduces the risk or presence of disease

Unit of Analysis:

  • Individual - the individual (e.g., person, animal) is the unit of analysis; there is potential to ignore the impact of the community or group effect on individual risk
  • Community - the community (e.g., county, hospital) is the unit of analysis. There is potential for ecological fallacy in such studies: without individual-level data, the assumption that individuals behave like the average of their group may not hold.
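
The ecological fallacy can be illustrated with a small sketch. The data below are hypothetical (exposure, outcome) pairs invented for this illustration, and the `pearson` helper is our own, not part of any library. At the community level the averages line up perfectly, while within every community the individual-level relationship runs in the opposite direction.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (exposure, outcome) pairs for individuals in three communities.
communities = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Community-level analysis: one (mean exposure, mean outcome) point per community.
mean_exposure = [sum(x for x, _ in rows) / len(rows) for rows in communities.values()]
mean_outcome = [sum(y for _, y in rows) / len(rows) for rows in communities.values()]
group_r = pearson(mean_exposure, mean_outcome)

# Individual-level analysis within each community.
within_r = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in communities.items()
}

print("community-level correlation:", group_r)
print("within-community correlations:", within_r)
```

A study that had only the three community averages would see a strong positive association and could wrongly conclude that more exposure raises the outcome for individuals.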

Data for a typical epidemiologic study may be summarized in a table comparing the numbers of cases (those with the disease or condition) to non-cases in terms of their exposure to a risk factor or beneficial agent. (2x2 Epidemiologic Table)

2x2 Epidemiologic Table
              Case       Non-Case       Total
Exposed       A          B              T_exposed
Not Exposed   C          D              T_non-exposed
Total         T_cases    T_non-cases
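
As a minimal sketch of how the cells of this table are used, the code below computes the row and column totals along with two common measures of association, the risk ratio (relative risk) and the odds ratio. The class name `TwoByTwo` and the counts are hypothetical, chosen only to keep the arithmetic easy to follow. The formulas are the standard textbook definitions, and the risk ratio is only meaningful when the design lets us estimate risk in each exposure group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 epidemiologic table (cell names follow the table above)."""
    a: int  # exposed cases
    b: int  # exposed non-cases
    c: int  # non-exposed cases
    d: int  # non-exposed non-cases

    @property
    def total_exposed(self) -> int:
        return self.a + self.b

    @property
    def total_non_exposed(self) -> int:
        return self.c + self.d

    @property
    def total_cases(self) -> int:
        return self.a + self.c

    @property
    def total_non_cases(self) -> int:
        return self.b + self.d

    def risk_ratio(self) -> float:
        """Risk among the exposed divided by risk among the non-exposed."""
        return (self.a / self.total_exposed) / (self.c / self.total_non_exposed)

    def odds_ratio(self) -> float:
        """Cross-product ratio (A*D) / (B*C)."""
        return (self.a * self.d) / (self.b * self.c)


# Hypothetical counts for illustration only.
table = TwoByTwo(a=30, b=70, c=10, d=90)
print("row totals:", table.total_exposed, table.total_non_exposed)
print("column totals:", table.total_cases, table.total_non_cases)
print("risk ratio:", round(table.risk_ratio(), 2))
print("odds ratio:", round(table.odds_ratio(), 2))
```

A value above 1 for either measure suggests a harmful exposure, and a value below 1 suggests a beneficial one.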

1.2 - History of Epidemiology

Selected History of Epidemiology and Population Health

Follow the links in the list below, and explore selected events in the history of epidemiology and population health.

1800s

  • 1849-54 → John Snow formed and tested a hypothesis on the origin of cholera in London - one of the first studies in analytic epidemiology

1900s

2000s

  • 2000s → Genetic and molecular epidemiology; health disparities; racialism; HIPAA in the USA; West Nile Virus
  • 2002 → bioterrorism; anthrax and smallpox threat and vaccinations
  • 2003 → SARS, quarantines and public health law; and worldwide epidemiology; BSE in Canada
  • 2004 → SARS recurrence; BSE in the USA; the flu epidemic
  • 2009-2010 → H1N1 pandemic
  • 2020 → COVID-19 pandemic

 


1.3 - Objectives, Causality, Models

Objectives of Epidemiology

The objectives of epidemiology include the ability to:

  • identify the etiology or cause of disease
  • determine the extent of disease
  • study the progression of the disease
  • evaluate preventive and therapeutic measures for a disease or condition
  • develop public health policy

Causality in Epidemiology

One objective of epidemiology is to identify the cause of a disease, with a desire to prevent or modify the severity of the condition. Consider the table below. Would you agree that this table accurately portrays the true causes of death in the U.S. population? Why or why not?

Table: Deaths
Cause                      Estimated No.*   Percentage of Total Deaths
Tobacco                    400,000          19
Diet/Activity Patterns     300,000          14
Alcohol                    100,000          5
Microbial Agents           90,000           4
Toxic Agents               60,000           3
Firearms                   35,000           2
Sexual Behavior            30,000           1
Motor Vehicles             25,000           1
Illicit Use of Drugs       20,000           <1
Total                      1,060,000        50

*Composite approximation drawn from studies that use different approaches to derive estimates, ranging from actual counts (e.g., firearms) to population attributable risk calculations (e.g., tobacco). Numbers over 100,000 rounded to the nearest 100,000; over 50,000, rounded to the nearest 10,000; below 50,000, rounded to the nearest 5,000.

Table: Estimated numbers by 'Cause' of Death (From McGinnis JM, Foege WH. 1993 JAMA, 270(18): 2207-2212.)

As you may have noticed, the causes of death in the table above are all related to modifiable factors. The percentages do not total 100, but if these results are accurate, a large percentage of deaths can be postponed. The opportunity to prevent or ameliorate disease is an exciting component of epidemiologic study.

Epidemiologists follow pre-determined procedures in deciding whether to identify a particular factor as a cause of a disease or condition. In the late 19th century, a German microbiologist, Robert Koch, devised a scheme for deciding whether or not a particular microbe caused a disease.

Infectious Disease Model

Koch's Postulates

One organism leads to one disease. (one-to-one)

  • A specific organism must always be observed in association with the disease. (regular presence)
  • The organism must be isolated from an infected host and grown in pure culture in the laboratory. (exclusive presence)
  • When organisms from the pure culture are inoculated into a susceptible host organism, they must cause the disease. (sufficient cause)
  • The infectious organism must be re-isolated from the diseased organism and grown in pure culture.

Do you see any problem with applying Koch's postulates to determine the cause of all diseases?

Consider asthma or lung cancer: can one micro-organism be isolated as causing the development of these conditions?

Modern Epidemiology

Modern epidemiology accommodates multiple exposures contributing to increased risk for one disease (many-to-one) and situations where one risk factor contributes to multiple diseases (one-to-many).

Considerations When Assessing Possible Causal Role of a Risk

Obviously, there are many factors to assess when considering whether a potential risk factor causes a disease or condition:

  • How strong is the association? (odds ratio, relative risk)
  • Is there a dose-response relationship?
  • If exposure ceases, what happens? Does the condition change?
  • Can the findings be replicated?
  • Is there biological plausibility?
  • Are there alternative explanations?
  • How specific is the association?
  • Is this consistent with other knowledge?
  • Is there a statistical association? If so, is the association:
      • Spurious, due to chance or bias?
      • Non-causal or causal?
  • Is a temporal relationship observed?
  • Was the study design adequate?

 

Epidemiologic Triad

A traditional model of infectious disease causation, known as the Epidemiologic Triad, is depicted in Figure 2. The triad consists of an external agent, a host, and an environment in which the host and agent are brought together, causing the disease to occur in the host. A vector, an organism that transmits infection by conveying the pathogen from one host to another without causing the disease itself, could be part of the infectious process.

A classic example of a vector is the Anopheles mosquito. As the mosquito ingests blood from an infected host, it picks up the parasite Plasmodium. The Plasmodium is harmless to the mosquito. However, after being stored in the salivary glands and then injected into the next human upon which the mosquito feeds, the Plasmodium can cause malaria in the infected human. Thus, the Anopheles mosquito serves as a vector for malaria. Another familiar example of a vector is ticks of the genus Ixodes, which can be vectors for Lyme disease.

In the traditional epidemiologic triad model, transmission occurs when the agent leaves its reservoir or host through a portal of exit and is conveyed by a mode of transmission to enter through an appropriate portal of entry to infect a susceptible host. Transmission may be direct (direct contact host-to-host, droplet spread from one host to another) or indirect (the transfer of an infectious agent from a reservoir to a susceptible host by suspended air particles, inanimate objects (vehicles or fomites), or animate intermediaries (vectors)).

Figure 2: The epidemiologic triad (host, agent, and environment, with a vector)

Can the epidemiologic triad be applied to a disease that is not infectious? Consider a smoking-related disease (Figure 3). If smoking (or more specifically, a carcinogen in the smoke of the cigarette) causes the disease, those who manufacture, sell and distribute cigarettes are vectors, bringing the disease-causing agent to the susceptible host. Diagramming the epidemiologic triad also indicates potential interventions to reduce disease in the population. In this example, clean indoor air legislation, advertising potential harm from smoking, or establishing workplace smoking cessation programs could change the environment and reduce the exposure of the host to the agent. Conversely, increased advertising from cigarette manufacturers or increased numbers of vendors would increase the exposure of the host to the agent.

Figure 3: The epidemiologic triad applied to a smoking-related disease
  Host: Genetic susceptibility, income, resilience
  Environment: Clean indoor air policy, advertising, peer pressure
  Vector: Manufacturers, distributors, vendors
  Agent: Filtered cigarettes, safe cigarettes

Thus, the traditional model of disease transmission can be useful to identify areas of potential intervention to reduce disease prevalence, whether infectious or non-infectious.

 


1.4 - Epidemiologic Hypotheses, Designs, and Populations

Hypotheses

An epidemiologic hypothesis is a testable statement of a putative relationship between exposure and disease. The hypothesis should be:

  • Clear
  • Testable or resolvable
  • Explicit about the relationship between exposure and disease
  • Limited in scope
  • Not inconsistent with known facts
  • Supported by literature, theory, references

Designs

Hierarchy of Epidemiologic Study Designs in the Demonstration of Causality/Prevention

The design of a study contributes to the strength of its findings. Below are the types of studies, in order of increasing strength for testing the relevant hypothesis. We will study some of these designs further later in this course.

Causation Hypothesis

  • Case Study (describing one person with the condition, a case)
  • Case Series (series of cases)
  • Ecological Study (analysis of group statistics; for example, comparing rates of disease between two countries)
  • Cross-Sectional Study (assessing individuals at one time, such as a survey)
  • Case-Control Study (studying those with the condition vs. those without)
  • Cohort Study (following subjects over time to study the initiation and progression of a condition)

Populations

Often, it is not feasible to conduct a study where we collect data from all affected individuals. Thus, we need to select a subset of those individuals for our study. The following defines populations and samples.

Target Population
Population to which inferences from the study are to be made.
A target population may be defined by geography, demography, health status, or some other factor.
Study Population
Population from which study subjects are selected
The sampling frame is the actual list that will be used to select the sample (e.g., a list of hospital admissions, household addresses, or people with a certain disease or outcome)
Careful consideration should go into identifying the sampling frame for a study. If a sampling frame that covers most of the population cannot be created, bias can occur.
Sample
Subjects that provide data to the study
Data from the study participants are used to make estimates and draw conclusions about the population

Obviously, the method for selecting the sample can greatly influence the study results. In Lesson 2, we will learn more about methods to select samples.


1.5 - Lesson 1 Summary

Lesson 1 Summary

This lesson laid the groundwork for the study of epidemiology by introducing key terms, providing an overview of the history of epidemiology, introducing the concept of causality, and presenting examples of hypotheses that can be evaluated using epidemiologic principles.  

In Lesson 2, we’ll build upon these fundamentals to explore how epidemiology is used to inform public health practice. 


Lesson 2 - Public Health Surveillance

Lesson 2 Objectives:

Upon completion of this lesson, you should be able to:

  • State at least 5 uses of disease surveillance information.
  • Explore some public sources of disease surveillance data.
  • Compare and contrast 5 health surveys conducted in the US with regard to the target population, sampling strategy, and purpose
  • Identify advantages and disadvantages of surveys
  • Differentiate between sampling strategies in order to select an appropriate sampling scheme for a survey

2.1 - Public Health Surveillance

Surveillance: Information for Action

The Centers for Disease Control and Prevention have defined surveillance as follows:

"the ongoing systematic collection, analysis, and interpretation of data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those responsible for prevention and control."

Disease surveillance is the basic process by which epidemiologists answer questions about who, where, and when.

Who is getting the disease? Are there differences in the rates of disease by age? sex? race?

Where is the disease happening? Are there geographic areas with particularly high rates? extremely low rates?

Is the occurrence of the disease changing over time? Is the disease becoming more frequent? less frequent?

"Good surveillance does not necessarily ensure the making of right decisions, but it reduces the chance of wrong ones."

~Alexander D. Langmuir, NEJM 1963;268:182-191.

Disease surveillance information is useful for:

  • Estimating the magnitude of a problem
  • Determining the geographic distribution of illness
  • Portraying the natural history of a disease
  • Detecting epidemics or defining a problem
  • Generating hypotheses, stimulating research
  • Evaluating control measures
  • Monitoring changes in infectious agents
  • Detecting changes in health practices
  • Facilitating planning

Evaluation of Surveillance Systems

A disease surveillance system should be simple, flexible, and acceptable to the population. For example, to detect hunting-related shooting injuries, the requirements for a hunter to report an episode should not be onerous or many shooting injuries will go unrecorded. The surveillance system should also be representative of the population and provide a timely alarm. Like a smoke detector without a power source, a surveillance system that is not able to recognize a disease outbreak quickly and accurately is not very useful.


2.2 - Sources of Public Health Surveillance Data

Sources of public health surveillance data can include:

  • notifiable diseases,
  • vital records (e.g. National Infant Mortality Surveillance, birth, death records),
  • registry and survey data,
  • administrative databases (such as Medicare or a prescription database), and
  • some laboratory records.

Below are some websites to explore available sources of public health data:

Integrated Surveillance Information Systems/National Electronic Disease Surveillance System
In the U.S., this has been developed to standardize health reporting and link laboratory, hospital, and managed care data.

Pennsylvania’s Department of Health Vital Records

Enterprise Data Dissemination Informatics Exchange (EDDIE)
This is an interactive health statistics dissemination web tool where you can create customized data tables, charts and maps for various health related data.
 
CDC WONDER
This organization furthers the CDC's mission of health promotion and disease prevention by speeding and simplifying access to public health information for state and local health departments, the Public Health Service, and the academic public health community.

SEER
Surveillance, Epidemiology, and End Results Program of the National Cancer Institute

The SRTR Database
The Scientific Registry of Transplant Recipients

U.S. Fire Administration (USFA)
The USFA collects data from a variety of sources to provide information and analyses on the status and scope of the fire problem in the United States.

Health Surveys

In the US, governmental agencies conduct surveys for various purposes at regular intervals. Investigate these surveys by following the links below.

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.

Health Survey: BRFSS
Target Population: Non-institutionalized adult residents of the 50 states, the District of Columbia, and selected territories
Mode: Telephone (cell and landline) survey
Sampling Strategy: Random-digit dial with post-stratification weighting
Size: 400,000+ each year
Health Issues: Health-related risk behaviors and events, chronic health conditions, use of preventive services, emerging health issues (e.g., vaccine shortage, influenza-like illnesses)
Example of a Disease/Outcome and an Exposure: In 2012, adults age 50+ who had a blood stool test screening within the last two years for colorectal cancer: Yes 38.8% (CI 31.7-45.9, n=208); No 61.2% (CI 54.1-68.3, n=308)
http://apps.nccd.cdc.gov/brfsssmart/MMSARiskChart.asp?yr=2012&MMSA=281&cat=CC&qkey=8521&grp=0

The Youth Risk Behavior Surveillance System (YRBSS) monitors six categories of health-related behaviors that contribute to the leading causes of death and disability among youth and adults, including—

  • Behaviors that contribute to unintentional injuries and violence
  • Sexual behaviors related to unintended pregnancy and sexually transmitted diseases, including HIV infection
  • Alcohol and other drug use
  • Tobacco use
  • Unhealthy dietary behaviors
  • Inadequate physical activity

YRBSS also measures the prevalence of obesity and asthma and other health-related behaviors plus sexual identity and sex of sexual contacts.

YRBSS is a system of surveys. It includes 1) a national school-based survey conducted by CDC and 2) state, territorial, tribal, and local surveys conducted by state, territorial, and local education and health agencies and tribal governments.

Health Survey: YRBSS
Target Population: Students in public and private high schools (9th-12th grade) at the national, state, and local levels in the U.S.
Mode: Anonymous, school-based questionnaire survey, administered in odd-numbered years
Sampling Strategy: Multi-stage cluster design
Size: In 2013, 13,000 youth in 42 states, 21 large urban schools, some tribal governments
Health Issues: Behaviors that contribute to unintentional injuries and violence; sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection; alcohol and other drug use; tobacco use; unhealthy dietary behaviors; inadequate physical activity; obesity; and asthma
Example of a Disease/Outcome and an Exposure: Among U.S. high school students surveyed in 2013, 46.8% had ever had sexual intercourse; 34.0% had had sexual intercourse during the previous 3 months and, of these, 40.9% did not use a condom the last time they had sex; 15.0% had had sex with four or more people during their life
http://www.cdc.gov/healthyyouth/sexualbehaviors/index.htm

The National Health Interview Survey (NHIS) has monitored the health of the nation since 1957. NHIS data on a broad range of health topics are collected through personal household interviews. Survey results have been instrumental in providing data to track health status, health care access, and progress toward achieving national health objectives.

Health Survey: NHIS
Target Population: Noninstitutionalized civilian adult population residing in the United States
Mode: Cross-sectional household interview survey
Sampling Strategy: Stratified multistage household sample, with oversampling of elderly and minorities
Size: 30,000-40,000 households per year (75,000-100,000 persons)
Health Issues: Monitor trends in illness and disability; track progress toward achieving national health objectives; epidemiologic and policy analysis of such timely issues as characterizing those with various health problems, determining barriers to accessing and using appropriate health care, and evaluating Federal health programs
Example of a Disease/Outcome and an Exposure: In 2012, the NHIS showed that 18% of U.S. adults were current smokers and 21% were former smokers
http://www.cdc.gov/nchs/data/series/sr_10/sr10_260.pdf

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation.

The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

Findings from this survey will be used to determine the prevalence of major diseases and risk factors for diseases. Information will be used to assess nutritional status and its association with health promotion and disease prevention. NHANES findings are also the basis for national standards for such measurements as height, weight, and blood pressure. Data from this survey will be used in epidemiological studies and health sciences research, which help develop sound public health policy, direct and design health programs and services, and expand the health knowledge for the Nation.

Health Survey: NHANES
Target Population: Non-institutionalized adults and children in the U.S.
Mode: Face-to-face interviews and clinical examinations
Sampling Strategy: Multistage area probability sampling
Size: 5,000 persons each year
Health Issues: Determine the prevalence of major diseases and risk factors for diseases; assess nutritional status and its association with health promotion and disease prevention; establish the basis for national standards for such measurements as height, weight, and blood pressure
Example of a Disease/Outcome and an Exposure: During 2007-2012, 46.2% of adults aged 40-79 with lung obstruction currently smoked cigarettes. About 41% with mild and 55% with moderate or worse obstruction were current smokers.
http://www.cdc.gov/nchs/data/databriefs/db181.html

The California Health Interview Survey (CHIS) is the largest state health survey in the nation. It is a web and telephone survey that asks questions on a wide range of health topics. CHIS is conducted on a continuous basis allowing the survey to generate timely one-year estimates. CHIS provides representative data on all 58 counties in California and provides a detailed picture of the health and healthcare needs of California’s large and diverse population.

Health Survey: CHIS
Target Population: Non-institutionalized civilian population of California (adults, teenagers, and children)
Mode: Telephone (cell phones and landlines) survey
Sampling Strategy: Random-digit dial with a supplemental surname list frame for Korean and Vietnamese populations
Size: 42,000-50,000
Health Issues: From asthma, diabetes, and obesity to immigrant health and health insurance coverage, CHIS covers dozens of essential health topics. CHIS is conducted by the UCLA Center for Health Policy Research in collaboration with the California Department of Public Health and the Department of Health Care Services.
Example of a Disease/Outcome and an Exposure: Projections from the California Simulation of Insurance Markets (CalSIM) model indicate that up to half of Californians remaining uninsured will be undocumented immigrants who are not eligible under the Affordable Care Act (ACA). Most others lacking insurance will be eligible for Medi-Cal or subsidized insurance through Covered California but remain unenrolled due to difficulties with the enrollment process, inability to afford coverage, concerns about negative immigration-related consequences for themselves or their family members, or other barriers. Almost three-fourths of the remaining uninsured will be Latino, almost one-third will reside in Los Angeles County, and about 70 percent will be exempt from paying a tax penalty for lacking coverage.
http://healthpolicy.ucla.edu/chis/research/Pages/default.aspx

2.3 - Survey & Sampling Design

Even though this is not a course on survey design, a large source of public health data comes from surveys. As we saw earlier in the course, it is often not feasible to take measurements on the entire target population, so we must select a sample in which to gather data. This section introduces some advantages and disadvantages of using surveys and approaches to drawing a sample for an epidemiologic survey.

Survey Studies

An epidemiologic survey consists of a simultaneous assessment of the health outcome and exposures as well as potential confounders and effect modifiers. A survey given at a single time point can be part of a cross-sectional study. Some epidemiologists may call it a prevalence study. The survey results provide a 'snapshot' of a population. Surveys are a useful tool for gauging the health of a population or monitoring the effectiveness of a preventative intervention or provision of emergency relief.  

While a survey may provide a relatively quick and inexpensive method for assessing the health of a population, there are both pros and cons, as noted below:

Advantages

  • Inexpensive
  • Relatively quick
  • Can help establish or clarify a hypothesis

Disadvantages

  • Exposure may not have preceded disease or outcome. This limits the assessment of causality. For example, a survey may ask about the current behavior of smoking and a diagnosis of asthma. While the results may show an association between smoking and asthma, we may not be able to accurately determine which came first.
  • Disease and health outcomes with a long duration can be over-represented. Less severe outcomes may be under-represented because they may not have been diagnosed at the time of the survey.
  • Surveys are subject to information bias (e.g. from inaccurate recall or misdiagnosis) and selection bias (e.g. those without a telephone cannot be selected for a random digit dial survey)

Survey Questions and Administration

Survey questions are carefully structured in order to reduce bias. Care should be given to the wording and order of questions. Using a standard questionnaire increases the reliability and validity of the results. A reliable survey has internal consistency and produces results that are replicable. The subject would answer the question in the same way if asked again. Valid questions are those which accurately assess the specific concept that is being measured.

The process of administering a survey should be standardized to reduce the potential for bias. The respondent should be informed of the purpose of the research and freely consent to participate. A survey with a low response rate is likely to have some bias.

NOTE:
STAT 507 is a course in epidemiologic research methods so we will not delve into the strengths and weaknesses of various methods for evaluating the reliability and validity of a survey instrument as might be presented in a psychometric course. You should however recognize the need to consider this type of analysis when selecting a survey instrument.

Sampling Designs

These methods of sampling can be applied to survey studies, as well as other observational and interventional studies.

First, if the population can be enumerated (listed), a simple random sampling approach can be used to draw a representative sample of potential participants. For example, you might generate a list of all children attending a public school and then from this list, randomly select students for the survey. Procedures for simple random sampling can be done in many software packages, including Excel. The use of simple random sampling allows us to generalize the results of the survey back to the population from which the sample was drawn.
Sometimes, we want to make sure that there are an adequate number of responses from a group that is relatively small. To do that, we might use stratified random sampling, which divides the population into homogeneous groups (strata). Then we can draw simple random samples from each of the groups. Stratified sampling assures that selected subgroups of the population will be represented in the sample. If the strata are homogeneous, statistical precision from stratified sampling is greater than that achieved with simple random sampling. Stratified samples can be proportionate (or disproportionate) to the size of the stratum. If sampling is disproportionate, overall population estimates are constructed by weighting within-group estimates to account for the sampling fraction in each stratum. Cluster sampling is a specific type of stratified sampling, and often refers to sampling from geographic areas. A cluster might be a zip code area in the US or streets within a city.
Systematic sampling occurs when we select our sample in a systematic manner. For example, you might select every 10th house on a street to participate in a household survey. Systematic sampling can be easier to implement than simple random sampling and may represent the population as well as a simple random sample. However, if every rth unit corresponds to an existing sequence in the population with the result that each member of the sample was selected from the same part of the recurring pattern, the sample will be biased. For example, if an observation is made every seventh day, beginning on a Monday, the entire sample will only represent Monday experiences.
Finally, there are several sampling approaches that may be used but may produce biased population estimates. First, we may choose a convenience sample, such as asking people on a street corner or in a store to participate in a survey. A convenience sample may be useful in gathering preliminary or pilot data for a future survey that would be larger and have more rigorous sampling methods. Second, you may choose purposive sampling because you are particularly interested in the responses of a specific group.

Each of these approaches is useful, but to what population can the results be generalized? 
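To make the probability-based designs above concrete, here is a minimal Python sketch of simple random, stratified, and systematic sampling. The population of 100 student IDs, the two strata, and all function names are hypothetical and chosen only for illustration; this is one possible implementation, not a prescribed procedure.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def systematic_sample(population, step, start=0):
    """Take every step-th unit, beginning at index start."""
    return population[start::step]

# Hypothetical enumerated population: 100 student IDs.
students = list(range(1, 101))
# Hypothetical strata: a small group (IDs 1-10) and a large group (IDs 11-100).
strata = {"small_group": students[:10], "large_group": students[10:]}

srs = simple_random_sample(students, 10)
strat = stratified_sample(strata, 5)      # disproportionate: 5 from each stratum
every_tenth = systematic_sample(students, 10)
```

With a simple random sample of 10, the small group may contribute no one at all; the stratified draw guarantees 5 respondents from it, at the cost of needing weights when estimating overall population values.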


2.4 - Lesson 2 Summary

Lesson 2 Summary

Public health surveillance is important for the health of any nation. In order to decide how to allocate resources, it is vital to know who is being affected, where those people live, and the timeliness of the issue. There are many sources of public health data that can be used to achieve these goals including vital records, mandatory reporting, registries, and health surveys. Surveys are used to gather information that is not standardly or systematically collected. Since we often cannot gather data on the entire population of interest, we need to select a subgroup to sample from, and different methods for sampling were outlined in this lesson.


Lesson 3 - Measurements of Disease Occurrence and Frequency

Lesson 3 Objectives

Upon completion of this lesson, you should be able to:

  • Select and use measures of disease frequency
  • Define and calculate point prevalence, period prevalence, cumulative incidence, and incidence density rate
  • Describe a potential outbreak with regard to person, place, and time.
  • Construct and interpret an epi-curve to describe the course of an outbreak

 


3.1 - Disease Occurrence

Outcomes

Typical outcomes for an epidemiologic study (sometimes referred to as the 'D's of Epidemiology) are as follows:

Outcomes of Epidemiology:

  • Death
  • Disease/Illness - Physical signs, laboratory abnormalities
  • Discomfort - Symptoms (e.g., pain, nausea, dyspnea, itching, tinnitus)
  • Disability - Impaired ability to do usual activities
  • Dissatisfaction - Emotional reaction (e.g., sadness, anger)
  • Destitution - Poverty, unemployment

Calculations

In order to describe and compare measures of disease occurrence, these are the types of calculations most often used:

Count

Definition: the number of individuals who meet the case definition

Example:

9188 cases of invasive colorectal cancer in Pennsylvania in 2005 (PA Cancer Registry data)

Notes: Calculating the magnitude of disease occurrence with a count is simple and useful for certain purposes, such as allocating health resources. For other purposes, it is more helpful to have a denominator under the count that indicates the size of the study population. The remaining measures address this.

Proportion

Definition:  A/(A+B); a fraction in which the numerator (A) includes only individuals who meet the case definition and the denominator (A+B) totals the numbers of individuals who meet the case definition (A) plus those in the study population who do not meet the case definition and are at risk (B).

Example:

30% of persons over 50 years of age have been screened for colon cancer

Notes: A proportion is not dependent upon time. It can be expressed as a fraction or a percentage. A proportion indicates the fraction of the population that is affected by the disease or condition. It is linked to estimating risk.

Ratio

Definition: A/B; a special fraction in which the numerator includes only individuals who meet one criterion (e.g. the case definition, A) and the denominator includes only individuals in the study population who meet another criterion (e.g. do not meet the case definition but are at risk, B). A ratio is not dependent upon time. If the ratio is a ratio of the number of individuals with the outcome to those without the outcome, the ratio is the odds. A ratio as a measure of disease frequency is used infrequently, in special situations. (not to be confused with an odds-ratio or risk-ratio)

Examples:
  • 1 case of colon cancer for every 1 case of breast cancer.
  • 2 female cases of major depression to 1 male case of major depression.
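The difference between a proportion, A/(A+B), and a ratio, A/B, can be seen in a short Python sketch. The counts below (30 screened, 70 not screened) are hypothetical and the function names are ours; the sketch only restates the two definitions above.

```python
def proportion(a, b):
    """A/(A+B): the numerator is part of the denominator."""
    return a / (a + b)

def ratio(a, b):
    """A/B: the numerator and denominator are separate groups."""
    return a / b

# Hypothetical counts: 30 people who meet the criterion, 70 who do not.
screened, not_screened = 30, 70

p = proportion(screened, not_screened)   # fraction of the whole group
odds = ratio(screened, not_screened)     # those with vs. those without

print(f"Proportion: {p:.2f}")
print(f"Odds: {odds:.2f}")
```

When the ratio compares those with the outcome to those without it, as here, it is the odds, and it equals p/(1-p) for the corresponding proportion p.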

Rate

Definition: a fraction in which the numerator includes only individuals who meet the case definition and the denominator includes individuals in the study population who do or do not meet the case definition but could meet the case definition (at-risk) and the total time at risk they contribute (person-time).  Person-time is defined as the sum of time that each at-risk individual contributes to the study.  If the study period is 2 years, person-time is as follows for certain groups:

  1. For participants who develop the disease
    1. time they spend on study before they developed the disease (< 2 years)
    2. These participants count in the numerator, and denominator
  2. For participants who drop out before the 2-year period is over
    1. time they spend on study before they dropped out (< 2 years)
    2. These participants count only in the denominator
  3. For participants who do not develop the disease (in the 2 year window)
    1. 2 years
    2. These participants count only in the denominator

The sum of all these times would be the denominator.

Example:

0.1 cases per person-year indicates that, on average, for every 10 person-years contributed (e.g., 10 people each followed for 1 year, or 2 people each followed for 5 years), 1 new case of the health outcome will develop

Notes: Rates differ from proportions in that there is always a time component. It is important to be intentional about the terminology that we use and to correctly differentiate proportions from rates.
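As a minimal sketch of the person-time calculation described above, the following Python snippet sums the time at risk for a hypothetical 2-year study with six participants. The follow-up records are invented for illustration only.

```python
# Each record: (years_followed, developed_disease). Hypothetical 2-year study.
follow_up = [
    (0.5, True),   # developed the disease after 0.5 years
    (1.5, True),   # developed the disease after 1.5 years
    (1.0, False),  # dropped out after 1 year, disease-free
    (2.0, False),  # completed the study disease-free
    (2.0, False),
    (2.0, False),
]

# Numerator: only participants who developed the disease.
new_cases = sum(1 for _, diseased in follow_up if diseased)
# Denominator: time at risk contributed by every participant.
person_years = sum(years for years, _ in follow_up)

incidence_rate = new_cases / person_years
print(f"{new_cases} cases / {person_years} person-years "
      f"= {incidence_rate:.3f} cases per person-year")
```

Here 2 cases arise over 9 person-years, a rate of about 0.222 cases per person-year. Note that the participant who dropped out still contributes 1 year to the denominator.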

Risk

Definition: A measure of the probability of an unaffected individual developing a specified health outcome over a given period of time. Risk is calculated by dividing the number of new cases by the total number of individuals at risk during the specified time period. 

Example:

A 5-year risk of 0.10 indicates that an individual at risk has a 10% chance of developing the given health outcome in a 5-year period

Notes: Risk is typically derived from a cohort study in which each at-risk person is followed over time until he/she is no longer at-risk

3.2 - Disease Frequency: Incidence vs. Prevalence

The two main ways by which the frequency of disease is measured are incidence and prevalence. These can be distinguished by differences in the time of disease onset.

Incidence
counts new cases of the disease (or outcome)
Prevalence
counts new and existing cases of the disease (or outcome)

Incidence

Incidence quantifies the development of disease. Incidence can be estimated using disease registry data or data from a cohort study. There is an implicit assumption of a period of time, such as new cases within a month (or a year).

A summary incidence rate can estimate the risk (e.g., probability of disease in an individual) if the risk is constant across the summarized groups.

As defined, incidence is a count of new cases. However, it is often expressed as a proportion of those at risk. The denominator includes all persons at risk for the disease or condition, i.e. disease-free or condition-free individuals in the population at the start of the time period. Persons in the denominator, those at-risk, should be able to appear in the numerator. Obviously, the denominator would not include persons who already have the disease or condition. Incidence can also be expressed in terms of person-time at risk.

Rates are usually expressed per 100, 1,000, or 100,000 persons. In a strict application, "rate" should only be used when the denominator is an estimate of the total person-time at risk. (You will find the term "rate" used inconsistently in epidemiologic reports. It is better to seek the source of the numbers than to rely on the nomenclature.)

Two Common Measures of Incidence

Cumulative Incidence
The cumulative incidence consists of the number of persons who newly experience the disease or studied outcome during a specified period of time divided by the total population at risk. This calculation assumes all persons in the denominator contribute an equal amount of time to the measure.
Incidence Density Rate
Incidence density rate (also known as incidence rate; person-time rate) is the number of persons who newly experience the outcome during a specified period of time divided by the sum of the time that each member of the population is at-risk.
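
As a hedged restatement of the two definitions above, they can be written as formulas. The symbol \(t_i\), denoting the time at risk contributed by person \(i\), is an assumption introduced here for illustration.

\(\text{Cumulative incidence}=\dfrac{\text{Number of new cases during the period}}{\text{Number of persons at risk at the start of the period}}\)

\(\text{Incidence density rate}=\dfrac{\text{Number of new cases during the period}}{\sum_{i} t_{i}}\)

For a hypothetical cohort of 6 at-risk persons with 2 new cases and a total of 9 person-years of follow-up, the cumulative incidence is 2/6 ≈ 0.33, while the incidence density rate is 2/9 ≈ 0.22 cases per person-year.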

Prevalence

Since prevalence counts both new and existing cases, the duration of the disease affects the prevalence. Diseases with a long duration will be more prevalent than those with a shorter duration. Chronic, non-fatal conditions are more prevalent than conditions with high mortality. The prevalence of disease is directly related to the duration of the disease. Prevalence is not an apt descriptor of an acute condition.

Similar to incidence, persons included in the denominator must have the potential for being in the numerator, i.e. at-risk for the disease or condition. Prevalence is often expressed after multiplication by 100 (%), 1000, or 100,000.

The prevalence pool is the subset of the population with the condition of interest. The prevalence pool is not generally useful for hypothesis-driven epidemiologic research because these are not new cases, but can be useful in tracking the natural history of the disease, evaluating effects of treatments, or disease burden.

For most etiologic research, incidence is the more appropriate measure. Studying the incidence of a rare condition, however, poses a challenge. Given a small number of new cases, it can be preferable to estimate prevalence instead of incidence in these situations. For example, a birth defect rate reported as the number of cases per live birth is a prevalence measure. Similarly, an autopsy rate is a prevalence measure.

Two common measures of prevalence

The difference is whether the estimate is made over a period of time or at one specific time as illustrated below:

Point prevalence
Prevalence of condition of interest at a specific time.
Number of existing cases on a specific date/ Number in the defined population on this date
Point prevalence ranges from 0 to 100%.
Point prevalence can be estimated from a cross-sectional survey or disease registry data by calculating the percentage with a particular disease or condition on a particular date.
E.g. what percentage had a particular type of flu on 1/17/2009?
Period prevalence
Prevalence of outcome of interest during a specified period of time.
Less frequently used.
Number of cases that occurred in a specified period of time/ Number in the defined population during this period
Period prevalence generally ranges from 0 to 100 %. (Theoretically, period prevalence can exceed 100% if you allow individuals who had the disease more than once to be counted for each case of the disease within the reporting period.)
E.g. What percentage of the population had an episode of flu between October and May within the most recent flu season?
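
The contrast between point and period prevalence can be sketched in Python as follows. The four illness episodes, the population size of 20, and the function names are hypothetical; the sketch counts each person at most once, so it cannot exceed 100%.

```python
from datetime import date

# Hypothetical illness episodes: (person_id, onset, recovery).
episodes = [
    (1, date(2009, 1, 10), date(2009, 1, 20)),
    (2, date(2009, 1, 15), date(2009, 1, 25)),
    (3, date(2008, 11, 1), date(2008, 11, 12)),
    (4, date(2009, 3, 5), date(2009, 3, 14)),
]
population_size = 20  # hypothetical defined population

def point_prevalence(episodes, on_date, population_size):
    """Share of the population that is an existing case on one date."""
    ill = {pid for pid, onset, recovery in episodes if onset <= on_date <= recovery}
    return len(ill) / population_size

def period_prevalence(episodes, start, end, population_size):
    """Share of the population that was a case at any time in [start, end]."""
    ill = {pid for pid, onset, recovery in episodes if onset <= end and recovery >= start}
    return len(ill) / population_size

point = point_prevalence(episodes, date(2009, 1, 17), population_size)
period = period_prevalence(episodes, date(2008, 10, 1), date(2009, 5, 31), population_size)
print(f"Point prevalence on 1/17/2009: {point:.0%}")
print(f"Period prevalence, October to May: {period:.0%}")
```

In this invented data set, 2 of 20 people are ill on 1/17/2009 (10%), while 4 of 20 had an episode at some point between October and May (20%).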

3.3 - Outbreak Investigation

Investigating a Potential Outbreak

In this course, we have often assumed that investigators have knowledge of a potentially harmful exposure coincidentally with or prior to observing the disease or illness. In other situations, the first indication of harmful exposure is a report of a potential outbreak of disease or illness. Increased numbers of cases of disease or illness may necessitate an outbreak investigation. Questions to be answered in an outbreak investigation include the following:

Are there an unusual number of adverse health outcomes in this community?

If so, how many? Is the number increasing, decreasing, or stable?

What type of exposure may have caused the increase?

What is the anticipated future course and spread of this outbreak?

When an increase in the number of cases of a disease is reported, a speedy response is critical. At the same time, it is also of utmost importance to end up with an answer that will appropriately protect public health and safety. A systematic approach to outbreak investigation helps assure timely and accurate answers:

  • Prepare for fieldwork
  • Establish the existence of an outbreak
  • Verify the diagnosis
  • Define and identify cases
  • Measure the frequency of adverse outcomes and describe the data in terms of time, place, and person
  • Develop hypotheses
  • Evaluate hypotheses
  • Refine hypotheses and carry out additional studies
  • Implement control and prevention measures
  • Communicate findings

 

Orient in Terms of Time, Place, and Person

Characterizing by time: Constructing an Epi-Curve

An epidemic curve, frequently referred to as an 'epi-curve', is used to examine and characterize the occurrence of a possible outbreak. By constructing and examining an accurate epi-curve, an investigator can consider questions such as:

Is there an outbreak? If so, when did the outbreak begin?

Has the outbreak peaked? If so, when was the peak?

What might be the source of the exposure? Is there one source or multiple sources for exposure of cases? Is person-to-person transmission occurring?

Have the attempts to control the outbreak coincided with a decrease in the occurrence of the disease?

An epi-curve is a histogram with the number of cases of the adverse health outcome on the y-axis (ordinate) and dates of onset of the outcome on the x-axis (abscissa). Dates of onset may be grouped by days, weeks, or months, depending on the nature of the potential outbreak. A typical time period used is 1/4 to 1/3 the incubation period for the disease. If the incubation or lag time from exposure to outcome is unknown, it is valuable to experiment with different lengths of time.

A typical epi-curve is a simple chart with one series of data, the onset of cases. In other situations, several layers of data are displayed on the curve. For example, the investigator may want to examine the date of onset in more than one location (e.g. 2 or more cities, states or countries) or in different groups of people (e.g. stratified by age or race).

Another variation of the epi-curve is stacking the bars in order to show different characteristics of the cases. For example, you may decide to separate confirmed cases from suspect cases, using stacked bars to assess whether an outbreak is truly occurring.
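
A minimal sketch of how the counts behind an epi-curve might be tabulated in Python is shown below. The onset dates are hypothetical, and the `bin_days` parameter is our own name for the grouping interval discussed above; a real investigation would typically plot these counts as a histogram rather than print them.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical onset dates for a suspected outbreak.
onsets = [date(2024, 3, d) for d in (1, 3, 3, 4, 4, 4, 5, 5, 7)]

def epi_curve(onset_dates, bin_days=1):
    """Count cases per interval of bin_days, starting at the earliest onset."""
    first = min(onset_dates)
    counts = Counter((d - first).days // bin_days for d in onset_dates)
    n_bins = max(counts) + 1
    # Include empty intervals so gaps in the curve stay visible.
    return [(first + timedelta(days=i * bin_days), counts.get(i, 0))
            for i in range(n_bins)]

curve = epi_curve(onsets)
# Crude text histogram: one '#' per case, one row per interval.
for start, n in curve:
    print(start.isoformat(), "#" * n)
```

Calling `epi_curve(onsets, bin_days=2)` regroups the same cases into 2-day intervals, which is one way to experiment with different interval lengths when the incubation period is unknown.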

Interpreting an Epi-Curve

The following shows the outbreak of COVID-19 cases in Pennsylvania:

 

The first consideration is the overall shape of the curve which is determined by the pattern of the outbreak (common source or person-to-person transmission). The shape also indicates the period of time over which susceptible people are exposed and the minimum, average and maximum incubation periods for the disease.
If the duration of exposure is prolonged, the epidemic is called a "continuous common source epidemic," and the epidemic curve will have a plateau instead of a peak. Person-to-person spread (a "propagated" epidemic) should have a series of progressively taller peaks one incubation period apart.

Cases that stand apart ("outliers") provide valuable information. An early case can represent a background (unrelated) case, a source of the epidemic, or a person who was exposed earlier than others. Similarly, late cases may be unrelated to the outbreak, may have especially long incubation periods, may indicate exposure later than most of the people affected, or may be secondary cases (a person who becomes ill after being exposed to someone who was part of the initial outbreak). Examine any outliers that are part of the outbreak carefully because they may point directly to the source. For example, a prep chef could be the first case of strep in an epidemic among party-goers eating food prepared by this person.
In a point-source epidemic of a disease with a known incubation period, the epidemic curve can also identify the likely period of exposure.

Characterizing by place

A simple technique for looking at geographic patterns is to plot on a 'spot map' the locations where the affected people live, work, or may have been exposed. A map of cases in a community may show clusters or patterns that reflect water supplies, wind currents, or proximity to a restaurant or grocery store. A classic example is John Snow's detection of the Broad St. water pump as the source of a cholera epidemic. On a spot map of a hospital, nursing home, or another residential facility, clustering may indicate either a primary source or person-to-person spread. The scattering of cases throughout a facility is more consistent with a common source such as a dining hall.

If the size of the overall population varies among the areas being compared, the spot map with the number of cases can be misleading. Indicating the proportion affected or the attack rate for each area would be a better approach.
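
To illustrate why raw counts on a spot map can mislead, the following Python sketch compares case counts with attack rates for three areas. The area names, case counts, and population sizes are all hypothetical.

```python
# Hypothetical case counts and populations for three areas.
areas = {
    "North": {"cases": 30, "population": 10000},
    "South": {"cases": 12, "population": 1500},
    "East": {"cases": 20, "population": 8000},
}

# Attack rate: cases divided by the population of the same area.
attack_rates = {name: a["cases"] / a["population"] for name, a in areas.items()}

most_cases = max(areas, key=lambda name: areas[name]["cases"])
highest_rate = max(attack_rates, key=attack_rates.get)

for name, rate in attack_rates.items():
    print(f"{name}: {areas[name]['cases']} cases, {rate * 1000:.1f} per 1000")
```

In this invented example the North has the most cases, but the South has the highest attack rate, so a map of counts alone would point to the wrong area.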

Characterizing by person

Define the populations at risk for the disease by characterizing an outbreak by personal characteristics such as age, race, sex, medical status, etc, and/or by exposures (e.g., occupation, leisure activities, use of medications, tobacco, drugs, etc.). Age and sex are characteristics often strongly related to exposure and risk; thus these factors are often assessed first. Other factors to be assessed are those possibly related to susceptibility to the disease and to opportunities for exposure to the disease being investigated and in the setting of the outbreak.


3.4 - Lesson 3 Summary

Lesson 3 Summary

Lesson 3 was a big lesson! It introduced the main calculations for disease occurrence and frequency including counts, proportions, ratios, rates, and risks. An important component of epidemiologic measures is the concept of time, which is incorporated into rates and risks. The two main measures of disease frequency are incidence (new cases) and prevalence (new + old cases), both of which are important for understanding the landscape of public health issues. Finally, we learned about outbreak investigations, which are needed when a new public health concern arises, again focusing on the who, when, and where to best understand the issue.


Lesson 4 - Comparing Groups In Terms of Disease Occurrence and Frequency

Lesson 4 Objectives

Upon completion of this lesson, you should be able to:

  • Organize disease frequency data into a 2 x 2 epidemiological table.
  • Calculate and describe Risk Ratios and Odds Ratios to compare groups.
  • Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
  • Recognize situations in which direct or indirect standardization should be considered.
  • Given the required data, standardize a rate with a direct and indirect method.

4.1 - Example Research Hypotheses & Measurement Calculations

Research Hypotheses

Suppose our goal is to compare two populations with regard to disease or exposure-disease frequency. We wish to use precise valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, incidence density, etc. The two populations must be distinct in location or time or exposure status. We'd like to apply statistical tests to these measures to see if any difference is likely to have occurred by chance.

There are many examples of hypotheses that are comparative in nature, such as:

  • High salt intake increases the incidence of heart disease.

    →Compare the incidence of heart disease among persons with high salt intake with those who have a low salt intake.

  • The prevalence of screening mammograms in managed care organization A and managed care organization B are not equal.

    →Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.

  • The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.

    →Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive less than quarterly foot examinations.

  • A high intake of Vitamin C reduces the prevalence of colds.

    →Compare the prevalence of colds for persons with a high intake of Vitamin C compared to those with a low intake.

  • Dizziness is associated more frequently with therapeutic agent A than with agent B.

    →Compare the incidence of dizziness for patients receiving A with those receiving B.

  • The administration of pre-surgical antibiotics decreases the rate of wound infections.

    →Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.

Data Organization: 2x2 table

Consider this cohort study, An Association between Air Pollution and Mortality in Six U.S. cities, which investigated the relationship between air pollution and mortality in six US cities.  The researchers were interested in the exposure of air pollution and the outcome of mortality.  

Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with or causing a certain outcome.  Often in epidemiological studies, the outcome of interest is a certain disease. Those who develop the disease are often referred to as cases, while those that do not are referred to as non-cases.  

Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:

2 × 2 Table for an Epidemiologic Study
Category Cases (Number) Non-Cases (Number) Total Exposure (Number)
Exposed A B TotalExposed
Not Exposed C D TotalNotExposed
Total TotalCases TotalNon-Cases Total

For the air pollution cohort study, the following tables can be constructed.

Six cities cumulative incidence of mortality data
Category Dead Alive* Total
High Pollution (Ohio) 291 1060 1351
Low Pollution (Wisconsin) 232 1399 1631
Total 523 2459 2982

*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See...

 

Six cities incidence rate of mortality data
Category Dead Alive* Person-Years
High Pollution (Ohio) 291   17914
Low Pollution (Wisconsin) 232   21618
Total 523   39532

*For the incidence rate, the number of non-diseased (i.e. alive) participants is not necessary. Instead, we need the person-years contributed by all participants at risk, whether or not they experienced the outcome.

Measures of Disease Frequency

Disease Prevalence [by Exposure Status]

  • For Exposed: A/(A+B)
    • In our pollution example, this would be 291/1351= 0.215.  Thus, 21.5% of the participants from the high pollution city (ie the exposed group) died.
  • For Not Exposed: C/(C+D)
    • In our pollution example, this would be 232/1631=0.142.  Thus, 14.2% of the participants from the low pollution city (ie the non-exposed group) died.

Exposure Prevalence [by Disease Status]

  • For Cases: A/(A+C)
    • In our pollution example, this would be 291/523= 0.556.  Thus, 55.6% of the participants who died were from the high pollution city.
  • For Non-cases: B/(B+D)
    • In our pollution example, this would be 1060/2459= 0.431.  Thus, 43.1% of the participants who did not die were from the high pollution city.

Odds of Disease [by Exposure Status]

  • For Exposed: A/B
    • In our pollution example, this would be 291/1060.  Thus, the odds of dying in the high pollution city are 291:1060 - which can be simplified to 1:3.64.  (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)
  • For Non-Exposed: C/D
    • In our pollution example, this would be 232/1399.  Thus, the odds of dying in the low pollution city are 232:1399 - which can be simplified to 1:6.03.  

Odds of Exposure [by Disease Status]

  • For Cases: A/C
    • In our pollution example, this would be 291/232= 1.25:1.  Thus, the odds of being from the high pollution city are 1.25:1 for those who died.
  • For Non-cases: B/D
    • In our pollution example, this would be 1060/1399= 0.76:1.  Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
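
The measures listed above can be reproduced with a short Python sketch using the cell counts from the six cities cumulative incidence table (A = 291, B = 1060, C = 232, D = 1399). The variable names are ours and are only one way to organize the calculation.

```python
# Cell counts from the six cities table: exposed = high pollution city.
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

# Disease frequency by exposure status.
disease_exposed = A / (A + B)
disease_unexposed = C / (C + D)

# Exposure frequency by disease status.
exposure_cases = A / (A + C)
exposure_noncases = B / (B + D)

# Odds of disease by exposure status, and odds of exposure by disease status.
odds_disease_exposed = A / B
odds_disease_unexposed = C / D
odds_exposure_cases = A / C
odds_exposure_noncases = B / D

print(f"Died, high pollution city: {disease_exposed:.3f}")
print(f"Died, low pollution city:  {disease_unexposed:.3f}")
print(f"Odds of dying, high pollution city: 1:{1 / odds_disease_exposed:.2f}")
print(f"Odds of dying, low pollution city:  1:{1 / odds_disease_unexposed:.2f}")
```

Writing the odds as "1:x" simply inverts A/B, which is why 291/1060 is reported as 1:3.64.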

There are two ways to compare measures between groups: ratios and differences. The next few sections will outline both methods and show examples using the air pollution study. Also, note that for these examples, estimates for the cumulative incidences and incidence rates are similar, but that is not always the case. In this study, person-years were similar across groups, resulting in similar estimates.


4.2 - Using Ratios to Compare Two Populations

A ratio may be used to convey the strength of an effect or association between two population groups or the relative 'risk' of the study (e.g. exposed) group compared to a comparison group (e.g. unexposed.). A ratio is not dependent on the prevalence of exposure among the study population.

Ratio
\(\dfrac{\text { Disease Frequency }(\text { Population } A)}{\text { Disease Frequency }(\text { Population } B)}\)

A ratio can be reported with upper and lower bounds (a confidence interval). We will learn some formulas for these calculations in a later lesson. A ratio of 1 indicates no difference between the groups; when there is no statistically significant difference, the confidence interval for the ratio will include 1.

Ratio Calculations

Risk Ratios

Cumulative incidence ratio

  • More generally can be thought of as:
    • Ratio of Disease Incidence= [A/(A+B)] / [C/(C+D)]
  • In our pollution example, this would be [291/1351] / [232/1631] = 1.51.  Thus, participants from the high-pollution city are 1.51 times as likely to die as those from the low-pollution city.  This makes sense since we saw that the cumulative incidence of death was about 21.5% in the high-pollution city and 14.2% in the low-pollution city.

Incidence rate ratio

  • Follows the same general formula, but instead of comparing incidences, we are comparing incidence rates.  So first, we need to calculate the incidence rate in each city.  
    • High pollution city incidence rate of death = 291 deaths / 17914 person-years, which simplifies to 16.24 deaths per 1000 person-years
    • Low pollution city incidence rate of death = 232 deaths / 21618 person-years, which simplifies to 10.73 deaths per 1000 person-years
  • In our pollution example this would be (16.24/1000 person-years) / (10.73/1000 person-years) = 1.51

Odds Ratio

  • Exposure Odds Ratio [A/C] / [B/D] = [A*D] / [B*C]
  • Disease Odds Ratio [A/B] / [C/D] = [A*D] / [B*C]
  • Both simplify to the same OR
    • In our pollution example, this would be (291*1399) / (1060*232) = 1.66.  Thus, participants from the high-pollution city have 1.66 times the odds of dying compared with those from the low-pollution city.
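
The three ratio measures can be checked with a brief Python sketch. It uses the cell counts from the cumulative incidence table and the person-years from the incidence rate table (17914 and 21618); the variable names are ours.

```python
# Cell counts and person-years from the six cities tables.
A, B, C, D = 291, 1060, 232, 1399
py_exposed, py_unexposed = 17914, 21618

# Cumulative incidence ratio (risk ratio).
risk_ratio = (A / (A + B)) / (C / (C + D))
# Incidence rate ratio: deaths per person-year in each city.
rate_ratio = (A / py_exposed) / (C / py_unexposed)
# Odds ratio: the cross-product of the 2x2 table.
odds_ratio = (A * D) / (B * C)

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Rate ratio: {rate_ratio:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

The odds ratio (about 1.66) is further from 1 than the risk ratio (about 1.51) because death is not a rare outcome in this cohort.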

4.3 - Using Differences to Compare Two Populations

Alternatively, differences can be calculated between the estimates for the two groups. The difference can be reported with a confidence interval that includes upper and lower bounds.  If the confidence interval includes 0, this indicates that there is no significant difference between the groups. If the interval does not include 0, there is an increased risk for one population compared to the other (or conversely, a decreased risk). The difference can convey an excess or decreased risk among the exposed group due to exposure, possibly an excess or decreased risk that would be removed if the exposure ends, a potential reduction in risk for exposed individuals, or the absolute risk of the exposure.

Differences: Disease Frequency(Population A) - Disease Frequency(Population B)

Difference Calculations

Risk Difference

Cumulative incidence difference

  • More generally can be thought of as:
    • Difference of Disease Incidence= [A/(A+B)]  -  [C/(C+D)]
  • In our pollution example, this would be [291/1351]  -  [232/1631] = 21.5% - 14.2% = 7.3%.  Thus, the risk of death for participants from the high-pollution city is 7.3 percentage points higher than for participants from the low-pollution city.

Incidence rate difference

  • In our pollution example, this would be (16.24/1000 person-years) - (10.73/1000 person-years) = 5.51/1000 person-years. Thus, there are 5.51 excess deaths per 1000 person-years among those in the high pollution city.  Alternatively, the number of deaths could be reduced by 5.51 per 1000 person-years, if the pollution level in the high-pollution city was reduced to that of the low-pollution city.
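
The two difference measures can be verified with the following Python sketch, again using the six cities counts and person-years (17914 and 21618). The variable names are ours.

```python
# Cell counts and person-years from the six cities tables.
A, B, C, D = 291, 1060, 232, 1399
py_exposed, py_unexposed = 17914, 21618

# Cumulative incidence difference (risk difference).
risk_difference = A / (A + B) - C / (C + D)
# Incidence rate difference, expressed per 1000 person-years.
rate_difference_per_1000 = (A / py_exposed - C / py_unexposed) * 1000

print(f"Risk difference: {risk_difference * 100:.1f} percentage points")
print(f"Rate difference: {rate_difference_per_1000:.2f} per 1000 person-years")
```

Unlike a ratio, a difference keeps the units of the underlying measure, so the rate difference is still expressed per 1000 person-years.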

 

Attributable Proportion among the Total Population (APt)

(Also known as population attributable risk (PAR))

The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.

General Formula

  • \(\mathrm{AP}_{\mathrm{t}}=\dfrac{\text { Risk(study population)-Risk(unexposed group) }}{\text { Risk(study population) }}\)

Incidence rate APt

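  • First, we need to calculate the incidence rate in the entire study population.  This would be the sum of all the deaths (1430) divided by the sum of all the person-years (111076) = 12.87 deaths per 1000 person-years.  (These totals are larger than those in the two-city tables above; they appear to cover all of the cities in the study rather than only the two tabulated.)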
  • In our example, this would be [(12.87/1000 person-years - 10.73/1000 person-years)] / (12.87/1000 person-years) = 0.166.  Thus 16.6% of the deaths in the population are attributable to the high pollution levels, and thus would be eliminated if the pollution levels were reduced.
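
A minimal Python sketch of the general formula is given below, using the figures quoted above (1430 deaths over 111076 person-years overall, and 232 deaths over 21618 person-years in the low pollution city). The function name is hypothetical.

```python
def attributable_proportion_total(rate_total, rate_unexposed):
    """(rate in whole study population - rate in unexposed) / rate in whole population."""
    return (rate_total - rate_unexposed) / rate_total

# Rates per 1000 person-years, from the figures quoted in the text.
rate_total = 1430 / 111076 * 1000      # whole study population
rate_unexposed = 232 / 21618 * 1000    # low pollution city

ap_t = attributable_proportion_total(rate_total, rate_unexposed)
print(f"Overall rate: {rate_total:.2f} per 1000 person-years")
print(f"APt: {ap_t:.3f}")
```

Because the overall rate depends on how many people are exposed, the same exposure effect would yield a smaller APt in a population where high pollution is less common.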

4.4 - Standardization

When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates, in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.

Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at-risk), which you will also recognize as the cumulative incidence. If the distribution of a modifier of disease frequency (such as age) is different between two populations, however, a comparison of the crude rates can be misleading, because differences in that distribution can mask or exaggerate the true difference in disease frequency.

  • A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables. (e.g., x cases / y population at-risk, adjusted to remove the effect of potential confounder [e.g., age]).
  • Age-specific rates (i.e., x cases in a specific age group / population at-risk in the same age group) are also useful in summarizing the health status of a population.

Types of Standardization

There are two different approaches to standardizing a rate:

Direct standardization

Direct standardization, more commonly used, creates a summary disease rate for a population that would be expected if the study population had a population distribution identical to that of an arbitrarily chosen standard population. A reference population is used as the standard population. The standardized rate is the sum of weighted group-specific rates, with weights derived from the standard population. The weights sum to 1.0. A standardized rate is essentially a weighted average of age-specific rates.

\(I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}\)
where \(I_{i}\) is a group-specific rate and \(\sum W_{i}=1\)

The data needed for direct standardization are the group-specific disease rates for the study population and the population distribution of the standard population.
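
As a minimal sketch of the weighted average described above, the following Python computes a directly standardized rate. The function name and all of the numbers are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from any real population.

```python
def direct_standardized_rate(group_rates, standard_counts):
    """Weighted average of group-specific rates.

    group_rates     : group-specific rates in the study population (I_i)
    standard_counts : number of people in each group of the standard population
    """
    if len(group_rates) != len(standard_counts):
        raise ValueError("need one standard-population count per group rate")
    total = sum(standard_counts)
    # Convert the standard-population counts to weights W_i that sum to 1.
    weights = [n / total for n in standard_counts]
    return sum(w * rate for w, rate in zip(weights, group_rates))


# Hypothetical age-specific rates per 1000 (young, middle-aged, old).
study_rates = [1.0, 5.0, 20.0]
# Hypothetical standard population counts for the same three age groups.
standard_counts = [50_000, 30_000, 20_000]

adjusted = direct_standardized_rate(study_rates, standard_counts)
print(f"Directly standardized rate: {adjusted:.1f} per 1000")
```

Changing `standard_counts` while holding `study_rates` fixed changes the result, which is why directly standardized rates are comparable only when they share a standard population.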

  Stop and Think!

Review the SEER Stat Tutorials: Calculating Age-adjusted Rates and Pennsylvania Dept of Health’s tutorial on Age-Adjusted Rates (pa.gov) for producing an adjusted rate by the direct method.

Do you understand how direct adjustment is a weighted average of age-specific rates?

Indirect standardization

Indirect standardization also produces a weighted average, through the production of a summary disease rate for the study population that would be expected if the disease experience of the study population were identical to that of a standard population. The standard population is arbitrarily chosen, but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available. If these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate. Observed and expected rates are typically compared as a standardized ratio. Indirect adjustment is often used in occupational health to calculate standardized mortality ratios, which are obtained by dividing the observed death rate by the expected death rate.

The data required for indirect standardization are the crude rate for the study population, the population distribution for the study population, and the group-specific rates for the standard population.

Example

Consider two study groups for which the researcher could not obtain the gender-specific rates.
From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.
If we also know that:

  • Group 1 male 60%, female 40%
  • Group 2 male 80%, female 20%

What is the expected crude rate for group 1?
 It is (2.2 × 0.6) + (0.9 × 0.4) = 1.68 / 1000. The observed crude rate is 1.68.

  Stop and Think!

Come up with an answer to this question by yourself before reading the solution.

What is the expected crude rate for group 2?

Answer: the expected crude rate is (2.2 × 0.8) + (0.9 × 0.2) = 1.94 / 1000.
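
The expected crude rates for both groups can be reproduced with a short Python sketch. The function name `expected_crude_rate` is hypothetical; the rates and gender distributions are the ones given in the example.

```python
# Gender-specific rates in the standard population, per 1000.
standard_rates = {"male": 2.2, "female": 0.9}

# Gender distribution of each study group (proportions sum to 1).
groups = {
    "Group 1": {"male": 0.6, "female": 0.4},
    "Group 2": {"male": 0.8, "female": 0.2},
}


def expected_crude_rate(rates, distribution):
    """Apply standard-population rates to the study group's own distribution."""
    return sum(rates[category] * share for category, share in distribution.items())


expected = {
    name: expected_crude_rate(standard_rates, distribution)
    for name, distribution in groups.items()
}

for name, rate in expected.items():
    print(f"{name}: expected crude rate = {rate:.2f} per 1000")
```

Dividing a group's observed crude rate by its expected rate would then give the standardized ratio described above.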

Try it! Comparison of Direct and Indirect Standardization Methods

Try the following true/false questions to test your knowledge.

  1. Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.

    False

  2. Age-adjusted rates can be compared regardless of the standard population used.

    False
    Direct standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates.

  3. Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.

    True

  4. Indirect adjustment is preferred if there are only a few cases across all age groups.

    True

  5. If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.

    True

  6. To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.

    True

  7. To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.

    False
    The appropriate statistics are age-adjusted rates with their confidence intervals, or a relative risk (rate ratio) adjusted for age.


4.5 - Lesson 4 Summary

Lesson 4 Summary

Describing the state of public health, which was presented in Lesson 3, is an important component of epidemiology. Lesson 4 introduced the next step: comparing outcomes between groups. We saw some examples of situations in which the goal is to compare groups, and learned the two main ways to do so: absolute (differences) and relative (ratios). An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.

