4 Comparing Groups In Terms of Disease Occurrence and Frequency

Objectives

Upon completion of this lesson, you should be able to:

Organize disease frequency data into a 2 x 2 epidemiological table.
Calculate and describe Risk Ratios and Odds Ratios to compare groups.
Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
Recognize situations in which direct or indirect standardization should be considered.
Given the required data, standardize a rate with a direct and indirect method.

4.1 Example Research Hypotheses & Measurement Calculations

Research Hypotheses

Suppose our goal is to compare two populations with regard to disease or exposure-disease frequency. We wish to use precise, valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, incidence density, etc. The two populations must be distinct in location, time, or exposure status. We’d like to apply statistical tests to these measures to see whether any observed difference is likely to have occurred by chance.

There are many examples of hypotheses that are comparative in nature, such as:

High salt intake increases the incidence of heart disease.
- Compare the incidence of heart disease among persons with a high salt intake with the incidence among those with a low salt intake.

The prevalence of screening mammograms differs between managed care organization A and managed care organization B.
- Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.

The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.
- Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive less than quarterly foot examinations.

A high intake of Vitamin C reduces the prevalence of colds.
- Compare the prevalence of colds for persons with a high intake of Vitamin C to the prevalence for those with a low intake.

Dizziness is associated more frequently with therapeutic agent A than with agent B.
- Compare the incidence of dizziness for patients receiving A with those receiving B.

The administration of pre-surgical antibiotics decreases the rate of wound infections.
- Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.

Data Organization: 2 x 2 table

Consider the cohort study An Association between Air Pollution and Mortality in Six U.S. Cities, which investigated the relationship between air pollution and mortality. The exposure of interest was air pollution, and the outcome was mortality.

Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with or causing a certain outcome. Often in epidemiological studies, the outcome of interest is a certain disease. Those who develop the disease are often referred to as cases, while those who do not are referred to as non-cases.

Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:

2 × 2 Table for an Epidemiologic Study
Category	Cases (Number)	Non-Cases (Number)	Total (Number)
Exposed	A	B	A + B
Not Exposed	C	D	C + D
Total	A + C	B + D	A + B + C + D

For the air pollution cohort study, the following tables can be constructed.

Six Cities' Cumulative Incidence of Mortality Data
Category	Dead	Alive*	Total
High Pollution (Ohio)	291	1060	1351
Low Pollution (Wisconsin)	232	1399	1631
Total	523	2459	2982

*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See…
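As a minimal sketch, the cumulative incidence table above can be rebuilt from the deaths and participant counts reported for the two cities. Python is our choice for illustration, not something the lesson prescribes, and the variable names are our own.

```python
# Counts for Steubenville (high pollution) and Portage (low pollution).
deaths = {"exposed": 291, "unexposed": 232}
participants = {"exposed": 1351, "unexposed": 1631}

# Cell labels follow the generic 2 x 2 layout:
# A, B = exposed cases / non-cases; C, D = unexposed cases / non-cases.
A = deaths["exposed"]
B = participants["exposed"] - A      # exposed non-cases (alive)
C = deaths["unexposed"]
D = participants["unexposed"] - C    # unexposed non-cases (alive)

table = {
    "Exposed": (A, B, A + B),
    "Not Exposed": (C, D, C + D),
    "Total": (A + C, B + D, A + B + C + D),
}

for row, (cases, non_cases, total) in table.items():
    print(f"{row:<12} {cases:>5} {non_cases:>5} {total:>5}")
```

The non-case cells are derived by subtraction, so only the deaths and the group sizes need to be entered by hand.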

Characteristic	Portage, Wisc.	Topeka, Kans.	Watertown, Mass.	Harriman, Tenn.	St. Louis	Steubenville, Ohio
No. of participants	1,631	1,239	1,336	1,258	1,296	1,351
Person-years of follow-up	21,618	16,111	19,882	17,836	17,715	17,914
No. of deaths	232	156	248	222	281	291
Deaths/ 1000 person-years	10.73	9.68	12.47	12.45	15.86	16.24

Source: Dockery DW, Pope CA 3rd, Xu X, Spengler JD, Ware JH, Fay ME, Ferris BG Jr, Speizer FE. An association between air pollution and mortality in six U.S. cities. N Engl J Med. 1993 Dec 9;329(24):1753-9. doi: 10.1056/NEJM199312093292401. PMID: 8179653.

Six cities incidence rate of mortality data
Category	Dead	Alive*	Person-Years
High Pollution (Ohio)	291	-	17914
Low Pollution (Wisconsin)	232	-	21618
Total	523	-	39532

*For an incidence rate, the number of non-diseased (i.e., alive) participants is not necessary. Instead, we need the person-years of follow-up contributed by all participants at risk.

Measures of Disease Frequency

Disease Prevalence [by Exposure Status]

For Exposed: A/(A+B)
- In our pollution example, this would be 291/1351 = 0.215. Thus, 21.5% of the participants from the high pollution city (i.e., the exposed group) died.
For Not Exposed: C/(C+D)
- In our pollution example, this would be 232/1631 = 0.142. Thus, 14.2% of the participants from the low pollution city (i.e., the non-exposed group) died.

Exposure Prevalence [by Disease Status]

For Cases: A/(A+C)
- In our pollution example, this would be 291/523= 0.556. Thus, 55.6% of the participants who died were from the high pollution city.
For Non-cases: B/(B+D)
- In our pollution example, this would be 1060/2459= 0.431. Thus, 43.1% of the participants who did not die were from the high pollution city.
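The four proportions above can be checked with a few lines of Python. This is only an illustrative sketch: the cell names A, B, C, D follow the generic 2 x 2 table, and the other variable names are our own.

```python
# Cells of the Six Cities 2 x 2 table (high vs. low pollution city).
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

# Proportion with the outcome, by exposure status (row-wise).
disease_exposed = A / (A + B)
disease_unexposed = C / (C + D)

# Proportion exposed, by outcome status (column-wise).
exposure_cases = A / (A + C)
exposure_noncases = B / (B + D)

for label, value in [
    ("Died, high pollution", disease_exposed),
    ("Died, low pollution", disease_unexposed),
    ("Exposed among deaths", exposure_cases),
    ("Exposed among survivors", exposure_noncases),
]:
    print(f"{label:<25} {value:.1%}")
```

Note that the first pair divides by row totals and the second pair by column totals, which is why the two pairs answer different questions.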

Odds of Disease [by Exposure Status]

For Exposed: A/B
- In our pollution example, this would be 291/1060. Thus, the odds of dying in the high pollution city are 291:1060 - which can be simplified to 1:3.64. (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)
For Non-Exposed: C/D
- In our pollution example, this would be 232/1399. Thus, the odds of dying in the low pollution city are 232:1399 - which can be simplified to 1:6.03.

Odds of Exposure [by Disease Status]

For Cases: A/C
- In our pollution example, this would be 291/232= 1.25:1. Thus, the odds of being from the high pollution city are 1.25:1 for those who died.
For Non-cases: B/D
- In our pollution example, this would be 1060/1399= 0.76:1. Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
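A short sketch of the odds calculations follows, under the same cell labels as the 2 x 2 table. It also checks the standard identity odds = risk / (1 - risk), which links the odds of disease to the proportions computed earlier. Variable names are illustrative.

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

# Odds of disease, by exposure status.
odds_disease_exposed = A / B
odds_disease_unexposed = C / D

# Odds of exposure, by disease status.
odds_exposure_cases = A / C
odds_exposure_noncases = B / D

# Odds can also be obtained from a risk: odds = risk / (1 - risk).
risk_exposed = A / (A + B)
odds_from_risk = risk_exposed / (1 - risk_exposed)

print(f"Odds of death, high pollution: 1:{1 / odds_disease_exposed:.2f}")
print(f"Odds of death, low pollution:  1:{1 / odds_disease_unexposed:.2f}")
print(f"Odds of exposure, deaths:      {odds_exposure_cases:.2f}:1")
print(f"Odds of exposure, survivors:   {odds_exposure_noncases:.2f}:1")
```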

There are two ways to compare measures between groups: ratios and differences. The next few sections will outline both methods and show examples using the air pollution study. Also, note that in these examples the estimates based on cumulative incidence and those based on incidence rates are similar, but that is not always the case. In this study, the person-years of follow-up per participant were similar across groups, resulting in similar estimates.

4.2 Using Ratios to Compare Two Populations

A ratio may be used to convey the strength of an effect or association between two population groups, or the relative ‘risk’ of the study (e.g., exposed) group compared to a comparison (e.g., unexposed) group. A ratio is not dependent on the prevalence of exposure among the study population.

Ratio

\[\dfrac{\text { Disease Frequency }(\text { Population } A)}{\text { Disease Frequency }(\text { Population } B)}\]

A ratio can be reported with upper and lower bounds (a confidence interval). We will learn some formulas for these calculations in a later lesson. When the disease frequency is the same in both groups, the ratio equals 1. When the difference between groups is not statistically significant, the confidence interval for the ratio includes 1.

Ratio Calculations

Risk Ratios

Cumulative incidence ratio:

More generally, this can be thought of as:
- Ratio of Disease Incidence = [A/(A+B)] / [C/(C+D)]
In our pollution example, this would be [291/1351] / [232/1631] = 1.51. Thus, participants from the high-pollution city are 1.51 times as likely to die as those from the low-pollution city. This makes sense, since we saw that the cumulative incidence of death was about 21.5% in the high-pollution city and 14.2% in the low-pollution city.
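The cumulative incidence ratio can be wrapped in a small helper. This is a sketch only; `risk_ratio` is a name we made up for illustration, and it takes the four cells of the 2 x 2 table as arguments.

```python
def risk_ratio(a, b, c, d):
    """Cumulative incidence ratio [a/(a+b)] / [c/(c+d)] from 2 x 2 cell counts."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Six Cities example: high pollution (291 dead, 1060 alive)
# versus low pollution (232 dead, 1399 alive).
rr = risk_ratio(291, 1060, 232, 1399)
print(f"Risk ratio: {rr:.2f}")
```

A value of 1 would mean the two groups have the same cumulative incidence.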

Incidence rate ratio:

Follows the same general formula, but instead of comparing incidences, we are comparing incidence rates. So first, we need to calculate the incidence rate in each city.
- High pollution city incidence rate of death = 291 deaths / 17914 person-years, which simplifies to 16.24 deaths per 1000 person-years
- Low pollution city incidence rate of death = 232 deaths / 21618 person-years, which simplifies to 10.73 deaths per 1000 person-years
In our pollution example, this would be (16.24/1000 person-years) / (10.73/1000 person-years) = 1.51
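As a hedged sketch, the incidence rates and their ratio can be computed from the deaths and person-years in the Six Cities table (17,914 person-years for Steubenville and 21,618 for Portage). The helper name `incidence_rate` and its `per` argument are our own.

```python
def incidence_rate(events, person_years, per=1000):
    """Events per `per` person-years of follow-up."""
    return events / person_years * per


rate_high = incidence_rate(291, 17914)   # high pollution city
rate_low = incidence_rate(232, 21618)    # low pollution city
rate_ratio = rate_high / rate_low

print(f"High pollution: {rate_high:.2f} deaths per 1000 person-years")
print(f"Low pollution:  {rate_low:.2f} deaths per 1000 person-years")
print(f"Incidence rate ratio: {rate_ratio:.2f}")
```

The `per` multiplier cancels in the ratio, so the rate ratio is the same whether rates are expressed per 1000 or per 100,000 person-years.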

Odds Ratio

Exposure Odds Ratio [A/C] / [B/D] = [A*D] / [B*C]
Disease Odds Ratio [A/B] / [C/D] = [A*D] / [B*C]
Both simplify to the same OR
- In our pollution example, this would be [291*1399] / [1060*232] = 1.66. Thus, participants from the high-pollution city have 1.66 times the odds of dying compared with those from the low-pollution city.
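The claim that the exposure odds ratio and the disease odds ratio simplify to the same cross-product can be verified numerically. The sketch below uses the Six Cities cells; variable names are illustrative.

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

exposure_or = (A / C) / (B / D)       # odds of exposure: cases vs. non-cases
disease_or = (A / B) / (C / D)        # odds of disease: exposed vs. not exposed
cross_product_or = (A * D) / (B * C)  # the simplified form

print(f"Exposure OR:      {exposure_or:.2f}")
print(f"Disease OR:       {disease_or:.2f}")
print(f"Cross-product OR: {cross_product_or:.2f}")
```

Here the odds ratio (1.66) is further from 1 than the risk ratio (1.51), which is what happens when the outcome is not rare.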

Note! Odds ratios and risk ratios describe different relationships, so be sure you understand the difference between the two.

Check out this article for a deeper understanding: Cummings P. The Relative Merits of Risk Ratios and Odds Ratios. Arch Pediatr Adolesc Med. 2009;163(5):438-445. doi:10.1001/archpediatrics.2009.31.

4.3 Using Differences to Compare Two Populations

Alternatively, differences can be calculated between the estimates for the two groups. The difference can be reported with a confidence interval that includes upper and lower bounds. If the confidence interval includes 0, this indicates that there is no significant difference between the groups. If the interval does not include 0, there is an increased risk for one population compared to the other (or conversely, a decreased risk). The difference can convey an excess or decreased risk among the exposed group due to exposure, possibly an excess or decreased risk that would be removed if the exposure ends, a potential reduction in risk for exposed individuals, or the absolute risk of the exposure.

Differences

Disease Frequency(Population A) - Disease Frequency(Population B)

Difference Calculations

Risk Difference

Cumulative incidence difference:

More generally, this can be thought of as:
- Difference of Disease Incidence = [A/(A+B)] - [C/(C+D)]
In our pollution example, this would be [291/1351] - [232/1631] = 21.5% - 14.2% = 7.3%. Thus, the risk of death among participants from the high-pollution city is 7.3 percentage points higher than among participants from the low-pollution city.
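A minimal sketch of the cumulative incidence difference follows. It reports the result in percentage points and, as an equivalent restatement, per 1000 participants; the variable names are our own.

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

risk_exposed = A / (A + B)
risk_unexposed = C / (C + D)
risk_difference = risk_exposed - risk_unexposed

print(f"Risk difference: {risk_difference * 100:.1f} percentage points")
print(f"Equivalently:    {risk_difference * 1000:.0f} more deaths per 1000 participants")
```

Unlike the ratio, the difference is on an absolute scale, so it carries the units of the underlying measure.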

Incidence rate difference:

In our pollution example, this would be (16.24/1000 person-years) - (10.73/1000 person-years) = 5.51/1000 person-years. Thus, there are 5.51 excess deaths per 1000 person-years among those in the high pollution city. Alternatively, the number of deaths could be reduced by 5.51 per 1000 person-years if the pollution level in the high-pollution city were reduced to that of the low-pollution city.
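The incidence rate difference can be sketched in the same way. As an extra illustration that is not in the lesson, the code also multiplies the rate difference by the person-years observed in the high pollution city to express the excess as a number of deaths. All names are illustrative.

```python
deaths_high, py_high = 291, 17914   # Steubenville (high pollution)
deaths_low, py_low = 232, 21618     # Portage (low pollution)

rate_high = deaths_high / py_high
rate_low = deaths_low / py_low
rate_difference = rate_high - rate_low

# Illustration only: excess deaths over the person-years actually
# observed in the high pollution city.
excess_deaths = rate_difference * py_high

print(f"Rate difference: {rate_difference * 1000:.2f} per 1000 person-years")
print(f"Excess deaths over {py_high} person-years: {excess_deaths:.1f}")
```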

Attributable Proportion among the Total Population (AP_t)

(Also known as population attributable risk (PAR))

The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.

General Formula

\(\mathrm{AP}_{\mathrm{t}}=\dfrac{\text{ Risk(study population)}-\text{Risk(unexposed group) }}{\text{ Risk(study population) }}\)

Incidence rate AP_t

First, we need to calculate the incidence rate in the entire study population, that is, all six cities combined. This would be the sum of all the deaths (1430) divided by the sum of all the person-years (111076) = 12.87 deaths per 1000 person-years.
In our example, with the low pollution city as the unexposed group, this would be (12.87/1000 person-years - 10.73/1000 person-years) / (12.87/1000 person-years) = 0.166. Thus, 16.6% of the deaths in the population are attributable to the higher pollution levels and would be eliminated if pollution were reduced to the level of the low pollution city.
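The calculation can be reproduced from the six-city table of deaths and person-years. The sketch below treats Portage as the unexposed group, as in the text; the dictionary layout and variable names are our own.

```python
deaths = {
    "Portage": 232, "Topeka": 156, "Watertown": 248,
    "Harriman": 222, "St. Louis": 281, "Steubenville": 291,
}
person_years = {
    "Portage": 21618, "Topeka": 16111, "Watertown": 19882,
    "Harriman": 17836, "St. Louis": 17715, "Steubenville": 17914,
}

# Rate in the whole study population (all six cities).
rate_total = sum(deaths.values()) / sum(person_years.values())

# Rate in the unexposed group (the low pollution city).
rate_unexposed = deaths["Portage"] / person_years["Portage"]

ap_t = (rate_total - rate_unexposed) / rate_total

print(f"Total rate:     {rate_total * 1000:.2f} per 1000 person-years")
print(f"Unexposed rate: {rate_unexposed * 1000:.2f} per 1000 person-years")
print(f"AP_t:           {ap_t:.3f}")
```

Because the total rate depends on how many person-years come from each city, AP_t changes with the prevalence of exposure, unlike the ratios in Section 4.2.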

4.4 Standardization

When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates, in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.

Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at-risk), which you will also recognize as the cumulative incidence. If the distribution of a factor that influences disease frequency (such as age) differs between two populations, however, a comparison of the crude rates can mask or exaggerate the true difference between them.

A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables (e.g., x cases / y population at-risk, adjusted to remove the effect of a potential confounder such as age).
Age-specific rates (i.e., x cases in a specific age group / population at-risk in the same age group) are also useful in summarizing the health status of a population.

Types of Standardization

There are two different approaches to standardizing a rate:

Direct standardization

Direct standardization, more commonly used, creates a summary disease rate for a population that would be expected if the study population had a population distribution identical to that of an arbitrarily chosen standard population. A reference population is used as the standard population. The standardized rate is the sum of weighted group-specific rates, with weights derived from the standard population. The weights sum to 1.0. A standardized rate is essentially a weighted average of age-specific rates.

\[I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}\]

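where \(I_i\) is the rate in group \(i\) of the study population, \(W_i\) is the weight for group \(i\) (the proportion of the standard population in that group), and \(\sum W_{i}=1\)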

The necessary data for direct standardization is the group-specific disease rates for the study population and the population distribution from the standard population.
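To make the weighted-average idea concrete, here is a sketch with entirely HYPOTHETICAL numbers (the age groups, rates, and population counts below are invented for illustration and do not come from the lesson or any real data set). Two populations share the same age-specific rates but have different age structures, so their crude rates differ while their directly standardized rates agree.

```python
def weighted_rate(rates, population):
    """Weighted average of group-specific rates, weights = population shares."""
    total = sum(population)
    weights = [n / total for n in population]   # weights sum to 1
    return sum(w * r for w, r in zip(weights, rates))


# HYPOTHETICAL age-specific rates per 1000 for ages 0-39, 40-64, 65+.
age_specific_rates = [1.0, 5.0, 20.0]

# HYPOTHETICAL age distributions (counts per age group).
population_a = [7000, 2000, 1000]      # younger population
population_b = [3000, 3000, 4000]      # older population
standard = [60000, 30000, 10000]       # arbitrarily chosen standard

# Crude rate: each population's own age distribution supplies the weights.
crude_a = weighted_rate(age_specific_rates, population_a)
crude_b = weighted_rate(age_specific_rates, population_b)

# Direct standardization: the standard population supplies the weights.
standardized_a = weighted_rate(age_specific_rates, standard)
standardized_b = weighted_rate(age_specific_rates, standard)

print(f"Crude rates:        A = {crude_a:.1f}, B = {crude_b:.1f} per 1000")
print(f"Standardized rates: A = {standardized_a:.1f}, B = {standardized_b:.1f} per 1000")
```

Using one function for both quantities is a deliberate choice: a crude rate is itself a weighted average of group-specific rates, just with weights taken from the population's own distribution instead of from the standard.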

Try It!

Review the SEER Stat Tutorials: Calculating Age-adjusted Rates and Pennsylvania Dept of Health’s tutorial on Age-Adjusted Rates (pa.gov) for producing an adjusted rate by the direct method.

Do you understand how direct adjustment is a weighted average of age-specific rates?

Indirect standardization

Indirect standardization also produces a weighted average, through the production of a summary disease rate for the study population which would be expected if the disease experience of the study population were identical to that of a standard population. The standard population is arbitrarily chosen, but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available. If these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate. Observed and expected rates are typically compared as a standardized ratio. Indirect adjustment is often used in occupational health to calculate standardized mortality ratios, which are obtained by dividing the observed death rate by the expected death rate.

The data required for indirect standardization are the crude rate for the study population, the population distribution for the study population, and the group-specific rates for the standard population.

Example 4.1 (Gender Specific Rates) Consider the data below, for which the researcher could not obtain gender-specific rates in the study groups.

From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.

If we also know that:

Group 1 male 60%, female 40%
Group 2 male 80%, female 20%

What is the expected crude rate for group 1?

It is \((2.2 \times 0.6) + (0.9 \times 0.4) = 1.68\) per 1000. That is, the expected crude rate for group 1 is 1.68 per 1000.

Try It!

Come up with an answer to this question by yourself before reading the solution below.

What is the expected crude rate for group 2?

The expected crude rate is \((2.2 \times 0.8) + (0.9 \times 0.2) = 1.94\) per 1000.
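The expected rates for both groups in Example 4.1 can be computed with one small helper. The sketch then forms a standardized ratio (observed / expected) for group 1 using a HYPOTHETICAL observed crude rate of 2.1 per 1000; that number is invented purely to show the arithmetic and is not part of Example 4.1. The function and variable names are our own.

```python
# Gender-specific rates per 1000 from the standard population (Example 4.1).
standard_rates = {"male": 2.2, "female": 0.9}

# Gender distribution of each study group (Example 4.1).
groups = {
    "Group 1": {"male": 0.6, "female": 0.4},
    "Group 2": {"male": 0.8, "female": 0.2},
}


def expected_rate(distribution, rates):
    """Rate expected if the group experienced the standard population's rates."""
    return sum(distribution[g] * rates[g] for g in rates)


expected = {name: expected_rate(dist, standard_rates) for name, dist in groups.items()}

# HYPOTHETICAL observed crude rate for Group 1 (per 1000), for illustration only.
observed_group1 = 2.1
standardized_ratio = observed_group1 / expected["Group 1"]

for name, rate in expected.items():
    print(f"{name}: expected crude rate = {rate:.2f} per 1000")
print(f"Group 1 standardized ratio (hypothetical observed rate): {standardized_ratio:.2f}")
```

A standardized ratio above 1 would indicate more events than expected under the standard population's rates, and a ratio below 1 fewer.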

Try It!

Comparison of Direct and Indirect Standardization Methods

Try the following true/false questions to test your knowledge.

Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.
- False

Age-adjusted rates can be compared regardless of the standard population used.
- False. Direct standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates.

Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.
- True

Indirect adjustment is preferred if there are only a few cases across all age groups.
- True

If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.
- True

To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.
- True

To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.
- False. The appropriate statistics are the age-adjusted rates with their confidence intervals, or the relative risk (rate ratio) adjusted for age.

4.5 Lesson Summary

Describing the state of public health is an important component of epidemiology, which was presented in Lesson 3. The next step of comparing outcomes between groups was introduced in Lesson 4. We saw some examples of situations when the goal is to compare groups and learned the two main ways to do so: absolute (differences) and relative (ratios). An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.
