3.4 - Comparing Groups

Suppose our goal is to compare 2 populations with regard to disease frequency or exposure-disease frequency. We wish to use precise, valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, or incidence density. The two populations must be distinct in location, time, or exposure status. We would like to apply statistical tests to these measures to assess whether an observed difference is likely to have occurred by chance.

There are many examples of hypotheses that are comparative in nature:

  • High salt intake increases the incidence of heart disease.
    • --> Compare the incidence of heart disease among persons with high salt intake with those who have a low salt intake.
  • The prevalence of screening mammograms differs between managed care organization A and managed care organization B.
    • --> Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.
  • The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.
    • --> Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive foot examinations less often than quarterly.
  • A high intake of Vitamin C reduces the prevalence of colds.
    • --> Compare the prevalence of colds among persons with a high intake of Vitamin C with the prevalence among those with a low intake.
  • Dizziness is associated more frequently with therapeutic agent A than with agent B.
    • --> Compare the incidence of dizziness for patients receiving A with those receiving B.
  • The administration of pre-surgical antibiotics decreases the rate of wound infections.
    • --> Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.

Using Ratios to Compare 2 Populations

General Formula: \(\dfrac{\text{Disease Frequency (Population A)}}{\text{Disease Frequency (Population B)}}\)

A ratio may be used to convey the strength of an effect or association between two population groups, or the relative 'risk' of the study (e.g., exposed) group compared to a comparison (e.g., unexposed) group. A ratio is not dependent on the prevalence of the exposure among the study population.

A ratio should be reported with upper and lower confidence bounds. When there is no statistically significant difference between the groups, the confidence interval for the ratio includes 1 (a ratio of exactly 1 means the two frequencies are equal). Following the general formula, a value above 1 indicates increased risk for group A, while a value below 1 indicates increased risk for group B.
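
As a rough illustration of reporting a ratio with bounds, the sketch below computes a rate ratio and an approximate 95% interval. The event counts and person-years are hypothetical, invented only for this example, and the function name `rate_ratio_ci` is likewise an assumption. The interval uses a common large-sample (Wald) approximation on the log scale; other interval methods exist and may be preferable for small counts.

```python
import math

def rate_ratio_ci(events_a, person_time_a, events_b, person_time_b, z=1.96):
    """Rate ratio (A vs. B) with an approximate Wald interval on the log scale.

    Assumes event counts large enough for the normal approximation to hold.
    """
    ratio = (events_a / person_time_a) / (events_b / person_time_b)
    # Approximate standard error of ln(ratio) for two independent counts.
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Hypothetical data: 30 events in 10,000 person-years (group A)
# versus 20 events in 10,000 person-years (group B).
ratio, lower, upper = rate_ratio_ci(30, 10_000, 20, 10_000)
includes_one = lower <= 1 <= upper

print(f"ratio = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("interval includes 1:", includes_one)
```

With these made-up counts the point estimate is above 1, but the interval still spans 1, so the data would not show a statistically significant difference between the groups.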


Example 3-5
NOTE! Hazard ratios resulting from a proportional hazards analysis are interpreted similarly to risk ratios.

Other ratios:

  1. Risk ratio (relative risk; e.g., incidence density ratio [IDR]) = Risk(Group A [Exposed])/Risk(Group B [Unexposed])

    Suppose the exposure of interest is cigarette smoking and the outcome is death from lung cancer. Disease frequency is measured as the risk of death from lung cancer, an incidence density. If the risk of death is 0.96/1000 person-years among smokers and 0.07/1000 person-years among nonsmokers, the two groups can be compared using the ratio of these two risks, known as the relative risk or the incidence density ratio. In this example, the relative risk is 0.96/0.07 = 13.7.


    • 13.7 times greater risk of death from lung cancer for smokers than non-smokers.
    • There is a strong association between smoking and death from lung cancer.
    • Smokers were 13.7 times more likely to die from lung cancer than were nonsmokers.
    Question: What did you learn about the prevalence of smoking (exposure) from this example?
    Nothing. Prevalence doesn't affect this estimate.
  2. Odds ratio (OR)

    What is the difference between a relative risk and an odds ratio? Check it out ...

    Cummings P. The Relative Merits of Risk Ratios and Odds Ratios. Arch Pediatr Adolesc Med. 2009;163(5):438-445. doi:10.1001/archpediatrics.2009.31.

  3. Rate and prevalence ratios = incidence rate or prevalence rate in the first population / the respective rate in the second population

    1. Standardized mortality ratio (SMR) = observed deaths/expected deaths

    2. Ratio of standardized rates

    3. Rate ratio (relative rate; e.g., the cumulative incidence ratio [CIR] is a ratio of two cumulative incidences)
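
The relative risk in the smoking example (item 1 above) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the helper name `frequency_ratio` is an assumption made for illustration.

```python
def frequency_ratio(freq_a, freq_b):
    """General formula: disease frequency in population A / population B."""
    return freq_a / freq_b

# Incidence densities from the smoking example, per 1000 person-years.
rate_smokers = 0.96
rate_nonsmokers = 0.07

relative_risk = frequency_ratio(rate_smokers, rate_nonsmokers)
print(f"Relative risk (smokers vs. nonsmokers): {relative_risk:.1f}")  # 13.7
```

Because both frequencies share the same units (per 1000 person-years), the units cancel and the ratio is dimensionless. Note that the calculation never uses the proportion of the population that smokes.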

Using Ratios and Differences to Compare 2 Populations

Differences: Disease Frequency(Population A) - Disease Frequency(Population B)

Alternatively, differences can be calculated between the estimates for two groups. The difference can be reported with upper and lower bounds; a confidence interval that includes 0 indicates no significant difference between the groups. If the interval does not include 0, there is an increased rate for one population compared to the other (or, conversely, a decreased rate). The difference can convey several things: an excess or decreased risk among the exposed group due to the exposure, an excess or decreased risk that might be removed if the exposure ends, a potential reduction in risk for exposed individuals, or the absolute risk of the exposure.

  1. Attributable risk (rate) (AR) = Risk(Group A [Exposed]) - Risk(Group B [Unexposed])

    Using the numbers in the previous example:

    Risk(Smokers) = 0.96/1000 person-years

    Risk(Nonsmokers) = 0.07/1000 person-years

    \(\text{Attributable Risk (AR)} = \text{Risk}_{\text{(Smokers)}} - \text{Risk}_{\text{(Nonsmokers)}} = 0.89/1000 \text{ person-years}\)

    interpreted as follows:

    • The risk of death from lung cancer among the exposed group is 0.89/1000 person-years higher than what it would have been if they had not been smoking.
    • The risk of death from lung cancer among smokers could be reduced by 0.89/1000 person-years (from 0.96/1000 person-years to 0.07/1000 person-years) if they stopped smoking (became like the unexposed group).
    • Smoking increases risk of death for smokers dramatically.
  2. The population attributable risk (rate) (PAR) is the risk in the total study population that is attributable to the presence of the exposure. The PAR depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.

    Population attributable risk (rate) (PAR)
    \(\text{PAR} = \text{Risk(study population)} - \text{Risk(unexposed group)}\)

    In the cigarette smoking example, suppose:

    Risk(Total Population) = 0.56/1000 person-years, and

    Risk(Nonsmokers) = 0.07/1000 person-years.

    Then,

    Population Attributable Risk (PAR) = Risk(Total) - Risk(Nonsmokers) = 0.49/1000 person-years, and

    \(\text{PAR}\% = \dfrac{0.49/1000 \text{ person-years}}{0.56/1000 \text{ person-years}} \times 100 = 87.5\%\)

    Leading to these conclusions:

    • 87.5% of lung cancer deaths in the study population are attributable to smoking.
    • Smoking contributes greatly to the risk of lung cancer death in the population.

    This measure is dependent upon the prevalence of the exposure among the total study population.
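
The difference measures above can be checked with a short Python sketch using the numbers from the smoking example. The variable names are chosen for illustration. The last step is an added assumption, not part of the example: it treats the total-population rate as a person-time-weighted average of the smoker and nonsmoker rates, which is one way to see why the PAR depends on how common the exposure is.

```python
# Rates from the smoking example, all per 1000 person-years.
risk_smokers = 0.96
risk_nonsmokers = 0.07
risk_total = 0.56

# Attributable risk: excess rate among the exposed.
attributable_risk = risk_smokers - risk_nonsmokers

# Population attributable risk: excess rate in the whole study population.
par = risk_total - risk_nonsmokers
par_percent = par / risk_total * 100

print(f"AR   = {attributable_risk:.2f} per 1000 person-years")  # 0.89
print(f"PAR  = {par:.2f} per 1000 person-years")                # 0.49
print(f"PAR% = {par_percent:.1f}%")                             # 87.5%

# Assumption (not stated in the example): if the total rate is a
# person-time-weighted average of the two group rates, then
# PAR = (exposed share of person-time) * AR.
implied_exposed_share = par / attributable_risk
print(f"Implied exposed share of person-time: {implied_exposed_share:.2f}")
```

Under that weighted-average assumption, a smaller exposed share would shrink the PAR even though the AR among smokers stayed the same.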