6.3 - Analysis of Ecologic Studies

Analytic models in ecologic studies are of different forms:

Completely Ecologic
all variables (outcome, exposure and covariates) are ecological.
Partially Ecologic
some, but not all, variables are ecological.
analyses may simultaneously include individual and ecological variables on the same construct (e.g., income). This approach is called multilevel modeling, hierarchical regression, or mixed-effects modeling.

Sample Ecological Data and Analysis

The following data illustrate a problem with interpretation of ecological studies. The data include the numbers in an exposed and non-exposed group and the disease rate per 100,000 person-years within each of three different groups.

Ecological Data and Analysis

data capture chart

Think about it!

First, calculate the percentage exposed and the rate of disease for each group.

Percentage exposed in

  • Group 1: 7/20 = 35%
  • Group 2: 10/20 = 50%
  • Group 3: 13/20 = 65%

Rate of disease in

  • Group 1: 165 per 100,000 = 1.65/1000 person-years
  • Group 2: 150 per 100,000 = 1.5/1000 person-years
  • Group 3: 135 per 100,000 = 1.35/1000 person-years
Think about it!

What do these data tell you about the relationship between exposure and the disease rate?

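It seems that the disease rate decreases as the percentage exposed increases.

You could regress the group disease rate on the proportion exposed and compare the predicted rate for a fully exposed group (100% exposed) with that for a fully unexposed group (0% exposed). You would come out with a rate ratio of 0.50.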
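The regression step can be sketched in a few lines. This is a minimal illustration, assuming the three group rates are 165, 150 and 135 per 100,000 person-years and that a straight line is fitted by ordinary least squares; the helper `fit_line` and the variable names are made up for this sketch.

```python
# Ecologic regression of the group disease rate on the proportion exposed.
# Assumption: rates are read as cases per 100,000 person-years.
prop_exposed = [0.35, 0.50, 0.65]
rate = [165.0, 150.0, 135.0]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = fit_line(prop_exposed, rate)
rate_unexposed = b0       # predicted rate when nobody is exposed (X = 0)
rate_exposed = b0 + b1    # predicted rate when everybody is exposed (X = 1)
ecologic_rr = rate_exposed / rate_unexposed

print(f"intercept = {b0:.1f}, slope = {b1:.1f}")
print(f"ecologic rate ratio = {ecologic_rr:.2f}")
```

Because the rate ratio is a ratio of two predicted rates, it does not depend on whether the rates are expressed per 1000 or per 100,000 person-years.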

data capture chart

The natural conclusion would seem to be that exposure protects individuals from the disease by decreasing the rate of disease by half. So...would you want to be exposed to this factor in order to cut your disease risk in half? Or would you like to ask further questions?

But what about the fact that we have no data measured at the individual level? For example, do we know the exposure level and the disease outcome for each person in the study? No! In fact, all of the cases could have occurred among the exposed individuals. This is a problem if our hypothesis is that a biologic process within individuals is responsible for the change in risk.

Consider these tables:

ecological study design

Stratum 1 and Stratum 2 play the same role as the three groups in the previous example. We don't know the counts in any cell within a stratum, nor do we know A, B, C or D for the combined data. Only the marginal counts are known: the numbers exposed and unexposed, and the numbers of cases and non-cases, within each stratum. So, if our hypothesis for the risk pathway is biological, we run the risk of an ecological fallacy. An ecological fallacy is possible whenever we use group-level data as evidence for risk pathways that operate at the individual level, because we are ascribing group observations to the individual! (Note: Group-level data are appropriate if our hypothesis is that the disease pathway runs from a group-level exposure. Group-level exposures are recognized as important in disease causation models that include both individual and group processes.)
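To see how little the margins pin down, the sketch below lists the within-group rate ratios that are compatible with one set of margins. It assumes the margins for group 1 of the earlier example are 33 cases in total (165 per 100,000 over 20,000 person-years, which is an assumption about how the units should be read), with 7,000 exposed and 13,000 unexposed person-years. The function name `rate_ratio` is illustrative.

```python
# Assumed margins for group 1: 33 cases in total,
# 7,000 exposed and 13,000 unexposed person-years.
total_cases = 33
py_exposed = 7_000
py_unexposed = 13_000


def rate_ratio(cases_exposed):
    """Within-group rate ratio if `cases_exposed` of the cases were exposed."""
    cases_unexposed = total_cases - cases_exposed
    if cases_unexposed == 0:
        return float("inf")  # no unexposed cases: the ratio is unbounded
    return (cases_exposed / py_exposed) / (cases_unexposed / py_unexposed)


# Every split from 0 to 33 exposed cases reproduces the same margins.
feasible = {a: rate_ratio(a) for a in range(total_cases + 1)}

for a in (0, 11, 20, 33):
    print(f"{a:2d} exposed cases -> rate ratio {feasible[a]:.2f}")
```

The same margins are compatible with a rate ratio anywhere from zero to infinity, which is why the marginal counts alone cannot support a conclusion about individual-level risk.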

Individual-level Data and Analysis

To demonstrate the ecological fallacy, let's look at the individual-level data from the same example. We will fill in the number of cases within each cell for each group. For instance, among the exposed in group 1, there were 20 cases in 7,000 person-years at risk.

Ecologic Effect Modification, but not Confounding

data capture table

Think about it!

What is the disease rate for the exposed population in group 1? group 2? group 3?

Similarly for the unexposed?

Among the exposed in

  • Group 1: 20/7000 = 2.86/1000 person-years
  • Group 2: 20/10,000 = 2/1000 person-years
  • Group 3: 20/13,000 = 1.54/1000 person-years

Among the unexposed

  • Group 1: 13/13,000 or 1/1000 person-years
  • Group 2: 10/10,000 or 1/1000 person-years
  • Group 3: 7/7000 or 1/1000 person-years
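These group-specific rates, together with the rate ratio and rate difference within each group, can be reproduced with a short script. It is a sketch that uses only the case counts and person-years listed above; the dictionary layout and the name `rate_per_1000` are choices made for illustration.

```python
# (cases, person-years) for each group, taken from the lists above.
exposed = {1: (20, 7_000), 2: (20, 10_000), 3: (20, 13_000)}
unexposed = {1: (13, 13_000), 2: (10, 10_000), 3: (7, 7_000)}


def rate_per_1000(cases, person_years):
    """Incidence rate expressed per 1000 person-years."""
    return 1000 * cases / person_years


summary = {}
for group in exposed:
    r1 = rate_per_1000(*exposed[group])
    r0 = rate_per_1000(*unexposed[group])
    summary[group] = {
        "exposed": r1,
        "unexposed": r0,
        "ratio": r1 / r0,
        "difference": r1 - r0,
    }
    print(
        f"Group {group}: exposed {r1:.2f}, unexposed {r0:.2f}, "
        f"ratio {r1 / r0:.2f}, difference {r1 - r0:.2f} per 1000 person-years"
    )
```

In every group the ratio is above 1, while the difference shrinks from group 1 to group 3.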
Think about it!

Now what do you conclude?

We see that in each group, the exposed people had a higher rate of disease than the unexposed. So, we would conclude that exposure increases the risk of this outcome, which is the opposite of what we concluded previously! We also observe that the rate of disease among the non-exposed was the same (1/1000 person-years) in all groups, whereas the rate among the exposed varied from group to group.

Recall that when we used the group-level data, this exposure appeared to be protective. Using only ecological data, the rate ratio was 0.5; HOWEVER, given the individual-level data, the rate ratio is 2.0. This is an example of an ecological fallacy (or ecological bias): using group-level data to support an individual-level risk pathway.
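The individual-level figure of 2.0 is the crude rate ratio obtained by pooling the three groups. The sketch below computes it from the cell counts given earlier and also rebuilds each group's overall rate. That rate falls as the exposed share rises, even though the exposed have the higher rate within every group. Variable names are illustrative.

```python
# (cases, person-years) for groups 1 to 3, from the individual-level data.
exposed = [(20, 7_000), (20, 10_000), (20, 13_000)]
unexposed = [(13, 13_000), (10, 10_000), (7, 7_000)]

# Crude (pooled) rates among the exposed and the unexposed.
cases_e = sum(c for c, _ in exposed)
py_e = sum(py for _, py in exposed)
cases_u = sum(c for c, _ in unexposed)
py_u = sum(py for _, py in unexposed)
crude_rr = (cases_e / py_e) / (cases_u / py_u)

# Overall rate per 100,000 person-years in each group (what the ecologic
# analysis sees) and the proportion of person-time that is exposed.
group_rates = [
    100_000 * (ce + cu) / (pe + pu)
    for (ce, pe), (cu, pu) in zip(exposed, unexposed)
]
prop_exposed = [pe / (pe + pu) for (_, pe), (_, pu) in zip(exposed, unexposed)]

print(f"crude rate ratio = {crude_rr:.2f}")
for p, r in zip(prop_exposed, group_rates):
    print(f"proportion exposed {p:.2f} -> group rate {r:.0f} per 100,000")
```

Each group contributes the same 20 exposed cases no matter how much exposed person-time it has, so the groups with more exposure have fewer unexposed cases and a lower overall rate.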

All Data


Can an ecological study produce results without ecological bias? Yes, under certain conditions.

If the rate difference is the same: If the rate difference between the exposed and non-exposed is the same in each of the groups, there will be no ecological fallacy.

Example 6-2: Given different data, where the rate difference is the same in all 3 groups, the measures match (overall crude, adjusted and ecologic rate ratios = 1.8).
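The following sketch illustrates the condition with made-up numbers. They are not the data of Example 6-2, which are not reproduced here. It assumes the unexposed rate and the rate difference are both identical in all three groups, and the values 1.0 and 0.8 per 1000 person-years are chosen only so that the rate ratio equals 1.8.

```python
# Hypothetical inputs (assumptions for illustration only).
unexposed_rate = 1.0     # per 1000 person-years, same in every group
rate_difference = 0.8    # per 1000 person-years, same in every group
prop_exposed = [0.35, 0.50, 0.65]

# Each group's overall rate is a weighted average of the exposed and
# unexposed rates, weighted by the proportion exposed.
group_rate = [
    (1 - p) * unexposed_rate + p * (unexposed_rate + rate_difference)
    for p in prop_exposed
]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = fit_line(prop_exposed, group_rate)
ecologic_rr = (b0 + b1) / b0
individual_rr = (unexposed_rate + rate_difference) / unexposed_rate

print(f"ecologic rate ratio   = {ecologic_rr:.2f}")
print(f"individual rate ratio = {individual_rr:.2f}")
```

Under these assumptions the group rate is exactly linear in the proportion exposed, so the ecologic and individual-level rate ratios agree.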

Ecologic Confounding, but not Effect Modification


Statistical Models and Estimation of Effect

  1. Using a Linear Model: Ordinary least squares (OLS)

    Model: \(\hat{Y}= B_{0}+B_{1}X\)

    X = proportion exposed in each group; set X = 1 (all exposed) or X = 0 (none exposed) to obtain the predicted rates

    • Predicted Rate in Unexposed Group = \(B_0\)
    • Predicted Rate in Exposed Group = \(B_0 + B_1\)
    • Estimated Rate Ratio = \((B_0 + B_1) / B_0 = 1 + B_1 / B_0\)
    • Estimated Rate Difference = \((B_0 + B_1) - B_0 = B_1\)
  2. Using a Log-linear (exponential) Model: \(\ln \hat{Y}= B_{0}+B_{1}X\) or \(\hat{Y}= \exp\left [ B_{0}+B_{1}X \right ]\)

    • Estimated Rate Ratio = \(\exp[B_1]\)
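As an illustration, both models can be fitted to the three group-level points from the sample data. This is a sketch under stated assumptions: rates are taken as 165, 150 and 135 per 100,000 person-years, and the log-linear model is fitted by ordinary least squares on the log rates. That is a simplification, since Poisson regression is a common choice for rates in practice. The helper `least_squares` is made up for this example.

```python
import math

# Group-level data: proportion exposed and rate per 100,000 person-years.
x = [0.35, 0.50, 0.65]
y = [165.0, 150.0, 135.0]


def least_squares(x, y):
    """Return (B0, B1) for the ordinary least squares line y = B0 + B1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# 1. Linear model: Y = B0 + B1 * X
b0, b1 = least_squares(x, y)
linear_rr = 1 + b1 / b0      # (B0 + B1) / B0
linear_rd = b1               # (B0 + B1) - B0

# 2. Log-linear model: ln Y = B0 + B1 * X (least squares on the log scale)
g0, g1 = least_squares(x, [math.log(v) for v in y])
loglinear_rr = math.exp(g1)

print(f"linear model:     rate ratio {linear_rr:.2f}, rate difference {linear_rd:.0f}")
print(f"log-linear model: rate ratio {loglinear_rr:.2f}")
```

With these data the linear model gives a rate ratio of 0.50 and the log-linear model gives roughly 0.51, so the estimate depends somewhat on the model form chosen.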