9.2 - Categorical outcome

From our example, we may be interested in the relationship between BMI and high blood pressure, and want to consider sex as a possible confounder (or effect modifier).

The outcome is high blood pressure, a dichotomous variable (either present or not).
The predictors/covariates to be considered are BMI and sex. BMI can be treated either as continuous or in categories, and sex is a categorical variable.


First, we want to report the percentage of patients who have high blood pressure with a 95% confidence interval.  We find that 43.5% of patients report high blood pressure.  The exact CI for this estimate is (42.2% - 44.8%), again very narrow due to the large sample size.  

high_BP       Frequency   Percent   Cumulative Frequency   Cumulative Percent
low/normal    2942        56.48%    2942                   56.48%
high          2267        43.52%    5209                   100.00%
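As a check on the exact interval quoted above, the sketch below computes a Clopper-Pearson interval from the counts in the frequency table (2267 with high BP out of 5209) using only the Python standard library. It assumes that "exact CI" here means the Clopper-Pearson interval; the helper names (`binom_cdf`, `solve_decreasing`, `clopper_pearson`) are hypothetical, and statistical software may round the limits slightly differently.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def solve_decreasing(f, target: float, iterations: int = 60) -> float:
    """Bisection on (0, 1) for f(p) = target, where f decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - conf
    # Lower limit solves P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x) = alpha/2
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


high, total = 2267, 5209
lower, upper = clopper_pearson(high, total)
print(f"{high / total:.1%} with high BP (95% exact CI {lower:.1%} to {upper:.1%})")
```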

Bivariable Associations

Next, we want to look at the relationship with BMI, which we can consider as both a continuous and a categorical variable.

Table of BMIgrp by high_BP (frequencies)
BMIgrp               low/normal   high   Total
[18.5-25] normal     1727         729    2456
[25-30] overwght     953          1017   1970
>=30 obese           188          506    694
Total                2868         2252   5120
Frequency Missing = 89
[Figure: Percent of frequency by BMIgrp, grouped by high_BP]

We see that as BMI level increases, so does the rate of high BP (30%, 52%, and 73% for increasing levels of BMI). We can use a chi-squared test here to test the association between the two variables. The association is highly significant, which is not surprising given the large sample size.
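A minimal sketch of the chi-squared test of independence on the BMI-group by high_BP counts. It computes the Pearson statistic by hand; because the 3 x 2 table has 2 degrees of freedom, the p-value has the closed form exp(-statistic/2), which is the only case this hypothetical helper handles.

```python
import math


def chi_square_independence(table):
    """Pearson chi-squared test of independence for a table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value uses
    exp(-x/2), the exact chi-squared survival function for df = 2 only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df != 2:
        raise ValueError("this closed-form p-value only covers df = 2")
    return stat, df, math.exp(-stat / 2)


# Rows: normal, overweight, obese; columns: low/normal BP, high BP
bmi_by_bp = [[1727, 729], [953, 1017], [188, 506]]
stat, df, p = chi_square_independence(bmi_by_bp)
print(f"chi-squared = {stat:.1f}, df = {df}, p = {p:.1e}")
```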

Considerations specifically related to Non-matched Case-Control Studies:

  1. Chi-squared tests can be used for the bivariable association of exposure and outcome. If any expected cell count is less than 5, Fisher’s exact test should be used instead.
  2. If we want to evaluate potential effect modifiers using these types of bivariable association tables, we can use the Mantel-Haenszel approach, which breaks the exposure-by-outcome table up by levels of the potential effect modifier so that the association can be compared across strata (homogeneity across strata is commonly assessed with the Breslow-Day test).
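The stratified approach in item 2 can be sketched in a few lines of Python, assuming each stratum (for example, males and females) is supplied as its own 2 x 2 table of exposure by outcome. The stratum counts are not shown in this section, so the sketch has no output; comparing the stratum-specific ORs addresses effect modification, while the pooled Mantel-Haenszel OR summarizes the association adjusted for the stratifying variable. A formal test of whether the stratum ORs differ is not implemented here, and the function names are hypothetical.

```python
from typing import List, Tuple

# One 2x2 table per stratum, laid out as
#   (a, b, c, d) = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
Stratum = Tuple[float, float, float, float]


def stratum_odds_ratios(strata: List[Stratum]) -> List[float]:
    """Stratum-specific ORs; clearly different values suggest effect modification."""
    return [(a * d) / (b * c) for a, b, c, d in strata]


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Mantel-Haenszel summary OR across strata: sum(a*d/n) / sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator
```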

We can also look at a boxplot or histogram for the continuous version of BMI and see that on average, patients with high BP tend to have higher BMI compared to those without high BP.  

[Figure: Boxplot comparison of BMI by high_BP]
[Figure: Distribution plots of BMI for high BP and low/normal BP]
Analysis Variable: bmi
high_BP      N      Mean    Std    Lower 95% CL for Mean   Upper 95% CL for Mean   Minimum   25th Pctl   Median   75th Pctl   Maximum
low/normal   2936   24.37   3.61   24.24                   24.50                   14.12     21.87       24.01    26.49       51.96
high         2263   27.15   4.48   26.97                   27.34                   15.77     24.05       26.73    29.52       56.68

We can use a two-sample t-test to compare the means by group, but it is often more streamlined to choose the modeling technique you plan to use and apply it to both the bivariable and the multivariable associations. A model with a single covariate provides an unadjusted result, and a model with multiple covariates provides an adjusted result.
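For comparison with the modeling approach, the two-sample t-test can be computed directly from the BMI summary statistics above. This sketch uses the Welch (unequal-variance) form and, as an assumption to stay within the standard library, approximates the t distribution by the normal distribution, which is reasonable with thousands of degrees of freedom. The function name is hypothetical.

```python
import math


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample (Welch) t-test computed from group summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided p-value from the normal approximation to t (adequate when df is large)
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p


# BMI summary statistics: high BP group vs. low/normal BP group
t, df, p = welch_t_test(27.15, 4.48, 2263, 24.37, 3.61, 2936)
print(f"t = {t:.1f}, df = {df:.0f}, p = {p:.1e}")
```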

Modeling (Multivariable Associations)

For a dichotomous outcome we may want to estimate odds ratios or risk ratios, and thus will use logistic or log-binomial regression, respectively.  

Logistic regression to estimate Odds Ratio:

Using the table of raw counts of high BP in each BMI group, we can calculate the OR of high BP for the overweight vs. normal BMI group as (1017*1727)/(729*953) = 2.53, and similarly for the obese vs. normal groups as (506*1727)/(729*188) = 6.38.

From the logistic regression model, we get these same estimates, along with 95% CIs:

Label                        Estimate   Standard Error   95% Confidence Limits
(OR overwght vs. normal)     2.5281     0.1596           2.2339   2.8610
(OR obese vs. normal)        6.3761     0.6131           5.2809   7.6985
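For a model whose only covariate is the BMI group, the logistic regression ORs and Wald limits coincide with closed-form 2 x 2 calculations, so they can be checked by hand. The sketch below assumes the limits are Wald intervals on the log-odds scale and that the Standard Error column is the delta-method standard error of the OR (OR times the SE of log OR); under those assumptions it should reproduce the table to rounding. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, conf=0.95):
    """OR of a BMI group vs. the reference group, with a Wald CI on the log-odds scale.

    a, b: group with / without high BP; c, d: reference with / without high BP.
    Returns (OR, lower, upper, delta-method SE of the OR).
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper, or_ * se_log


# (with high BP, with low/normal BP) counts from the BMI-by-high-BP table
normal, overweight, obese = (729, 1727), (1017, 953), (506, 188)
for label, group in [("overwght vs. normal", overweight), ("obese vs. normal", obese)]:
    est, lo, hi, se = odds_ratio_ci(*group, *normal)
    print(f"OR {label}: {est:.4f} (SE {se:.4f}), 95% CI {lo:.4f} to {hi:.4f}")
```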

Considerations specifically related to Case-Control Studies:

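Remember that for non-matched case-control studies, the OR must be calculated: participants are selected on the basis of outcome status, so the proportion of cases (and hence the overall distribution of exposure) in the sample is set by the design rather than reflecting the population. The sampling fractions cancel out in the OR calculation, but not in the RR.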
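The claim that sampling fractions cancel in the OR but not in the RR can be demonstrated numerically. This sketch treats the overweight vs. normal counts as a hypothetical source population, then mimics case-control sampling by keeping all cases but only a hypothetical 20% of non-cases: the OR is unchanged, while the RR is not.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """RR (risk in exposed / risk in unexposed) from the same 2x2 layout."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical source population: overweight (exposed) vs. normal (unexposed) counts
a, b, c, d = 1017, 953, 729, 1727

# Hypothetical case-control sampling: keep every case, but only 20% of non-cases
case_fraction, control_fraction = 1.0, 0.2
sampled = (a * case_fraction, b * control_fraction, c * case_fraction, d * control_fraction)

print("OR population:", round(odds_ratio(a, b, c, d), 3), "| OR sample:", round(odds_ratio(*sampled), 3))
print("RR population:", round(risk_ratio(a, b, c, d), 3), "| RR sample:", round(risk_ratio(*sampled), 3))
```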
These logistic regression models are unconditional, which is appropriate for non-matched case-control studies but not for matched case-control studies. For matched case-control studies, conditional logistic regression should be used, and the OR is based on the discordant pairs (concordant pairs, in which case and control have the same exposure, do not contribute to the estimate).
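For 1:1 matched pairs with a single binary exposure, the conditional OR reduces to the ratio of the two kinds of discordant pairs. The sketch below assumes 1:1 matching and uses a large-sample CI on the log scale; the function and argument names are hypothetical, and no matched data are analyzed in this section.

```python
import math
from statistics import NormalDist


def matched_pair_or(case_only_exposed: int, control_only_exposed: int, conf: float = 0.95):
    """OR for 1:1 matched case-control pairs, using only the discordant pairs.

    case_only_exposed:    pairs where the case is exposed and the control is not
    control_only_exposed: pairs where the control is exposed and the case is not
    Concordant pairs (both or neither exposed) do not enter the estimate.
    """
    b, c = case_only_exposed, control_only_exposed
    or_ = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # large-sample SE of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)
```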

Log-binomial regression to estimate Risk Ratio:

Using the table of raw counts of high BP in each BMI group, we can calculate the RR of high BP for the overweight vs. normal BMI group as (1017/1970)/(729/2456) = 1.74, and similarly for the obese vs. normal groups as (506/694)/(729/2456) = 2.46.
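The hand calculation can be reproduced from the counts and row totals of the BMI-by-high-BP table (2456 normal, 1970 overweight, 694 obese). The CI in this sketch is an assumption: a Wald interval on the log scale, which statistical software may compute differently. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(cases1, n1, cases0, n0, conf=0.95):
    """RR of group 1 vs. reference group 0, with a Wald CI on the log scale."""
    rr = (cases1 / n1) / (cases0 / n0)
    se_log = math.sqrt(1 / cases1 - 1 / n1 + 1 / cases0 - 1 / n0)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


# (number with high BP, group total) from the BMI-by-high-BP table
normal, overweight, obese = (729, 2456), (1017, 1970), (506, 694)
for label, group in [("overwght vs. normal", overweight), ("obese vs. normal", obese)]:
    rr, lo, hi = risk_ratio_ci(*group, *normal)
    print(f"RR {label}: {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```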

From the log-binomial regression model, we get RR estimates along with 95% CIs. These are unadjusted RRs, which in principle can also be calculated directly from the table in the previous section. If the log-binomial model does not converge, modified Poisson regression (Poisson regression with robust standard errors) can be used instead.

Label                        Estimate   Standard Error   95% Confidence Limits
(RR overwght vs. normal)     1.4536     0.0267           1.3794   1.5317
(RR obese vs. normal)        2.5958     0.0636           2.2914   2.9406

As stated earlier, we often want the RR, so we’ll proceed with those estimates. Notice that the RRs are less extreme (closer to 1) than the ORs; this is expected when the outcome is common, as high BP is here (43.5%). Readers who don’t know the distinction between the OR and the RR, and interpret an OR as if it were an RR, will overestimate the difference in risk between groups.
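The gap between the OR and the RR depends on how common the outcome is in the reference group. For a crude 2 x 2 table the identity RR = OR / (1 - p0 + p0 * OR) holds exactly, where p0 is the risk in the reference group (here 729/2456 for normal BMI). The sketch below applies it to the crude ORs from the logistic regression table; the function name is hypothetical.

```python
def or_to_rr(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an OR to an RR given the risk p0 in the reference group.

    RR = OR / (1 - p0 + p0 * OR); exact for a crude 2x2 table,
    only approximate when applied to adjusted ORs.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


p0 = 729 / 2456  # risk of high BP in the normal-BMI group
for label, or_ in [("overwght vs. normal", 2.5281), ("obese vs. normal", 6.3761)]:
    print(f"{label}: OR = {or_:.2f} -> RR = {or_to_rr(or_, p0):.2f}")
```

When p0 is small, the denominator is close to 1 and the OR approximates the RR; as p0 grows, the OR moves further from 1 than the RR does.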

If we want adjusted RRs, we can simply add the other covariates to the model. In this case we want to see whether sex is a possible confounder or effect modifier. Adding sex to the model does not meaningfully change the RRs for BMI (the estimates are essentially the same), so sex does not appear to be a confounder.

  • Adjusted RR overwght v normal = 1.44
  • Adjusted RR obese v normal = 2.59
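One way to make "does not meaningfully change" concrete is the percent change between the crude and adjusted estimates. The sketch below uses the unadjusted RRs from the log-binomial table and the adjusted RRs listed above; the 10% cutoff is a commonly used rule of thumb, not something stated in this section, and the function name is hypothetical.

```python
def percent_change(crude: float, adjusted: float) -> float:
    """Absolute percent change from the crude to the adjusted estimate."""
    return 100 * abs(adjusted - crude) / crude


crude = {"overwght v normal": 1.4536, "obese v normal": 2.5958}
adjusted = {"overwght v normal": 1.44, "obese v normal": 2.59}
for label in crude:
    change = percent_change(crude[label], adjusted[label])
    flag = "possible confounding" if change > 10 else "little change"
    print(f"{label}: {change:.1f}% change ({flag})")
```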

The model also shows that sex is a significant predictor of high blood pressure. (In the unadjusted setting, the rate of high BP is 46% in males compared with 41% in females, not a large clinical difference.) The adjusted RR for females vs. males is 1.06 (95% CI: 1.01 - 1.11), suggesting a slightly increased risk of high BP for females compared with males.

To evaluate sex as a potential effect modifier, we can include a BMI-by-sex interaction term in the model. Doing so shows no statistical evidence of an interaction, so there is no evidence that the relationship between BMI and high blood pressure differs between males and females. If the interaction had been significant, the next step would be to provide stratified analyses, estimating the RRs for BMI and high blood pressure separately for females and males.
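The stratified follow-up described above can be sketched as a comparison of two stratum-specific RRs. This is a simpler stand-in for the model's interaction test, not the same procedure: it assumes independent strata (e.g., females and males) and that each stratum's RR and the standard error of its log are available. No stratum estimates are reported in this section, so none are used here, and the function name is hypothetical.

```python
import math


def ratio_of_rrs_test(rr_a: float, se_log_a: float, rr_b: float, se_log_b: float):
    """Wald z-test that two independent stratum-specific RRs are equal.

    se_log_a, se_log_b are the standard errors of log(RR) in each stratum.
    Returns (ratio of RRs, z statistic, two-sided p-value).
    """
    diff = math.log(rr_a) - math.log(rr_b)
    z = diff / math.sqrt(se_log_a ** 2 + se_log_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(diff), z, p
```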