9.2 - Categorical outcome
From our example, we may be interested in the relationship between BMI and high blood pressure, and want to consider sex as a possible confounder (or effect modifier).
The outcome, high blood pressure, is a dichotomous variable (either present or not).
The predictors/covariates to be considered are BMI and sex. BMI can be treated as continuous or grouped into categories, and sex is a categorical variable.
Descriptive
First, we want to report the percentage of patients who have high blood pressure with a 95% confidence interval. We find that 43.5% of patients report high blood pressure. The exact CI for this estimate is (42.2% - 44.8%), again very narrow due to the large sample size.
high_BP | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
low/normal | 2942 | 56.48% | 2942 | 56.48% |
high | 2267 | 43.52% | 5209 | 100.00% |
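An "exact" CI for a single proportion is usually the Clopper-Pearson interval. The sketch below computes it using only the Python standard library. It solves the two binomial tail equations by bisection. The function names are our own and do not come from any library; in practice a routine such as `scipy.stats.beta.ppf` is the usual shortcut.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed on the log scale for numerical stability."""
    if k < 0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def solve_decreasing(f, target, iters=50):
    """Bisection for p in (0, 1) with f(p) = target, assuming f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for the binomial proportion x/n."""
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


x, n = 2267, 5209  # patients with high BP, total patients (frequency table above)
lower, upper = clopper_pearson(x, n)
print(f"{100 * x / n:.1f}% (95% exact CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

At this sample size, the exact interval is very close to the normal-approximation (Wald) interval.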
Bivariable Associations
Next we want to look at the relationship between BMI and high BP, treating BMI as both a categorical and a continuous variable.
Table of BMIgrp by high_BP (cells show frequency and row percent)

BMIgrp | high_BP: low/normal | high_BP: high | Total |
---|---|---|---|
[18.5-25] normal | 1727 (70.32%) | 729 (29.68%) | 2456 |
[25-30] overwght | 953 (48.38%) | 1017 (51.62%) | 1970 |
>= 30 obese | 188 (27.09%) | 506 (72.91%) | 694 |
Total | 2868 | 2252 | 5120 |

Frequency Missing = 89
As BMI level increases, so does the rate of high BP (30%, 52%, and 73% for the normal, overweight, and obese groups). A chi-squared test of independence can be used to test the association between the two variables. The result is highly significant, which is not surprising given the large sample size.
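The sketch below shows the arithmetic behind the chi-squared test. It uses the counts from the BMIgrp by high_BP table and standard-library Python only. The helper name `pearson_chi2` is our own. The closed-form p-value `exp(-x/2)` is valid only because this 3x2 table has 2 degrees of freedom. Other tables would need a library routine such as `scipy.stats.chi2.sf`.

```python
import math

# Observed counts: rows = normal, overwght, obese; columns = low/normal BP, high BP
observed = [
    [1727, 729],
    [953, 1017],
    [188, 506],
]


def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for Pearson's chi-squared test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


chi2, df = pearson_chi2(observed)
# Upper-tail probability of a chi-squared variable with 2 df is exactly exp(-x / 2)
p_value = math.exp(-chi2 / 2) if df == 2 else float("nan")

for label, (low, high) in zip(["normal", "overwght", "obese"], observed):
    print(f"{label}: {100 * high / (low + high):.1f}% high BP")
print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.1e}")
```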
Considerations specifically related to Non-matched Case-Control Studies:
- Chi-squared tests can be used for the bivariable association of exposure and outcome. If any expected cell counts are less than 5, Fisher's exact test should be used instead.
- To evaluate potential effect modifiers using these bivariable association tables, we can use Cochran-Mantel-Haenszel methods, which break the exposure * outcome table up by levels of the potential effect modifier. Comparing the stratum-specific estimates (formally, with the Breslow-Day test of homogeneity) shows whether the effect differs across strata. The Mantel-Haenszel pooled estimate then summarizes the association adjusted for the stratifying variable.
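The sketch below computes the Mantel-Haenszel pooled odds ratio alongside the stratum-specific ORs. It assumes each stratum of the potential modifier is supplied as a 2x2 table `(a, b, c, d)`, using the cell layout in the docstring (our own convention). The section does not report stratified counts, so no data are filled in. The functions only show the arithmetic, and a formal homogeneity test is not implemented.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 strata.

    Each stratum is (a, b, c, d): a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def stratum_ors(strata):
    """Odds ratio within each stratum, for comparison across levels of the modifier."""
    return [(a * d) / (b * c) for a, b, c, d in strata]
```

If the stratum-specific ORs are similar, the pooled value is a reasonable single summary. If they differ markedly, reporting them separately is more informative.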
We can also look at a boxplot or histogram for the continuous version of BMI and see that on average, patients with high BP tend to have higher BMI compared to those without high BP.
Analysis Variable: bmi

high_BP | N | Mean | Std Dev | Lower 95% CL for Mean | Upper 95% CL for Mean | Minimum | 25th Pctl | Median | 75th Pctl | Maximum |
---|---|---|---|---|---|---|---|---|---|---|
low/normal | 2936 | 24.37 | 3.61 | 24.24 | 24.50 | 14.12 | 21.87 | 24.01 | 26.49 | 51.96 |
high | 2263 | 27.15 | 4.48 | 26.97 | 27.34 | 15.77 | 24.05 | 26.73 | 29.52 | 56.68 |
We can use a two-group t-test to compare the means by group. However, it is often more streamlined to use the modeling technique you plan to use for both the bivariable and the multivariable associations. A model with a single covariate provides an unadjusted result, and a model with multiple covariates provides an adjusted result.
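The two-group comparison can be run directly from the summary table. The sketch below uses Welch's (unequal-variance) t-test, chosen here because the two standard deviations differ. It assumes a normal approximation for the quantile and the p-value, which is reasonable with degrees of freedom in the thousands. The table values are rounded, so the results are approximate. `welch_from_summary` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2, level=0.95):
    """Welch two-sample t-test for mean2 - mean1 using only group summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    se = math.sqrt(v1 + v2)
    diff = mean2 - mean1
    t = diff / se
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # With df this large the t distribution is essentially normal (approximation)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_two_sided, (diff - z * se, diff + z * se)


# BMI summaries: low/normal BP group first, then high BP group
t, df, p, ci = welch_from_summary(2936, 24.37, 3.61, 2263, 27.15, 4.48)
print(f"t = {t:.1f}, df = {df:.0f}, p = {p:.1e}, 95% CI for difference: {ci[0]:.2f} to {ci[1]:.2f}")
```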
Modeling (Multivariable Associations)
For a dichotomous outcome we may want to estimate odds ratios or risk ratios, and thus will use logistic or log-binomial regression, respectively.
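For a single binary comparison $x$ (1 for the comparison group, e.g. overweight; 0 for the reference group, normal BMI), the two models differ only in the link function. Here $p(x) = \Pr(\text{high BP} \mid x)$. The exponentiated slope is an OR under the logit link and an RR under the log link:

```latex
\begin{aligned}
\text{Logistic:}\quad & \log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x
  &&\Rightarrow\quad e^{\beta_1} = \frac{p(1)/[1-p(1)]}{p(0)/[1-p(0)]} = \mathrm{OR},\\
\text{Log-binomial:}\quad & \log p(x) = \beta_0 + \beta_1 x
  &&\Rightarrow\quad e^{\beta_1} = \frac{p(1)}{p(0)} = \mathrm{RR}.
\end{aligned}
```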
Logistic regression to estimate Odds Ratio:
Using the table of raw counts of patients with high BP in each BMI group, we can calculate the OR of high BP for the overweight vs. normal BMI group as (1017*1727)/(729*953) = 2.53. Similarly, the OR for the obese vs. normal groups is (506*1727)/(729*188) = 6.38.
From the logistic regression model, we get these same estimates, along with 95% CIs:
Label | Estimate | Standard Error | Lower 95% Confidence Limit | Upper 95% Confidence Limit |
---|---|---|---|---|
(OR overwght vs. normal) | 2.5281 | 0.1596 | 2.2339 | 2.8610 |
(OR obese vs. normal) | 6.3761 | 0.6131 | 5.2809 | 7.6985 |
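With BMI group as the only covariate, the logistic model is saturated. Its OR estimates therefore equal the crude table ORs. Its Wald interval on the log scale equals the Woolf interval, which uses SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d). The sketch below reproduces the table this way. It also reports the delta-method standard error on the OR scale (OR × SE(log OR)), which appears to match the Standard Error column. `odds_ratio` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def odds_ratio(a, b, c, d, level=0.95):
    """Crude OR with delta-method SE and Woolf (log-scale Wald) confidence limits.

    a = comparison group with high BP, b = comparison group without,
    c = reference group with high BP, d = reference group without.
    """
    est = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est, est * se_log, est * math.exp(-z * se_log), est * math.exp(z * se_log)


or_ow = odds_ratio(1017, 953, 729, 1727)  # overwght vs. normal
or_ob = odds_ratio(506, 188, 729, 1727)   # obese vs. normal
for label, (est, se, lo, hi) in [("overwght", or_ow), ("obese", or_ob)]:
    print(f"OR {label} vs. normal: {est:.4f} (SE {se:.4f}), 95% CI {lo:.4f} to {hi:.4f}")
```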
Considerations specifically related to Case-Control Studies:
Remember that for non-matched case-control studies, the OR must be calculated. Participants are sampled separately by outcome status, so the proportion with the outcome, and the distribution of exposure, in the sample is not necessarily representative of the population. The sampling fractions cancel out in the OR calculation, but not in the RR.
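The cancellation can be checked numerically with the overweight vs. normal counts from the BMIgrp table. The sketch keeps every person with high BP but only a hypothetical one in four of those without. This mimics case-control sampling; the 1/4 fraction is an assumption for illustration only. Exact `Fraction` arithmetic is used so the equality of the ORs is exact rather than approximate.

```python
from fractions import Fraction

HIGH_OW, LOW_OW = 1017, 953   # overweight: high BP, low/normal BP
HIGH_NM, LOW_NM = 729, 1727   # normal BMI: high BP, low/normal BP


def or_and_rr(f_high, f_low):
    """OR and RR after sampling a fraction f_high of people with high BP and f_low without."""
    a, b = HIGH_OW * f_high, LOW_OW * f_low
    c, d = HIGH_NM * f_high, LOW_NM * f_low
    odds_ratio = (a * d) / (b * c)               # f_high and f_low cancel here
    risk_ratio = (a / (a + b)) / (c / (c + d))   # but not here
    return odds_ratio, risk_ratio


full = or_and_rr(Fraction(1), Fraction(1))
sampled = or_and_rr(Fraction(1), Fraction(1, 4))  # hypothetical sampling fractions
print(f"full data: OR = {float(full[0]):.2f}, RR = {float(full[1]):.2f}")
print(f"sampled:   OR = {float(sampled[0]):.2f}, RR = {float(sampled[1]):.2f}")
```

The OR stays at 2.53 under this sampling scheme, while the RR drops from 1.74 to about 1.29.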
These logistic regression models are unconditional, which is appropriate for non-matched case-control studies but not for matched case-control studies. For matched case-control studies, conditional logistic regression should be used. With 1:1 matching, the OR is estimated from the discordant pairs (pairs in which only the case or only the control is exposed). Concordant pairs carry no information about the exposure effect.
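For 1:1 matched pairs, the conditional maximum-likelihood OR reduces to the ratio of the two kinds of discordant pairs. The sketch below computes that ratio with a Wald CI on the log scale. The pair counts in the usage line are hypothetical, since this section has no matched data. `matched_pair_or` is our own name, not a library function.

```python
import math
from statistics import NormalDist


def matched_pair_or(b, c, level=0.95):
    """OR and CI for 1:1 matched case-control pairs.

    b = pairs where only the case is exposed, c = pairs where only the control is exposed.
    Concordant pairs (both or neither exposed) do not enter the estimate.
    """
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return odds_ratio, (odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log))


# Hypothetical discordant-pair counts, for illustration only
estimate, (lower, upper) = matched_pair_or(30, 15)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```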
Log-binomial regression to estimate Risk Ratio:
Using the table of raw counts of patients with high BP in each BMI group, we can calculate the RR of high BP for the overweight vs. normal BMI group as (1017/1970)/(729/2456) = 1.74. Similarly, the RR for the obese vs. normal groups is (506/694)/(729/2456) = 2.46.
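The hand calculation can be extended with a confidence interval. The sketch below is a minimal version that uses the usual large-sample SE of log(RR), sqrt(1/a - 1/n1 + 1/c - 1/n0). A log-binomial model whose only covariate is BMI group is saturated, so its unadjusted RR estimates should equal these table-based ratios. `risk_ratio` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def risk_ratio(cases_cmp, n_cmp, cases_ref, n_ref, level=0.95):
    """Crude RR with a Wald confidence interval computed on the log scale."""
    rr = (cases_cmp / n_cmp) / (cases_ref / n_ref)
    se_log = math.sqrt(1 / cases_cmp - 1 / n_cmp + 1 / cases_ref - 1 / n_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


rr_ow = risk_ratio(1017, 1970, 729, 2456)  # overwght vs. normal
rr_ob = risk_ratio(506, 694, 729, 2456)    # obese vs. normal
for label, (rr, lo, hi) in [("overwght", rr_ow), ("obese", rr_ob)]:
    print(f"RR {label} vs. normal: {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```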
From the log-binomial regression model, we get the following estimates, along with 95% CIs. These are unadjusted RRs, which in principle can also be calculated directly from the table in the previous section. If the log-binomial model does not converge, modified Poisson regression (Poisson regression with robust standard errors) can be used instead.
Label | Estimate | Standard Error | Lower 95% Confidence Limit | Upper 95% Confidence Limit |
---|---|---|---|---|
(RR overwght vs. normal) | 1.4536 | 0.0267 | 1.3794 | 1.5317 |
(RR obese vs. normal) | 2.5958 | 0.0636 | 2.2914 | 2.9406 |
As stated earlier, we often want the RR, so we'll proceed with those estimates. Notice, though, that the RRs are less extreme (closer to 1) than the ORs. For the same 2x2 table, the OR is always further from 1 than the RR. The gap grows as the outcome becomes more common, and 43.5% of patients here have high BP. Readers who don't know the distinction and interpret an OR as an RR will overestimate the difference in risk between groups.
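The gap between the two measures depends on the baseline risk p0 in the reference group. The Zhang-Yu conversion RR = OR / (1 - p0 + p0·OR) makes this explicit. For a crude OR from a 2x2 table it recovers the table RR exactly; for adjusted ORs it is only an approximation. The sketch below checks the exact case with the overweight vs. normal counts.

```python
def rr_from_or(odds_ratio, p0):
    """Convert an OR to an RR given the risk p0 in the reference group."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)


p0 = 729 / 2456                      # risk of high BP in the normal-BMI group
or_ow = (1017 * 1727) / (729 * 953)  # crude OR, overwght vs. normal
print(round(or_ow, 2), round(rr_from_or(or_ow, p0), 2))  # 2.53 1.74
```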
If we want adjusted RRs, we can simply add the other covariates to the model. Here we want to see whether sex is a possible confounder or effect modifier. Adding sex to the model does not meaningfully change the BMI RRs (the estimates are essentially the same), so sex is not a confounder:
- RR overwght vs. normal = 1.44
- RR obese vs. normal = 2.59
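"Does not meaningfully change" is often operationalized as a change-in-estimate rule. The sketch below compares the unadjusted log-binomial RRs with the sex-adjusted RRs reported in this section. The 10% cutoff is a common rule of thumb and an assumption here, not a threshold stated in the text.

```python
def percent_change(crude, adjusted):
    """Relative change in an estimate after adjustment, in percent."""
    return 100 * abs(adjusted - crude) / crude


# (unadjusted RR, RR adjusted for sex), as reported in this section
pairs = {"overwght vs. normal": (1.4536, 1.44), "obese vs. normal": (2.5958, 2.59)}
THRESHOLD = 10  # assumed rule of thumb: a change above 10% suggests confounding

for label, (crude, adjusted) in pairs.items():
    change = percent_change(crude, adjusted)
    print(f"{label}: {change:.1f}% change, confounding suggested: {change > THRESHOLD}")
```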
The model also shows that sex is a significant predictor of high blood pressure. In the unadjusted setting, the rate of high BP is 46% in males compared with 41% in females, which is not a huge clinical difference. After adjusting for BMI, the RR for females vs. males is 1.06 (95% CI: 1.01 to 1.11), suggesting a small increased risk of high BP for females compared with males.
To evaluate sex as a potential effect modifier, we can include a BMI * sex interaction term in the model. Doing so shows no statistical evidence of an interaction, so we can assume the relationship between BMI and high blood pressure is similar for males and females. If the interaction had been significant, the next step would be a stratified analysis, estimating the RRs for BMI and high blood pressure separately for females and males.
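Once stratum-specific RRs are available, their similarity can be checked with a z-test on the difference of log RRs, which gives the ratio of RRs as a summary. This is a table-level alternative to the model interaction term, not the procedure used in the text. The section reports no sex-specific RRs, so the inputs in the usage line are hypothetical. `ratio_of_rrs` is our own name.

```python
import math


def ratio_of_rrs(rr_f, se_log_f, rr_m, se_log_m):
    """Compare two stratum-specific RRs (e.g. females vs. males).

    se_log_* are the standard errors of log(RR) within each stratum. Returns the ratio of
    RRs and a two-sided p-value from a z-test of equal log RRs (normal approximation).
    """
    diff = math.log(rr_f) - math.log(rr_m)
    se = math.sqrt(se_log_f ** 2 + se_log_m ** 2)  # strata are independent samples
    z = diff / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(diff), p_two_sided


# Hypothetical stratum-specific estimates, for illustration only
ratio, p = ratio_of_rrs(1.8, 0.05, 1.7, 0.05)
print(f"ratio of RRs = {ratio:.2f}, p = {p:.2f}")
```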