# 12.5 - Model-Based Methods: Binary Outcomes


For a binary outcome, logistic regression analysis models the log odds of success as a linear combination of parameters and regressors. Let $$p\left(X_1, X_2, \dots , X_K\right)$$ denote the probability of success given the values of the K regressors. The logistic regression model for the log odds for the $$i^{th}$$ patient is

$$log\left( \dfrac{p(X_{1i}, X_{2i}, \dots ,X_{Ki})}{1-p(X_{1i}, X_{2i}, \dots ,X_{Ki})} \right)=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+ \dots +\beta_KX_{Ki}$$

Notice that $$\beta_0$$ represents the reference log odds, i.e., the log odds when $$X_{1i} = 0, X_{2i} = 0, \dots , X_{Ki} = 0$$. Consider a simple model with one binary covariate (K = 1), e.g., $$X_{1i} = 0$$ if the $$i^{th}$$ patient is in the placebo group and $$X_{1i} = 1$$ if the $$i^{th}$$ patient is in the treatment group. Then the log odds ratio for comparing the treatment group to the placebo group is

$$log\left( \dfrac{p(X_{1i}=1)}{1-p(X_{1i}=1)} / \dfrac{p(X_{1i}=0)}{1-p(X_{1i}=0)} \right)=(\beta_0+\beta_1)-\beta_0=\beta_1$$
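The derivation above can be checked numerically. The short Python sketch below assumes hypothetical values $$\beta_0 = -1.0$$ and $$\beta_1 = 0.5$$ (they are not from any study), converts each group's log odds into a probability, and then recovers the log odds ratio from those probabilities. The helper names `expit` and `log_odds` are illustrative only.

```python
import math


def expit(eta):
    """Inverse logit: convert a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds(p):
    """Logit: convert a probability into a log odds."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -1.0, 0.5

p_placebo = expit(beta0)             # X_1i = 0
p_treatment = expit(beta0 + beta1)   # X_1i = 1

# The log odds ratio computed from the two probabilities equals beta1,
# so the odds ratio equals exp(beta1)
log_or = log_odds(p_treatment) - log_odds(p_placebo)
print(round(log_or, 10), round(math.exp(log_or), 4))
```

The intercept cancels in the difference, which is why the result does not depend on the value chosen for $$\beta_0$$.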

If the covariate is ordinal or continuous, then the log odds ratio comparing $$X_{1i} = x$$ with $$X_{1i} = 0$$ is

$$log\left( \dfrac{p(X_{1i}=x)}{1-p(X_{1i}=x )} / \dfrac{p(X_{1i}=0)}{1-p(X_{1i}=0) } \right)=(\beta_0+\beta_1x)-\beta_0=\beta_1x$$

so that the odds ratio is $$\text{exp}\left(\beta_1x\right)$$. This illustrates that changes in a covariate have a multiplicative effect on the baseline odds: each one-unit increase in the covariate multiplies the odds by $$\text{exp}\left(\beta_1\right)$$.

For example, suppose x represents (age - 18) in a study of adults, and that the estimated coefficient is $$\hat{\beta}_1 = 0.04$$ with a p-value < 0.05. Then the estimated odds ratio is exp(0.04) = 1.041. This may not seem like a clinically meaningful odds ratio, but remember that it compares the odds for a 19-year-old with the odds for an 18-year-old. For a 25-year-old compared with an 18-year-old, the estimated odds ratio is $$\text{exp}\left(0.04 \times 7\right) = 1.323$$.
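The arithmetic in this example can be reproduced with a few lines of Python. The sketch takes $$\hat{\beta}_1 = 0.04$$ from the text and assumes, as the text does, that the covariate is coded as x = age - 18; the function name `odds_ratio_vs_18` is hypothetical.

```python
import math

BETA1 = 0.04  # estimated coefficient from the age example in the text


def odds_ratio_vs_18(age):
    """Estimated odds ratio comparing a person of the given age with an
    18-year-old, when the covariate is coded as x = age - 18."""
    return math.exp(BETA1 * (age - 18))


for age in (19, 25):
    print(age, round(odds_ratio_vs_18(age), 3))
```

Because the effect is multiplicative, the odds ratio for a 7-year difference equals the 1-year odds ratio raised to the 7th power: $$1.041^7 \approx 1.323$$.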

For the logistic regression model, each $$\beta_j, j = 1, 2, \dots , K$$, represents the log odds ratio for a one-unit increase in the $$j^{th}$$ covariate, holding the other covariates fixed. An equivalent expression for the logistic regression model in terms of the probability is

$$p(X_{1i}, X_{2i}, \dots ,X_{Ki})=\dfrac{1}{1+\exp\left\{ -(\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+ \dots +\beta_KX_{Ki} )\right\}}$$
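As a sketch of how the probability form is used in practice, the Python function below computes a predicted probability for one patient from an intercept and K slopes. The coefficients and covariate values are hypothetical, chosen so that the linear predictor equals -1; applying the logit to the result recovers that linear predictor, which shows that the two forms of the model are equivalent.

```python
import math


def predicted_probability(beta, x):
    """p = 1 / (1 + exp(-(beta_0 + beta_1 x_1 + ... + beta_K x_K))).

    beta has K + 1 entries (intercept first); x has the K covariate values."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients and covariates (K = 2), for illustration only
beta = [-2.0, 0.7, 0.03]
x = [1, 10]

p = predicted_probability(beta, x)
print(round(p, 4))

# Converting back to the log odds scale returns the linear predictor (-1 here)
print(round(math.log(p / (1 - p)), 6))
```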

Logistic regression models can also be used for an ordinal response. Suppose that an outcome variable, Y, is ordinal and that we designate its ordered categories as 0, 1, ..., C. We model the cumulative logits as

$$log\left( \dfrac{Pr[Y \ge c \mid X_{1i}, X_{2i}, \dots ,X_{Ki}]}{1-Pr[Y \ge c \mid X_{1i}, X_{2i}, \dots ,X_{Ki}]} \right)=\beta_{0c}+\beta_1X_{1i}+\beta_2X_{2i}+ \dots + \beta_KX_{Ki}, \quad c=1,2, \dots , C$$

The ordinal logistic regression model has C intercept terms but only one coefficient for each regressor. This reduced model assumes proportional odds: the odds ratio associated with a regressor is the same for every cutpoint c (a topic beyond the scope of this course).
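The following Python sketch illustrates the proportional odds structure. It assumes hypothetical values: C = 3 (categories 0 to 3), intercepts $$\beta_{01} = 1.5, \beta_{02} = 0, \beta_{03} = -1.5$$, and a single binary regressor with $$\beta_1 = 0.8$$. It converts the cumulative logits into category probabilities and then shows that the cumulative odds ratio comparing X = 1 with X = 0 is $$\exp(\beta_1)$$ at every cutpoint. The helper names are illustrative only.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def category_probs(intercepts, betas, x):
    """Pr[Y = c | x] for c = 0..C under the model
    logit Pr[Y >= c | x] = beta_0c + beta_1 x_1 + ... + beta_K x_K, c = 1..C.
    The intercepts [beta_01, ..., beta_0C] must be strictly decreasing."""
    eta = sum(b * xj for b, xj in zip(betas, x))
    # Cumulative probabilities Pr[Y >= c] for c = 0, 1, ..., C, C + 1
    cum = [1.0] + [expit(b0 + eta) for b0 in intercepts] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(intercepts) + 1)]


def cum_odds(probs, c):
    """Odds of Y >= c from the category probabilities."""
    tail = sum(probs[c:])
    return tail / (1.0 - tail)


# Hypothetical parameter values, for illustration only
intercepts = [1.5, 0.0, -1.5]
betas = [0.8]

p0 = category_probs(intercepts, betas, [0])
p1 = category_probs(intercepts, betas, [1])

# One odds ratio per cutpoint; under proportional odds each equals exp(0.8)
for c in range(1, len(intercepts) + 1):
    print(c, round(cum_odds(p1, c) / cum_odds(p0, c), 6))
```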

## SAS® Example

### Using PROC LOGISTIC in SAS to perform logistic regression

(13.4_logistic regression.sas) Boyle et al. (Masking of physicians in the Growth Failure in Children with Renal Disease clinical trial. Pediatric Nephrology 1993; 7: 204-206) investigated the success of masking in the randomized, double-blinded, multi-center GFRD clinical trial. The clinical director at each center was asked to identify or guess the assigned treatment for each randomized patient. Each response was coded on a seven-point scale from -3 (incorrect, certain) through 0 (unsure) to 3 (correct, certain).

```sas
***********************************************************************
* This is a program that illustrates the use of PROC LOGISTIC in SAS  *
* to perform logistic regression on a dichotomized masking score.     *
*                                                                     *
* The sample data set is taken from the following article:            *
*    Masking of Physicians in the Growth Failure in Children with     *
*    Renal Diseases Clinical Trial.  Pediatric Nephrology 7, 204-206. *
***********************************************************************;

/* Labels for the seven-point masking score (-3 to 3) */
proc format;
  value scorefmt -3='Incorrect_certain'
                 -2='Incorrect_probable'
                 -1='Incorrect_guess'
                  0='Unsure'
                  1='Correct_guess'
                  2='Correct_probable'
                  3='Correct_certain';
run;

/* One record per patient: center, patient id, months on study,
   masking score, and assigned treatment group (A or B) */
data gfrd;
  input center id months score trtgroup $;
  format score scorefmt.;
  label months='Months on Study';
  cards;
1       1906        9         2       A
2       2901        9         1       B
3       3908       56        -1       B
3       3911       15         1       A
3       3919        9         1       B
4       4901       25         1       A
4       4912       12         1       B
4       4913       13         1       B
5       5901        0         0       A
5       5902       28         0       A
5       5905        2         0       B
6       6904       13        -2       A
6       6920       11        -2       A
6       6922       10         2       A
7       7905       29        -1       B
7       7910       39         1       A
7       7916       40         1       A
7       7919       23         1       A
7       7920        4        -1       B
7       7921       26        -1       B
7       7926       33         1       A
7       7927       30         1       A
7       7929       14        -1       B
7       7930       17        -1       B
7       7931       12         1       A
8       8904       18        -1       B
8       8905       30        -1       A
8       8909        9         1       B
8       8915        2         1       A
8       8917        5        -1       B
8       8918        2        -1       A
9       9901       45        -1       B
9       9903       54         1       A
9       9907       61         1       B
9       9908       28        -1       B
9       9918       31         1       A
9       9919       25         1       A
11      11901       21        -1       B
11      11905       47        -2       B
11      11906       15        -1       A
14      14902       14        -1       B
14      14907        0         1       B
14      14908       38        -1       A
14      14911        0         0       A
14      14912       36         1       B
14      14913       32        -1       A
14      14916        1         1       A
14      14917        7         1       A
16      16904       31         1       A
16      16905       34         1       A
16      16906       13        -1       B
18      18902       11         1       B
18      18904        8         1       B
21      21906       46         2       B
21      21910       14        -2       B
21      21912        6         2       A
21      21915       26        -2       A
21      21916       25         2       B
21      21917       26        -2       A
21      21918       26        -2       B
21      21919       19         2       A
21      21920       15        -2       A
21      21921        2        -2       B
21      21922       15         2       B
22      22901       26         1       A
22      22902       26         1       B
22      22903       26         1       A
22      22909        5         1       B
23      23901       31        -1       B
23      23903        7         1       B
24      24914        9         2       B
24      24917       22        -2       A
24      24918        6         1       A
24      24919        2        -2       B
26      26901        6        -2       B
26      26903        7        -2       A
27      27901       15        -1       A
27      27904       18         1       B
27      27905        3        -2       B
28      28901        7         2       A
28      28902       33         1       B
28      28904        4         1       A
28      28905       16        -2       B
30      30912       15         2       A
31      31901        1         1       A
31      31902       38         1       B
31      31903       31        -1       B
35      35901        6         0       A
35      35904       11         0       B
38      38907        4         0       A
41      41904        9         2       B
42      42901        2         1       A
42      42903        2        -1       B
42      42904        2        -1       B
;
run;

proc print data=gfrd;
  title 'GFRD Example';
run;

data gfrd2;
  set gfrd;
  /* newscore = 1 for a correct guess (score > 0), 0 for unsure or incorrect */
  newscore=0;
  if score>0 then newscore=1;
  /* treatment = 1 for group A, 0 for group B */
  treatment=0;
  if trtgroup="A" then treatment=1;
run;

/* DESCENDING makes SAS model Pr(newscore=1), the probability of a correct guess */
proc logistic data=gfrd2 order=internal descending;
  model newscore=treatment months;
  title2 'Logistic Regression of the Masking Score';
run;
```


A logistic regression analysis was applied to the binary outcome of a correct guess (score > 0) versus an unsure or incorrect guess (score ≤ 0). Regressors included treatment group and months on study. Note that the binary variable 'newscore' is created from the score variable within the data step, before the PROC LOGISTIC statements. Similarly, the binary variable 'treatment' is created from the variable 'trtgroup' (1 for group A, 0 for group B).

Run the program. The output states "Probability modeled is newscore=1," which means the model describes the log odds of a correct guess; this also determines the direction in which the odds ratios are calculated. The confidence intervals for the odds ratios all include 1. With no statistically significant results, the investigators remained confident that the masking scheme was successful.
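To make the estimation step concrete, the Python sketch below refits the same model outside SAS. It is an illustration under stated assumptions, not a description of PROC LOGISTIC's internals: the (months, score, trtgroup) values are transcribed from the gfrd data step, the recoding mirrors data gfrd2, the coefficients are found by Newton-Raphson maximization of the log-likelihood, and the odds-ratio limits use the Wald form $$\exp(\hat{\beta} \pm 1.96\,\widehat{SE})$$. All function names (`load`, `solve`, `score_and_info`, `fit_logistic`) are hypothetical helpers. If the transcription is correct, the estimates should agree with the SAS output up to rounding; no output values are quoted here.

```python
import math

# (months, score, trtgroup) for each of the 94 patients, transcribed from data gfrd
RAW = """
9 2 A   9 1 B   56 -1 B  15 1 A   9 1 B    25 1 A   12 1 B   13 1 B
0 0 A   28 0 A  2 0 B    13 -2 A  11 -2 A  10 2 A   29 -1 B  39 1 A
40 1 A  23 1 A  4 -1 B   26 -1 B  33 1 A   30 1 A   14 -1 B  17 -1 B
12 1 A  18 -1 B 30 -1 A  9 1 B    2 1 A    5 -1 B   2 -1 A   45 -1 B
54 1 A  61 1 B  28 -1 B  31 1 A   25 1 A   21 -1 B  47 -2 B  15 -1 A
14 -1 B 0 1 B   38 -1 A  0 0 A    36 1 B   32 -1 A  1 1 A    7 1 A
31 1 A  34 1 A  13 -1 B  11 1 B   8 1 B    46 2 B   14 -2 B  6 2 A
26 -2 A 25 2 B  26 -2 A  26 -2 B  19 2 A   15 -2 A  2 -2 B   15 2 B
26 1 A  26 1 B  26 1 A   5 1 B    31 -1 B  7 1 B    9 2 B    22 -2 A
6 1 A   2 -2 B  6 -2 B   7 -2 A   15 -1 A  18 1 B   3 -2 B   7 2 A
33 1 B  4 1 A   16 -2 B  15 2 A   1 1 A    38 1 B   31 -1 B  6 0 A
11 0 B  4 0 A   9 2 B    2 1 A    2 -1 B   2 -1 B
"""


def load():
    """Recode as in data gfrd2: newscore = 1 if score > 0; treatment = 1 if group A."""
    tok = RAW.split()
    rows = [(float(tok[i]), int(tok[i + 1]), tok[i + 2]) for i in range(0, len(tok), 3)]
    X = [[1.0, 1.0 if g == "A" else 0.0, m] for m, s, g in rows]  # intercept, treatment, months
    y = [1.0 if s > 0 else 0.0 for m, s, g in rows]
    return X, y


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def score_and_info(beta, X, y):
    """Gradient of the log-likelihood and the information matrix X'WX."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        for a in range(k):
            grad[a] += (yi - p) * xi[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
    return grad, info


def fit_logistic(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta + info^{-1} grad until the step is negligible."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = score_and_info(beta, X, y)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


X, y = load()
beta = fit_logistic(X, y)
grad, info = score_and_info(beta, X, y)
results = {}
for j, name in [(1, "treatment"), (2, "months")]:
    unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
    se = math.sqrt(solve(info, unit)[j])  # square root of j-th diagonal of info^{-1}
    results[name] = (math.exp(beta[j]),
                     math.exp(beta[j] - 1.96 * se),
                     math.exp(beta[j] + 1.96 * se))
    print(f"{name}: OR = {results[name][0]:.3f}, 95% Wald CI "
          f"({results[name][1]:.3f}, {results[name][2]:.3f})")
```

A Wald interval is used because it is the same form PROC LOGISTIC reports by default in its odds-ratio table; checking whether each interval contains 1 mirrors the reading of the SAS output described above.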
