# 12.5 - Model-Based Methods: Binary Outcomes

For a binary outcome, logistic regression is used to model the log odds as a linear combination of parameters and regressors. Let \(p\left(X_1, X_2, \dots , X_K\right)\) denote the probability of success given the \(K\) regressors. The logistic regression model for the log odds for the i*th* patient is

\(\log\left( \dfrac{p(X_{1i}, X_{2i}, \dots ,X_{Ki})}{1-p(X_{1i}, X_{2i}, \dots ,X_{Ki})} \right)=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+ \dots +\beta_KX_{Ki}\)

Notice that \(\beta_0\) represents the reference log odds, i.e., the log odds when \(X_{1i} = 0, X_{2i} = 0, \dots , X_{Ki} = 0\). Consider a simple model with one covariate (\(K = 1\)) that is binary, e.g., \(X_{1i} = 0\) if the i*th* patient is in the placebo group and \(X_{1i} = 1\) if the i*th* patient is in the treatment group. Then the log odds ratio for comparing the treatment group to the placebo group is

\(\log\left( \dfrac{p(X_{1i}=1)}{1-p(X_{1i}=1)} / \dfrac{p(X_{1i}=0)}{1-p(X_{1i}=0)} \right)=(\beta_0+\beta_1)-\beta_0=\beta_1\)

If the covariate is ordinal or continuous, then

\(\log\left( \dfrac{p(X_{1i}=x)}{1-p(X_{1i}=x )} / \dfrac{p(X_{1i}=0)}{1-p(X_{1i}=0) } \right)=(\beta_0+\beta_1x)-\beta_0=\beta_1x\)

so that the odds ratio is \(\text{exp}\left(\beta_1x\right)\). This illustrates that changes in a covariate have a multiplicative effect on the baseline odds.

For example, suppose \(x\) represents (age - 18) in a study of adults, and that the estimated coefficient is \(\hat{\beta}_1 = 0.04\) with a *p*-value < 0.05. Then the estimated odds ratio is exp(0.04) = 1.041. This may not seem like a clinically meaningful odds ratio, but remember that it compares the odds for a 19-year-old with the odds for an 18-year-old. For a 25-year-old person relative to an 18-year-old, the estimated odds ratio is \(\text{exp}\left(0.04 \times 7\right) = 1.323\).
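The arithmetic in this example can be reproduced with a short data step. This is only a sketch: the coefficient 0.04 is the illustrative value from the text, and the data set name `or_by_age` and the ages listed in the loop are assumptions chosen for display.

```
* Sketch: odds ratios relative to an 18-year-old for a coefficient of 0.04;
data or_by_age;
beta1=0.04;                  /* illustrative estimate from the text */
do age=19, 25, 40;           /* ages chosen only for illustration */
  x=age-18;                  /* covariate is coded as (age - 18) */
  odds_ratio=exp(beta1*x);   /* odds ratio versus x=0 */
  output;
end;
run;
proc print data=or_by_age noobs;
format odds_ratio 6.3;
run;
```

Because the effect is multiplicative on the odds scale, the odds ratio for a difference of 7 years equals the one-year odds ratio raised to the 7th power.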

For the logistic regression model, each \(\beta_j, j = 1, 2, \dots , K\), represents the log odds ratio for a one-unit increase in the \(j^{th}\) covariate with the other covariates held fixed. An equivalent expression for the logistic regression model in terms of the probability is

\(p(X_{1i}, X_{2i}, \dots ,X_{Ki})=\dfrac{1}{1+\exp\left\{ -(\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+ \dots +\beta_KX_{Ki} )\right\}}\)
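To see how the log odds scale and the probability scale relate, the following data step evaluates the expression above for a single binary covariate. It is a minimal sketch: the values of `beta0` and `beta1` are hypothetical and are not estimates from any data set in this lesson.

```
* Sketch: convert a linear predictor to a probability and back to odds;
data prob_example;
beta0=-1.0;                  /* hypothetical intercept (reference log odds) */
beta1=0.5;                   /* hypothetical log odds ratio */
do x1=0, 1;
  eta=beta0+beta1*x1;        /* linear predictor = log odds */
  p=1/(1+exp(-eta));         /* probability of success */
  odds=p/(1-p);              /* equals exp(eta) */
  output;
end;
run;
proc print data=prob_example noobs;
run;
```

The ratio of the two `odds` values in the printed output equals exp(`beta1`), which matches the interpretation of the coefficient as a log odds ratio.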

Logistic regression models are available for an ordinal response. Suppose that an outcome variable, Y, is ordinal and that we designate its ordered categories as 0, 1, ... , C. We model the ordinal logits as

\(\log\left( \dfrac{Pr[Y_i \ge c | X_{1i}, X_{2i}, \dots ,X_{Ki}]}{1-Pr[Y_i \ge c | X_{1i}, X_{2i}, \dots ,X_{Ki}]} \right)=\beta_{0c}+\beta_1X_{1i}+\beta_2X_{2i}+ \dots + \beta_KX_{Ki} , \quad c=1,2, \dots , C\)

The ordinal logistic regression model has C intercept terms, but only one term for each regressor. This reduced modeling for an ordinal outcome assumes proportional odds (beyond the scope of this course).
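As a brief illustration of what the shared slopes imply (a sketch that follows directly from the model equation, not an additional result from the course), consider a single regressor. Because the intercept \(\beta_{0c}\) cancels, the log odds ratio for a one-unit increase is the same at every cutpoint \(c\):

\(\log\left( \dfrac{Pr[Y \ge c | X_{1i}=x+1]}{1-Pr[Y \ge c | X_{1i}=x+1]} / \dfrac{Pr[Y \ge c | X_{1i}=x]}{1-Pr[Y \ge c | X_{1i}=x]} \right)=(\beta_{0c}+\beta_1(x+1))-(\beta_{0c}+\beta_1x)=\beta_1, \quad c=1,2, \dots , C\)

The probability of an individual category can be recovered by differencing the cumulative probabilities: \(Pr[Y = c] = Pr[Y \ge c] - Pr[Y \ge c+1]\) for \(c = 1, \dots , C-1\), with \(Pr[Y = 0] = 1 - Pr[Y \ge 1]\) and \(Pr[Y = C] = Pr[Y \ge C]\).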

## SAS® Example

### Using PROC LOGISTIC in SAS to perform ordinal logistic regression

(13.4_logistic regression.sas): Boyle et al. (Masking of physicians in the Growth Failure in Children with Renal Diseases clinical trial. *Pediatric Nephrology* 1993; 7: 204-206) investigated the success of the masking in the randomized, double-blinded, multi-center GFRD clinical trial. The clinical director at each center was asked to identify or guess the assigned treatment for each randomized patient.

```
***********************************************************************
* This is a program that illustrates the use of PROC LOGISTIC in SAS *
* to perform ordinal logistic regression. *
* *
* The sample data set is taken from the following article: *
* Boyle RM, Chinchilli VM, Shasky DA. (1993). Masking of *
* Physicians in the Growth Failure in Children with Renal Diseases *
* Clinical Trial. Pediatric Nephrology 7, 204-206. *
***********************************************************************;
proc format;
value scorefmt -3='Incorrect_certain'
-2='Incorrect_probable'
-1='Incorrect_guess'
0='Unsure'
1='Correct_guess'
2='Correct_probable'
3='Correct_certain';
run;
data gfrd;
input center id months score trtgroup $;
format score scorefmt.;
label months='Months on Study';
cards;
1 1906 9 2 A
2 2901 9 1 B
3 3908 56 -1 B
3 3911 15 1 A
3 3919 9 1 B
4 4901 25 1 A
4 4912 12 1 B
4 4913 13 1 B
5 5901 0 0 A
5 5902 28 0 A
5 5905 2 0 B
6 6904 13 -2 A
6 6920 11 -2 A
6 6922 10 2 A
7 7905 29 -1 B
7 7910 39 1 A
7 7916 40 1 A
7 7919 23 1 A
7 7920 4 -1 B
7 7921 26 -1 B
7 7926 33 1 A
7 7927 30 1 A
7 7929 14 -1 B
7 7930 17 -1 B
7 7931 12 1 A
8 8904 18 -1 B
8 8905 30 -1 A
8 8909 9 1 B
8 8915 2 1 A
8 8917 5 -1 B
8 8918 2 -1 A
9 9901 45 -1 B
9 9903 54 1 A
9 9907 61 1 B
9 9908 28 -1 B
9 9918 31 1 A
9 9919 25 1 A
11 11901 21 -1 B
11 11905 47 -2 B
11 11906 15 -1 A
14 14902 14 -1 B
14 14907 0 1 B
14 14908 38 -1 A
14 14911 0 0 A
14 14912 36 1 B
14 14913 32 -1 A
14 14916 1 1 A
14 14917 7 1 A
16 16904 31 1 A
16 16905 34 1 A
16 16906 13 -1 B
18 18902 11 1 B
18 18904 8 1 B
21 21906 46 2 B
21 21910 14 -2 B
21 21912 6 2 A
21 21915 26 -2 A
21 21916 25 2 B
21 21917 26 -2 A
21 21918 26 -2 B
21 21919 19 2 A
21 21920 15 -2 A
21 21921 2 -2 B
21 21922 15 2 B
22 22901 26 1 A
22 22902 26 1 B
22 22903 26 1 A
22 22909 5 1 B
23 23901 31 -1 B
23 23903 7 1 B
24 24914 9 2 B
24 24917 22 -2 A
24 24918 6 1 A
24 24919 2 -2 B
26 26901 6 -2 B
26 26903 7 -2 A
27 27901 15 -1 A
27 27904 18 1 B
27 27905 3 -2 B
28 28901 7 2 A
28 28902 33 1 B
28 28904 4 1 A
28 28905 16 -2 B
30 30912 15 2 A
31 31901 1 1 A
31 31902 38 1 B
31 31903 31 -1 B
35 35901 6 0 A
35 35904 11 0 B
38 38907 4 0 A
41 41904 9 2 B
42 42901 2 1 A
42 42903 2 -1 B
42 42904 2 -1 B
;
run;
proc print data=gfrd;
title 'GFRD Example';
run;
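* Derive the binary variables used in the logistic model;
data gfrd2;
set gfrd;
* newscore is 1 when score is positive and 0 otherwise;
newscore=0;
if score>0 then newscore=1;
* treatment is 1 for trtgroup A and 0 for trtgroup B;
treatment=0;
if trtgroup="A" then treatment=1;
run;
* The descending option models the probability that newscore is 1;
proc logistic data=gfrd2 order=internal descending;
model newscore=treatment months;
title2 'Logistic Regression of the Masking Score';
run;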
```

A logistic regression analysis was applied to the binary outcome of incorrect/correct guess. Regressors included treatment group and months in the study. Note the creation of the binary variable 'newscore' from the score variable within the data step before the PROC LOGISTIC statements: 'newscore' is 1 when the score is positive (a correct identification) and 0 otherwise, so 'Unsure' responses are grouped with the incorrect ones. Similarly, a binary variable 'treatment' is created from the variable 'trtgroup'.

Run the program. In the output, you see "probability modeled is newscore=1," which also indicates the direction in which the odds ratios are calculated (the odds of a correct identification). The confidence intervals for the odds ratios all include 1. With no statistically significant results, the investigators remained confident that the masking scheme was successful.
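The program above fits a binary model even though the score was recorded on an ordered scale. As a hedged sketch of the ordinal analogue (not part of the original program or the published analysis), the same procedure can be applied to the original 'score' variable. It assumes the data set 'gfrd2' created above, and the title text is made up for illustration. When the response has more than two levels, PROC LOGISTIC fits a cumulative logit model and reports a score test for the proportional odds assumption.

```
* Sketch: proportional odds model for the ordered masking score;
* order=internal sorts by the numeric score, not by the formatted labels;
proc logistic data=gfrd2 order=internal descending;
model score=treatment months;
title2 'Proportional Odds Model for the Masking Score';
run;
```

With the descending option, the modeled probabilities are cumulated over the higher score values, which corresponds to \(Pr[Y \ge c]\) in the ordinal model given earlier. With a sample of this size and several sparsely populated categories, the results should be read with caution.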