11.7 - Comparing Survival Curves
If the primary endpoint in a CTE trial is a time-to-event variable, then it will be of interest to compare the survival curves of the randomized treatment arms. Again, we will focus on a nonparametric approach, which corresponds to comparing the Kaplan-Meier survival curves, rather than a parametric approach.
The Mantel-Haenszel test can be adapted to compare two groups, say P and E for placebo and experimental treatment. In this setting, the Mantel-Haenszel test is called the logrank test.
The assumptions for the logrank test are that (1) the censoring patterns are the same for the two treatment groups, and (2) the hazard functions for the two treatment groups are proportional.
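For each of the K distinct failure times across the two randomized groups, \(t_1, t_2, \dots , t_K\), a 2 × 2 table is constructed. For failure time \(t_k , k = 1, 2, \dots , K\), let \(n_{Pk}\) and \(n_{Ek}\) denote the numbers of patients still at risk just before \(t_k\) in the placebo and experimental groups, and let \(d_{Pk}\) and \(d_{Ek}\) denote the numbers of events at \(t_k\). The table is:
 | Placebo | Exp Treat |
# events | \(d_{Pk}\) | \(d_{Ek}\) |
# non-events | \(n_{Pk} - d_{Pk}\) | \(n_{Ek} - d_{Ek}\) |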
The logrank statistic constructs an observed minus expected score, under the assumption that the null hypothesis of equal event rates is true, for each of the K tables and then sums over all tables:
\(O-E=\sum_{k=1}^{K}\left( \frac{n_{Pk}d_{Ek}-n_{Ek}d_{Pk}}{n_{Pk}+n_{Ek}} \right)\)
The variance expression for the O - E score is as follows:
\(V_L=Var(O-E)=\sum_{k=1}^{K}\left( \frac{(d_{Pk}+d_{Ek})(n_{Pk}+n_{Ek}-d_{Pk}-d_{Ek})n_{Pk}n_{Ek}}{(n_{Pk}+n_{Ek}-1)(n_{Pk}+n_{Ek})^2} \right)\)
Then the logrank statistic is:
\(Z_L=(O-E)/\sqrt{V_L}\)
which has an approximate standard normal distribution.
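As a minimal sketch of the calculation above, the following Python function builds the 2 × 2 table at each distinct failure time and accumulates \(O-E\) and \(V_L\). The function name `logrank_two_groups`, the `(time, event, group)` tuple layout, and the use of Python rather than SAS are illustrative assumptions. The eight observations are those of the small example used for the generalized Wilcoxon test in this section. A subject censored at a given time is assumed to still be at risk at that time, and a table with only one subject at risk is assumed to contribute zero variance.

```python
from math import sqrt

def logrank_two_groups(data):
    """Two-group logrank statistic (illustrative sketch).

    data: list of (time, event, group) with event = 1 for a failure,
    0 for a censored time, and group "P" or "E".
    Returns (O - E, V_L, Z_L), with O - E scored for group E.
    """
    failure_times = sorted({t for t, event, _ in data if event == 1})
    o_minus_e = 0.0
    v_l = 0.0
    for tk in failure_times:
        # Numbers at risk just before tk (time >= tk) in each group.
        n_p = sum(1 for t, _, g in data if g == "P" and t >= tk)
        n_e = sum(1 for t, _, g in data if g == "E" and t >= tk)
        # Numbers of events exactly at tk in each group.
        d_p = sum(1 for t, ev, g in data if g == "P" and t == tk and ev == 1)
        d_e = sum(1 for t, ev, g in data if g == "E" and t == tk and ev == 1)
        n, d = n_p + n_e, d_p + d_e
        o_minus_e += (n_p * d_e - n_e * d_p) / n
        if n > 1:  # assumption: a table with one subject adds no variance
            v_l += d * (n - d) * n_p * n_e / ((n - 1) * n ** 2)
    return o_minus_e, v_l, o_minus_e / sqrt(v_l)

# Eight observations; event = 0 marks a censored time ("+").
example = [
    (6, 1, "E"), (10, 1, "P"), (10, 0, "E"), (12, 1, "E"),
    (15, 0, "E"), (17, 1, "P"), (21, 1, "P"), (25, 0, "P"),
]
o_minus_e, v_l, z_l = logrank_two_groups(example)
print(f"O - E = {o_minus_e:.4f}, V_L = {v_l:.4f}, Z_L = {z_l:.4f}")
```

For these data only the failure times 6, 10, and 12 contribute, because no experimental-group patients remain at risk afterwards. The sketch gives \(O-E \approx 0.67\), \(V_L \approx 0.73\), and \(Z_L \approx 0.78\).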
The generalized Wilcoxon test also is a nonparametric test for comparing survival curves and it is an extension of the Wilcoxon rank-sum test in the presence of censoring. It also requires that the censoring patterns for the two treatment groups be the same, but it does not assume proportional hazards.
The first step in constructing the generalized Wilcoxon statistic is to pool the two samples of survival times (including censored values) and order them from lowest to highest. For the \(i^{th}\) observation in the ordered sample with survival (or censored) time \(t_i\), construct a score, \(U_i\), which represents the number of observations known to have a survival time less than \(t_i\) minus the number known to have a survival time greater than \(t_i\). Because of censoring, only an observed failure time can be counted as less than a later time, and a censored observation is not counted as less than any other observation. The \(U_i\) are summed over the experimental treatment group and a variance calculated, i.e.,
\(U=\sum_{i=1}^{n_E}U_i \quad \text{and} \quad V_U = Var(U)=\left( \frac{n_Pn_E}{(n_P+n_E)(n_P+n_E-1)}\right)\sum_{i=1}^{n_P+n_E}U_{i}^{2}\)
such that:
\(Z_U=U/\sqrt{V_U}\)
has an approximate standard normal distribution.
An example of constructing the \(U_i\) scores ("+" reflects censoring):
\(t_i\) | Group | #\( < t_i\) | #\( > t_i\) | \(U_i\) |
6 | Exp Treat | 0 | 7 | -7 |
10 | Placebo | 1 | 6 | -5 |
10+ | Exp Treat | 2 | 0 | 2 |
12 | Exp Treat | 2 | 4 | -2 |
15+ | Exp Treat | 3 | 0 | 3 |
17 | Placebo | 3 | 2 | 1 |
21 | Placebo | 4 | 1 | 3 |
25+ | Placebo | 5 | 0 | 5 |
Then U = (-7) + 2 + (-2) + 3 = -4.
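The scoring in the table can be reproduced with a short Python sketch. The function name `gehan_scores` and the `(time, event, group)` layout are illustrative assumptions, as is the tie convention (a censored time is treated as exceeding a failure at the same time), which is what the rows for 10 and 10+ in the table imply.

```python
from math import sqrt

def gehan_scores(data):
    """Return the U_i score for each (time, event, group) record.

    Observation a is counted as definitely shorter than observation b only
    if a is an observed failure and either t_a < t_b, or t_a == t_b and b
    is censored (a censored time is treated as exceeding a tied failure).
    """
    def shorter(a, b):
        (ta, ea, _), (tb, eb, _) = a, b
        return ea == 1 and (ta < tb or (ta == tb and eb == 0))

    scores = []
    for i, obs_i in enumerate(data):
        others = [obs_j for j, obs_j in enumerate(data) if j != i]
        n_less = sum(shorter(obs_j, obs_i) for obs_j in others)
        n_greater = sum(shorter(obs_i, obs_j) for obs_j in others)
        scores.append(n_less - n_greater)
    return scores

# Data from the table; event = 0 marks a censored time ("+").
example = [
    (6, 1, "E"), (10, 1, "P"), (10, 0, "E"), (12, 1, "E"),
    (15, 0, "E"), (17, 1, "P"), (21, 1, "P"), (25, 0, "P"),
]
scores = gehan_scores(example)
n_e = sum(1 for _, _, g in example if g == "E")
n_p = sum(1 for _, _, g in example if g == "P")
n = n_p + n_e

# U sums the scores over the experimental group only.
u = sum(s for s, (_, _, g) in zip(scores, example) if g == "E")
# V_U uses the squared scores of all pooled observations.
v_u = n_p * n_e * sum(s * s for s in scores) / (n * (n - 1))
z_u = u / sqrt(v_u)
print(scores)
print(f"U = {u}, V_U = {v_u:.2f}, Z_U = {z_u:.4f}")
```

With these eight observations the scores match the \(U_i\) column, \(\sum U_i^2 = 126\), and therefore \(V_U = (4 \cdot 4)/(8 \cdot 7) \times 126 = 36\) and \(Z_U = -4/6 \approx -0.67\).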
SAS® Example
Using PROC LIFETEST in SAS to construct Kaplan-Meier survival curves and test statistics for comparing survival curves
A safety and efficacy study was conducted in 83 patients with malignant mesothelioma, an uncommon lung cancer that is strongly associated with asbestos exposure. Patients underwent one of three types of surgery, namely, biopsy, limited resection, and extrapleural pneumonectomy (EPP). Treatment assignment was nonrandomized and based on the extent of disease at the time of diagnosis. Thus, strong selection bias in the choice of procedure is possible in this example.
***********************************************************************
* This is a program that illustrates the use of PROC LIFETEST in SAS *
* to construct Kaplan-Meier survival curves and test statistics for *
* comparing survival curves. *
* *
* The sample data set is based on the results from an SE trial on 83 *
* patients with malignant mesothelioma, an uncommon lung cancer that *
* is strongly associated with asbestos exposure. Patients underwent *
* one of three types of surgery, namely, biopsy, limited resection, *
* and extrapleural pneumonectomy (EPP). Treatment assignment was *
* based on the extent of disease at the time of diagnosis. *
***********************************************************************;
proc format;
value sexfmt 0='female' 1='male';
value psfmt 0='low' 1='high';
value wtchgfmt 1='no' 2='yes';
value surgfmt 1='biopsy' 2='limited resection' 3='EPP';
value eventfmt 0='no' 1='yes';
run;
data mesoth;
input age sex ps hist wtchg surg pftime prog stime dead;
label age='Age'
sex='Sex'
ps='Performance Status'
hist='Histologic Subtype'
wtchg='Weight Change at DX'
surg='Surgery Type'
pftime='Progression-Free Time'
prog='Progression Event'
stime='Survival Time'
dead='Death Event';
format sex sexfmt.
ps psfmt.
wtchg wtchgfmt.
surg surgfmt.
prog eventfmt.
dead eventfmt.;
cards;
60 1 1 136 1 3 394 1 823 1
59 1 0 136 2 3 1338 0 1338 0
51 0 0 130 1 1 184 1 270 1
73 1 1 136 1 3 320 0 320 1
74 1 0 136 2 1 168 0 168 1
39 0 0 136 1 1 36 1 247 1
46 1 1 131 1 3 552 1 694 0
71 1 0 136 1 1 133 1 316 1
69 1 0 136 1 1 175 1 725 0
49 1 0 131 1 1 327 0 327 1
69 1 0 131 1 2 0 0 0 1
72 1 0 131 1 1 676 1 963 0
44 0 0 130 2 2 223 1 265 1
45 1 0 136 2 2 184 1 237 1
57 1 0 132 1 2 145 1 176 1
60 0 1 131 1 1 316 0 316 1
22 1 1 131 1 2 87 1 310 1
46 0 1 131 1 1 135 1 166 1
60 1 0 131 1 3 1 1 28 1
72 1 0 131 1 2 199 1 730 1
65 1 0 131 1 3 39 0 39 1
65 1 1 131 1 2 61 1 116 1
60 1 0 131 1 3 17 0 17 1
64 1 0 131 2 3 799 1 1229 1
61 1 0 131 2 1 61 1 294 1
38 1 0 131 1 1 176 1 322 1
65 1 1 136 1 3 6 0 6 1
73 0 1 131 1 2 292 1 422 1
74 1 0 136 2 2 22 1 22 1
76 1 0 136 1 1 106 1 375 1
57 1 1 131 1 3 248 1 302 1
60 0 0 . 1 1 63 1 365 1
56 1 0 136 1 1 145 1 387 1
62 0 0 136 1 1 104 1 327 1
60 1 0 131 1 1 20 1 247 1
67 0 0 131 1 1 181 1 669 1
64 1 0 131 1 2 89 1 948 1
67 1 1 136 1 1 0 1 400 1
56 0 1 131 1 2 724 1 1074 0
52 1 0 160 2 1 62 1 137 1
56 1 0 131 1 3 93 1 210 1
44 1 0 136 1 3 402 1 648 1
50 0 0 136 2 2 141 1 520 1
63 1 0 . 2 1 156 1 304 1
68 1 1 131 1 2 265 1 349 1
50 1 0 . 2 3 305 1 317 1
41 0 1 131 1 1 181 1 395 1
60 1 0 131 1 1 274 1 503 1
65 1 0 136 2 2 20 1 20 1
47 1 1 131 1 3 411 1 679 0
46 1 1 131 1 2 624 0 624 0
70 1 1 131 1 2 278 1 617 0
58 1 0 136 1 1 20 1 85 1
57 1 1 132 1 3 112 1 139 1
75 1 0 132 2 2 47 1 47 1
66 1 1 136 1 3 294 1 523 1
77 1 0 . 1 1 126 1 157 1
65 0 0 . 2 1 117 1 545 0
46 0 0 131 1 1 63 1 218 1
71 0 1 132 2 1 139 0 139 1
61 1 0 136 1 1 538 1 1170 0
58 1 0 131 1 3 390 1 722 1
49 1 1 136 1 3 1102 0 1102 0
50 1 0 136 1 3 166 1 182 1
73 1 0 136 1 2 58 1 136 1
44 1 0 136 1 1 406 0 406 1
47 0 1 131 1 3 1123 0 1123 0
68 1 0 136 1 1 1009 1 1029 0
66 1 0 132 1 2 37 1 112 1
46 1 1 131 1 1 104 1 764 1
56 1 1 136 1 2 33 1 225 1
68 1 1 136 1 1 20 1 122 1
59 1 0 136 1 2 73 1 165 1
58 0 0 131 1 1 4 0 4 1
66 1 1 132 2 2 205 1 361 1
82 1 0 160 1 1 78 0 78 1
73 1 0 131 1 1 1265 0 1265 1
57 0 0 130 1 2 273 1 318 1
72 1 1 136 2 1 2 1 362 1
69 1 1 . 1 2 1093 0 1093 0
64 0 1 130 1 1 475 0 475 1
65 1 1 130 1 2 292 0 292 1
72 1 1 130 1 2 324 1 499 0
;
run;
proc print data=mesoth;
title 'Mesothelioma Example';
run;
proc lifetest data=mesoth plots=(survival);
strata surg;
time stime*dead(0);
title2 'Comparison of Surgery Types According to Survival Time';
run;
The primary outcome variable was time to death (survival). SAS PROC LIFETEST constructs the Kaplan-Meier survival curve for each surgery group and compares the survival curves via the logrank test (p = 0.48) and the generalized Wilcoxon test (p = 0.63).
Strength of Evidence
Although p-values are useful for hypothesis tests that are specified a priori, they provide poor summaries of clinical effects. In particular, they do not convey the magnitude of a clinical effect. The size of a p-value depends on the magnitude of the estimated treatment effect and its estimated variability (also a function of sample size). Thus, the p-value partially reflects the size of the trial, which has no biological interpretation. In addition, the p-value can mask the magnitude of the treatment effect, which does have biological importance. P-values only quantify the type I error and do not characterize the biologically important effects in the trial. Thus, p-values should not be used to describe the strength of evidence in a trial. Investigators have to look at the magnitude of the treatment effect.
Confidence intervals are more appropriate for describing the strength of evidence in a clinical trial, although they also are affected by the sample size. Most major journals now require confidence intervals because they are far more informative than the p-value alone.
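The dependence of the p-value on trial size can be illustrated with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the function name `summarize`, the estimated treatment difference of 5 units, the common standard deviation of 20, the two arm sizes, and the normal approximation for a difference in means.

```python
from math import sqrt, erfc

def summarize(diff, sd, n_per_arm, z_crit=1.96):
    """Two-sided p-value and 95% CI for a difference in means
    (normal approximation, equal arm sizes, common known SD)."""
    se = sd * sqrt(2.0 / n_per_arm)
    z = diff / se
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return p_value, (diff - z_crit * se, diff + z_crit * se)

# Same hypothetical estimated effect, two hypothetical trial sizes.
p_small, ci_small = summarize(diff=5.0, sd=20.0, n_per_arm=20)
p_large, ci_large = summarize(diff=5.0, sd=20.0, n_per_arm=500)

for label, p, ci in [("n = 20 per arm", p_small, ci_small),
                     ("n = 500 per arm", p_large, ci_large)]:
    print(f"{label}: p = {p:.4g}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The estimated effect is identical in both hypothetical trials, yet the p-value is far from 0.05 in the small trial and well below it in the large one. The confidence intervals make the reason visible: both are centered on the same estimate, and only their widths differ.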