Descriptive
Examples of time-to-event data are:
- Time to death
- Time to development of a disease
- Time to first hospitalization
- And many others
It is tempting to treat time-to-event data as an ordinary continuous variable, but we do not observe the true event time for every person in the dataset. People who do not experience the event during follow-up still contribute valuable information, and we refer to these patients as “censored”. We use the time they contribute up to the point of censoring, which is when follow-up stops because the study has ended, they are lost to follow-up, or they withdraw from the study.
For our example, we are interested in the time to development of coronary heart disease (CHD). No patients had CHD upon study entry, and patients were surveyed every 2 years to see if they had developed CHD. Each patient’s “time-to-CHD” will fall into one of these categories:
- They develop CHD within the 30-year study period
  - Time = years until they develop CHD
  - Status = event
- They do not develop CHD within the 30-year study period, and they stay in the study until the end
  - Time = 30 years
  - Status = censored
- They do not develop CHD within the 30-year study period, and they leave the study before the 30-year study period is finished (due to death, moving, lost contact, voluntary withdrawal, etc.)
  - Time = time on study
  - Status = censored
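The three categories above can be sketched as a small encoding rule. This is only an illustration: the function name `encode_patient`, its arguments, and the 1/0 coding for event/censored are assumptions made here, not part of the original study.

```python
from typing import Optional, Tuple

STUDY_YEARS = 30.0  # length of the study period described above


def encode_patient(chd_year: Optional[float],
                   exit_year: Optional[float]) -> Tuple[float, int]:
    """Return (time, status), with status 1 = event and 0 = censored.

    chd_year:  years from study entry to CHD, or None if CHD was never observed.
    exit_year: years from study entry to leaving the study early,
               or None if the patient stayed until the end.
    """
    # Follow-up ends at early exit or at the end of the study, whichever is first.
    follow_up_end = STUDY_YEARS if exit_year is None else min(exit_year, STUDY_YEARS)
    if chd_year is not None and chd_year <= follow_up_end:
        return chd_year, 1      # CHD developed while under observation
    return follow_up_end, 0     # censored at the end of follow-up
```

For instance, `encode_patient(12.0, None)` describes a patient who develops CHD at year 12, while `encode_patient(None, 8.0)` describes a patient who leaves the study at year 8 without CHD.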
The standard way to describe time-to-event data is the Kaplan-Meier method. It uses information from all patients and distinguishes between patients who did and did not experience the event. A Kaplan-Meier (KM) plot is how we visualize time-to-event data, and it starts with all patients being event-free at time 0. The KM method tracks the number of patients still at risk over time; patients leave the at-risk group once they experience the event or are censored. The curve itself steps down only when an event occurs, while censoring reduces the number at risk for later time points.
A Kaplan-Meier plot and a cumulative incidence plot are complements of each other (cumulative incidence is 1 minus the KM estimate), so you can choose whichever best fits your data. For “Overall Survival” we often use KM plots, which start at 100% and decrease over time as patients die. This can be read as the estimated percentage of patients still alive. For our example, it makes more sense to look at a cumulative incidence plot, which starts at 0% and shows how the incidence of CHD increases over time. (A KM plot would show the percentage of people who are CHD-free, and this would decrease over time.)
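As a minimal sketch of the Kaplan-Meier calculation, the function below multiplies, at each event time, the running estimate by the fraction of at-risk patients who did not have the event. The function names and the small dataset are made up for illustration, and status is assumed to be coded 1 = event, 0 = censored.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(time, event-free probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            # The curve steps down only at event times.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the at-risk group.
        n_at_risk -= n_events + n_censored
    return curve


def cumulative_incidence(curve: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Complement of the KM curve: 1 minus the event-free probability."""
    return [(t, 1.0 - s) for t, s in curve]


# Hypothetical toy data: follow-up time in years and status (1 = event).
toy_times = [2, 4, 4, 6, 10]
toy_events = [1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Note that the patient censored at year 6 never lowers the curve directly; they only shrink the at-risk group used at year 10.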
This plot shows that the cumulative incidence of CHD increases over time, and we can use the KM estimate to obtain the estimated proportion of patients with CHD at different time points.
Bivariable
When comparing time-to-event data between groups, we can use the KM method again, as well as perform a log-rank test. For our example, suppose we want to compare time to CHD by BP status.
This plot shows that those with high BP at study entry (blue line) have higher rates of CHD than those with low or normal BP (red line). The KM estimates of CHD at 10 years are 12.7% for the high BP group and 4.7% for the low/normal group. At 20 years, these estimates are 26.1% and 12.0%. The log-rank test compares the two curves over the whole follow-up period rather than the estimates at any single time point, and it is highly significant here (p<0.0001).
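To show what the log-rank test computes, here is a minimal two-group sketch. At each event time it compares the observed number of events in one group with the number expected if both groups shared the same hazard, then sums over all event times. The function name and the toy inputs used with it are hypothetical, and the p-value relies on the usual chi-square approximation with 1 degree of freedom.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                  # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)     # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)       # events at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Because the statistic accumulates observed-minus-expected differences across every event time, it reflects the whole curve rather than any single time point.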
Modeling (Multivariable Associations)
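We can use Cox proportional hazards modeling to estimate the hazard ratio. This model is built on the hazard function, which is the instantaneous rate at which the event occurs at time t among people who have remained event-free up to time t (loosely, the chance of experiencing the event in the next instant, given survival to time t).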
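Written out, the hazard function and the Cox model for a single covariate look as follows. This is standard textbook notation rather than anything specific to this example; the symbols T (event time), h_0(t) (baseline hazard), x (a covariate, for instance high BP coded 0/1) and \beta (the log hazard ratio) are introduced here for illustration.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}
```

The baseline hazard h_0(t) cancels in the ratio, so the hazard ratio does not depend on t. That constancy over time is the proportional hazards assumption.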
Just from eyeballing the previous plot, it appears that the risk of CHD is about twice as high for those with high BP compared to those with low/normal. Actually fitting a Cox model with high BP as a single covariate shows that the estimated hazard ratio is 1.87 (95% CI: 1.69 - 2.08), which fits with what we see in the plot.
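For readers curious how such an estimate is obtained, the following is a rough sketch of a Cox fit with one covariate, maximizing the partial likelihood by Newton-Raphson and using the Breslow convention for tied event times. It is not the software used for the estimate above; the function names and the toy data are assumptions for illustration, and in practice a statistical package would be used instead.

```python
import math
from typing import Sequence, Tuple


def _score_and_info(beta: float, times: Sequence[float],
                    events: Sequence[int], x: Sequence[float]) -> Tuple[float, float]:
    """First derivative (score) and negative second derivative (information)
    of the Cox partial log-likelihood at beta."""
    score = info = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # only events contribute terms to the partial likelihood
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean_x = s1 / s0
        score += x_i - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def cox_hazard_ratio(times: Sequence[float], events: Sequence[int],
                     x: Sequence[float], max_iter: int = 50,
                     tol: float = 1e-9) -> Tuple[float, float, float]:
    """Return (hazard ratio, lower 95% limit, upper 95% limit) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_info(beta, times, events, x)
        if info <= 0:
            raise ValueError("the data carry no information about the covariate")
        step = score / info
        beta += step  # Newton-Raphson update
        if abs(step) < tol:
            break
    _, info = _score_and_info(beta, times, events, x)
    se = 1.0 / math.sqrt(info)  # standard error of the log hazard ratio
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Hypothetical toy data: x = 1 for a "high BP" group, 0 otherwise; all events observed.
toy_times = [1, 3, 4, 6, 2, 5, 7, 8]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
toy_x = [1, 1, 1, 1, 0, 0, 0, 0]
hr, lower, upper = cox_hazard_ratio(toy_times, toy_events, toy_x)
```

The confidence interval is computed on the log scale and then exponentiated, which is why intervals for hazard ratios are not symmetric around the estimate.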
The Cox models can also include multiple covariates to test for confounding and interaction terms to evaluate effect modification, similar to those in previous sections. With additional terms in the model, we can estimate adjusted hazard ratios.