Manual Competing Risks and Multistate Models with R (Use R!)

This stepwise procedure yields parsimonious but targeted multi-state models with readily interpretable coefficients and optimized predictive ability, even for smaller data sets. Multi-state models are a flexible tool for analyzing complex disease processes in which individuals are allowed to move between a finite number of states. The states may be defined through the stages of the disease, the incidence of clinical symptoms or complications, or death. The states and the possible transitions between them fully characterize the disease process.

In recent years, multi-state models have been studied widely [1-9] and clinical applications have become more frequent [10, 11]. Multi-state models are an extension of the classical survival model, which is usually fitted as a Cox proportional hazards model [12, 13]. Multi-state models have two major advantages. Firstly, they provide detailed insight into disease processes, as covariate effects on each transition can be estimated, so that etiological aspects of different phases of the disease can be studied.

Competing states can be analyzed, such as different causes of death or competing therapy outcomes (competing risk models), as can a sequence of states such as disease recurrences (classical multi-state models). Secondly, prognosis from multi-state models can be more accurate than from the standard model with one single, potentially combined, endpoint. During the course of the disease, predictions can be adjusted when additional information, such as the occurrence of intermediate events, becomes available. In return for these advantages, the drawback of statistical instability has to be accepted.

Dividing a process into multiple sub-processes bears the risk of small event counts within some of the studied transitions, which in turn can result in unstable estimates with wide confidence intervals and large p-values. In recent years, multi-state models have been improved by allowing additional assumptions that can stabilize the results [2, 7, 8]. These extensions and assumptions provide high flexibility, but at the same time complicate the specification of a multi-state model.

Applications using these extensions are rare in the literature [7, 11, 14, 15]. One cause may be uncertainty among investigators about how to specify a multi-state model using these techniques. The purpose of our work was therefore to provide formal instructions for specifying an optimized multi-state model using the established options. The described adaptation procedure was modeled after a technique by Thall and Lachin [16]. In another context, they described a formal model specification process for reducing stratified proportional hazards models.

We expanded this Thall-Lachin approach and transferred it into the multi-state context. In the following, we refer to the optimized models as restricted or reduced multi-state models. For a clinical example on ovarian cancer patients, we adapt a restricted multi-state model stepwise, following a defined procedure that starts with a full multi-state model with a minimum of restrictions. Modeling results and prognoses for two simulated patients are outlined for every specification step. Our example shows that even for smaller data sets with rare event counts, multi-state models can improve estimation results and increase predictive accuracy.

This study was approved by the Medical Board Hamburg, reference number [...]. All clinical investigations have been conducted according to the Declaration of Helsinki. Written informed consent to access their tissue and review their medical records was obtained from all patients when they first attended the clinic, according to our investigational review board and ethics committee guidelines.

Prior to analysis, patient information was anonymized and de-identified. Our data set covers the patients with epithelial ovarian cancer who had primary surgery at the University Medical Center Hamburg-Eppendorf between [...] and [...]. The anonymized and de-identified data used to produce the presented results can be downloaded from http:[...]. Clinical details are described elsewhere [17].

To keep the example as simple as possible, only the three most important prognostic covariates are considered, i.e. age, FIGO staging and residual tumour. Descriptive statistics and frequencies of observed transitions are summarized in Table 1. We define three states for our data example, differentiating the states after surgery: "healthy", "progression" and "death". All patients start in the "healthy" state; some of them move to "progression" and some move directly to "death". It is also possible to move from "progression" to "death". In Fig 1, the multi-state model is displayed in the typical manner. A model with the structure given in Fig 1 is generally called an illness-death model or a disability model without recovery.
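The state structure described above can be written down as a small lookup table. The following is a minimal sketch in Python; the names `ALLOWED` and `is_valid_path` are hypothetical and not taken from the paper or from any package.

```python
# Illness-death model without recovery: allowed direct transitions.
# State names follow the text; the data structure itself is an assumption.
ALLOWED = {
    "healthy": ("progression", "death"),
    "progression": ("death",),
    "death": (),  # absorbing state
}


def is_valid_path(path):
    """Return True if every consecutive pair of states is an allowed transition."""
    if not path or path[0] not in ALLOWED:
        return False
    for current, nxt in zip(path, path[1:]):
        if nxt not in ALLOWED.get(current, ()):
            return False
    return True


# Number of distinct transitions in the model (0->1, 0->2, 1->2).
N_TRANSITIONS = sum(len(targets) for targets in ALLOWED.values())
```

A path such as healthy, progression, death is accepted, while any path that returns from "progression" to "healthy" is rejected, which is what "without recovery" means here.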

It is a frequently used example of a multi-state model [3, 18-20]. The hazard can be understood as the instantaneous potential of a specific transition at time t. With the "clock-forward" approach, the time refers to the time since study start. The clock continues running, independent of the occurrence of intermediate events. In contrast, the "clock-reset" approach assumes a reset to zero every time the subject moves to another state. The current time t then refers to the sojourn time in the present state [8]. A property that is often assessed in multi-state modeling is the Markov assumption.
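The difference between the two clocks can be illustrated for the transition from "progression" to "death". The sketch below is only an illustration under assumed inputs (a progression time and an end-of-follow-up time in months); the function name `at_risk_interval` is hypothetical.

```python
def at_risk_interval(t_progression, t_end, clock):
    """Return (start, stop) of the at-risk interval for progression -> death.

    clock="forward": time is measured since study start, so the subject
    enters the risk set at the progression time.
    clock="reset": time restarts at zero when progression occurs, so the
    interval is the sojourn time in the progression state.
    """
    if t_end < t_progression:
        raise ValueError("follow-up cannot end before progression")
    if clock == "forward":
        return (t_progression, t_end)
    if clock == "reset":
        return (0.0, t_end - t_progression)
    raise ValueError("clock must be 'forward' or 'reset'")


# Illustrative subject: progression after 12 months, death after 30 months.
forward = at_risk_interval(12.0, 30.0, "forward")  # (12.0, 30.0)
reset = at_risk_interval(12.0, 30.0, "reset")      # (0.0, 18.0)
```

Under the clock-reset scale the 12 months spent in the healthy state no longer appear in the time axis, which is why they can be carried along as a covariate instead.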

It implies that the future depends on the history of a process only through the present [4, 21, 22]. In clock-reset models the time scale itself depends on the time when the present state was reached; therefore the Markov assumption is violated by definition. Thus, only clock-forward models can meet the Markov assumption. The semi-Markov assumption [4, 7, 8] relaxes the Markov restriction: the process may depend on the present state and on the time since entry into that state.

Further dependence on the time since initiation turns the homogeneous semi-Markov model into a non-homogeneous semi-Markov model. Semi-Markov clock-reset models are often used when duration dependencies are to be modeled. The non-homogeneous semi-Markov clock-reset model for the ovarian cancer example is specified through the transition hazards in (1).

In (1), ten coefficients have to be estimated: three for each of the three transitions, plus one for the time spent in the healthy state, t 01 0. Sometimes the assumption of transition-specific covariate effects results in overfitting, which may be a problem particularly in processes with rare event counts. As a result, estimates lose precision.
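The equation itself is not preserved in this copy. The count of ten coefficients is consistent with Cox-type transition hazards of the following form, which is offered only as a sketch under that assumption. The symbols are illustrative: $Z$ is the vector of the three covariates, $s$ is the time since progression (clock-reset scale), and $t_{01}$ stands for the time spent in the healthy state.

```latex
\begin{aligned}
\lambda_{01}(t \mid Z) &= \lambda_{01,0}(t)\,\exp\!\left(\beta_{01}^{\top} Z\right),\\
\lambda_{02}(t \mid Z) &= \lambda_{02,0}(t)\,\exp\!\left(\beta_{02}^{\top} Z\right),\\
\lambda_{12}(s \mid Z, t_{01}) &= \lambda_{12,0}(s)\,\exp\!\left(\beta_{12}^{\top} Z + \gamma\, t_{01}\right).
\end{aligned}
```

With three covariates, each of $\beta_{01}$, $\beta_{02}$ and $\beta_{12}$ has three components, and $\gamma$ adds one more, giving $3 \times 3 + 1 = 10$ coefficients.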

A Systematic Model Specification Procedure for an Illness-Death Model without Recovery

Yet multi-state models offer options for adapting more parsimonious models. With further assumptions on potentially proportional baseline hazards or identical covariate effects across transitions, covariate effects can be estimated more efficiently. The procedure starts with the full model, in which baseline hazards and covariate effects are allowed to vary between transitions. Further steps result from answering a series of questions on the time scale, the Markov assumption, the proportionality of baseline hazards, and the equality of covariate effects across transitions.

Finally, a tailor-made restricted multi-state model is obtained in a step-down procedure. Choosing the appropriate baseline time scale may depend on the context as well as on practical considerations. Generally, in clinical or epidemiological studies, there are various possible time scales, such as the time since onset of a disease, time since surgery, time since a specific treatment or time since birth. All these chronological time scales differ only in their origins. The main criterion for choosing the time scale is that it must be relevant in the investigated context, to ensure that the covariate effects can be interpreted [23].

The usability of different time scales depends on the application. In the present context, the time since primary surgery is considered. In multi-state modelling, there are even more aspects to consider with respect to the time scale. While the clock-forward approach uses the time from onset of the study for all transitions, the observations start at time zero after each transition when the clock-reset approach is used.

For illness-death models, this choice concerns only the transition from illness to death, since for transitions from the initial state the time since onset is the only available time scale. In this work, we proceed with the clock-reset approach, as it has the advantage that the sojourn time spent in the healthy state before experiencing a progression may be explicitly included as a covariate for the transition from progression to death.

The Markov assumption is violated by construction because a clock-reset model is fitted. Otherwise, it could have been checked by testing the effect of the time in the healthy state, t 01 0, on the transition from progression to death. Significance of t 01 0 would indicate that the Markov assumption should be rejected in favor of the semi-Markov assumption. Further, a non-parametric approach based on Kendall's tau has been developed for testing Markovianity [24].

To make assumptions on the proportionality of baseline hazards, clinical considerations, statements in the literature, and graphical and statistical tests should be considered. From a clinical point of view, and in line with the literature, it is reasonable to assume proportional baseline hazards for the transitions ending in the death state, while the transition into progression follows a non-proportional baseline hazard [25].

The assumption can be tested using standard approaches, such as the Schoenfeld test of proportional hazards, or graphically by plotting ln(cumulative baseline hazard) against ln(analysis time) for each transition. If the curves appear parallel, the assumption of proportional hazards is plausible. To test whether covariate effects are identical across transitions, interactions between covariates and transitions are tested and eliminated stepwise. The likelihood ratio test and the AIC are used as specification criteria.
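The graphical check rests on a simple fact: if two cumulative hazards are proportional, the vertical gap between their log curves is the same at every time point. The sketch below illustrates this with made-up Weibull-type cumulative hazards; the functions and parameter values are assumptions for illustration, not estimates from the ovarian cancer data.

```python
import math


def log_gap(cumhaz_a, cumhaz_b, times):
    """Vertical distance ln H_b(t) - ln H_a(t) at each time point."""
    return [math.log(cumhaz_b(t)) - math.log(cumhaz_a(t)) for t in times]


def gap_range(gaps):
    """Spread of the gaps; zero means perfectly parallel log-log curves."""
    return max(gaps) - min(gaps)


# Illustrative cumulative hazards (hypothetical shapes and scales).
def h_ref(t):
    return (t / 10.0) ** 1.5


def h_proportional(t):
    return 2.0 * h_ref(t)  # same shape, hazard ratio 2


def h_other_shape(t):
    return (t / 10.0) ** 0.8  # different shape, not proportional


times = [1, 5, 10, 20, 40, 60]
parallel = gap_range(log_gap(h_ref, h_proportional, times))     # about 0
not_parallel = gap_range(log_gap(h_ref, h_other_shape, times))  # clearly > 0
```

With estimated (step-function) cumulative hazards the gap will never be exactly constant, so in practice the plot is judged by eye or complemented by a formal test.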

If an interaction is not significant, a version of the model omitting this interaction is fitted. If the AIC of the latter model is lower and the likelihood ratio test does not indicate a significant difference between the models, the latter model is preferred. This step is repeated until the model contains only significant interactions. In a last step, non-significant factors are removed stepwise. For further comparison and evaluation, we examine prognostic features of the different models. We consider two simulated patients with opposite characteristics, as given in Table 2.
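The selection rule just described can be sketched as a small helper. This is a minimal illustration that assumes the (partial) log-likelihoods and coefficient counts of two nested models are already available; the function names and the 5% level are assumptions, and the chi-square tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def compare_models(loglik_full, k_full, loglik_reduced, k_reduced, alpha=0.05):
    """Apply the AIC and likelihood ratio criteria to two nested models."""
    aic_full = 2 * k_full - 2 * loglik_full
    aic_reduced = 2 * k_reduced - 2 * loglik_reduced
    lrt = 2 * (loglik_full - loglik_reduced)
    p_value = chi2_sf(max(lrt, 0.0), k_full - k_reduced)
    prefer_reduced = aic_reduced < aic_full and p_value > alpha
    return aic_full, aic_reduced, p_value, prefer_reduced


# Made-up numbers for illustration only.
result = compare_models(-500.0, 11, -501.0, 9)
```

In this made-up example the reduced model has the lower AIC and the test does not reject it, so it would be carried forward to the next elimination step.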

Patient A is supposed to have a good prognosis, while patient B has a bad prognosis at onset. Prognosis in semi-Markov models is not straightforward, as the time in the healthy state is not known at onset, but it can be updated when the event occurs. For the two simulated patients, prognoses are computed at the essential steps of the adaptation process. To evaluate which model performs best, the predictive ability is compared using prediction error curves, a time-dependent estimate of the Brier score.

The time-dependent Brier score at time t is defined as the squared difference between the real survival status at time t (1 if the subject is alive at t, 0 otherwise) and the model-based prediction, made at time 0, of surviving beyond t. As the survival status at time t may be right-censored for single observations, inverse probability of censoring weights (IPCW) are used [26, 27].
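For intuition, the following sketch computes the Brier score at a single time point in the simplified case where no observation is censored before that time. The IPCW weighting used in the paper is deliberately left out, so this is not the full prediction error estimate; the function name and the example numbers are hypothetical.

```python
def brier_score_uncensored(predicted_survival, alive):
    """Mean squared difference between survival status and predicted survival.

    predicted_survival: predicted probabilities of being alive at time t.
    alive: observed status at time t (1 = alive, 0 = dead), no censoring.
    """
    if len(predicted_survival) != len(alive) or not alive:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for p, a in zip(predicted_survival, alive)]
    return sum(squared) / len(squared)


# Made-up predictions for three subjects at one time point.
model_score = brier_score_uncensored([0.9, 0.2, 0.5], [1, 0, 1])
# An uninformative prediction of 0.5 for everyone always scores 0.25.
coin_flip_score = brier_score_uncensored([0.5, 0.5, 0.5], [1, 0, 1])
```

Lower values indicate better predictions. Evaluating the score over a grid of time points gives a prediction error curve.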

The bootstrap cross-validation component is based on repeatedly drawn training sets of fixed size, with the remaining subjects forming the corresponding validation sets. Technical details of the method are described in [26-30]. For technical reasons, we subtracted half a month from the progression time in cases where progression and death coincided.

Models and graphs are calculated with Stata and the R packages mstate and pec [30-32]. Theoretical aspects of the prognosis can be found in the accompanying literature [25, 32] and in the works by Putter et al. Throughout the adaptation process, we use the clock-reset approach and start with the non-homogeneous semi-Markov model introduced in (1). Further results are given for the non-homogeneous semi-Markov model with freely varying coefficients and with transition-specific baseline hazards. The state probabilities from the full model for patients A and B are shown in Fig 2.

They can be interpreted as the probability of being in a particular state at a certain time after starting in a given state. Probabilities in Fig 2 are stacked, which means that the probability of being in each state is represented by the height of the corresponding band. The states are ordered by increasing severity, so that the probability of being alive can be read off simply by adding the adjacent gray bands.

The two upper graphs show the state probabilities over time since study onset, starting in the healthy state. As expected, the probability of staying in the healthy state decreases from 1 at time 0. At the same time, the probabilities of progression and dying increase. The probability of being in the state after progression decreases again around the 30th month, because these individuals move on to the death state.
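The rise and later fall of the progression probability can be reproduced with a deliberately simplified model. The sketch below assumes constant transition hazards, which gives closed-form state probabilities; this is not the semi-Markov Cox model fitted in the paper, and the hazard values are made up.

```python
import math


def state_probabilities(t, a01, a02, a12):
    """State probabilities at time t for an illness-death model with
    constant hazards a01 (healthy -> progression), a02 (healthy -> death)
    and a12 (progression -> death), starting in the healthy state."""
    leave_healthy = a01 + a02
    if math.isclose(leave_healthy, a12):
        raise ValueError("closed form below needs a01 + a02 != a12")
    p_healthy = math.exp(-leave_healthy * t)
    p_progression = a01 / (leave_healthy - a12) * (
        math.exp(-a12 * t) - math.exp(-leave_healthy * t)
    )
    p_death = 1.0 - p_healthy - p_progression
    return p_healthy, p_progression, p_death


# Hypothetical monthly hazards.
A01, A02, A12 = 0.04, 0.01, 0.03
early = state_probabilities(5.0, A01, A02, A12)
middle = state_probabilities(25.0, A01, A02, A12)
late = state_probabilities(100.0, A01, A02, A12)

# Probability of being alive = healthy band + progression band.
alive_middle = middle[0] + middle[1]
```

The three probabilities always sum to one, which is why they can be drawn as stacked bands, and the progression probability is larger at the intermediate time than at either the early or the late time.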

Correspondingly, the lower graphs display the probabilities of staying in the progression state or dying after progression.

Fig 2. Probabilities for patients A (left) and B (right) of being in distinct states after study onset (upper) and immediately after progression (lower).

Next, we test the assumption of proportional baseline hazards of the transitions into "death".

From the log-log plot in Fig 3 it is unclear whether the assumption holds for the death hazards. For proportionality, the curves have to be parallel. However, there seems to be no clear violation of the PH assumption either.

Parallelism of the curves in a log-log plot indicates proportionality of the baseline hazard curves. This plot does not contradict the assumption of proportional baseline hazards for transitions into "death". To exemplify the options of the multi-state model, we assume proportional baseline hazards for mortality and an arbitrary baseline hazard for progression in this example. The resulting proportionality factor shows how the risk of dying changes with the occurrence of a progression.

This semi-Markov model with proportional baseline hazards for transitions into "death" is called the "PH model" in the following. Estimated hazard ratios from the PH model are reported in Table 4. To test whether covariate effects can be assumed to be identical across transitions, interactions between covariates and transitions are tested and eliminated stepwise. As in the previous model, we assume proportional baseline hazards of the transitions into "death".

The PH model is the most general model considered.

In the first step, equalizing the effect of residual tumour across all three transitions (model B) yields a reasonable model reduction with regard to the AIC and the likelihood ratio test. Additionally assuming equal FIGO effects across all transitions does not improve the model. However, assuming equal effects of FIGO staging across the transitions into "death" (model C) improves the model fit. According to our selection criteria, the effect of age should be modelled independently for the distinct transitions.

Regarding non-significant main effects in the next step, age is removed as a predictor for the transition from "healthy" to "progression" (model D) because it is not significant. Table 5 shows the model fit criteria of the corresponding models.

The likelihood ratio test does not show significant differences between the reduced sub-models. We prefer the parsimonious model and finally specify the restricted multi-state model E. In this model, age is supposed to affect only the transitions into death and is not informative for progression. FIGO staging only correlates with progression and has no effect on transitions into "death".

Residual tumour is supposed to have equal effects across transitions. The timing of progression affects the prognosis after progression. The hazard functions for the final restricted multi-state model E are specified as in formula (3).
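Formula (3) is not preserved in this copy. One set of hazards that matches the restrictions listed for model E is sketched below. It is an assumption, not the authors' equation: the symbols $Z_{\mathrm{age}}$, $Z_{\mathrm{FIGO}}$, $Z_{\mathrm{RT}}$ (residual tumour), $\delta$ and $t_{01}$ (time spent in the healthy state) are illustrative, and $s$ is the time since progression.

```latex
\begin{aligned}
\lambda_{01}(t \mid Z) &= \lambda_{01,0}(t)\,\exp\!\left(\beta_{F} Z_{\mathrm{FIGO}} + \beta_{R} Z_{\mathrm{RT}}\right),\\
\lambda_{02}(t \mid Z) &= \lambda_{02,0}(t)\,\exp\!\left(\beta_{A,02} Z_{\mathrm{age}} + \beta_{R} Z_{\mathrm{RT}}\right),\\
\lambda_{12}(s \mid Z, t_{01}) &= \lambda_{02,0}(s)\,\exp\!\left(\delta + \beta_{A,12} Z_{\mathrm{age}} + \beta_{R} Z_{\mathrm{RT}} + \gamma\, t_{01}\right).
\end{aligned}
```

Here $\delta$ is the log proportionality factor between the two baseline hazards into "death", each evaluated on its own time scale. Under this reading the free coefficients are $\beta_{F}$, $\beta_{R}$, $\beta_{A,02}$, $\beta_{A,12}$, $\gamma$ and $\delta$; whether the source counts $\delta$ among its coefficients is itself an assumption.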

While ten coefficients have to be estimated in (1), the model in (3) requires only six. The estimated hazard ratios from model (3), called the "reduced PH model" in the following, are presented in Table 6. The predicted transition probabilities for patients A and B are displayed in Fig 5.

The prediction error curves of the three models and of a null model without covariates, the Kaplan-Meier model, are displayed in Fig 6. The three adapted multi-state models have a better predictive ability than the null model. However, the full model, the PH model and the reduced PH model perform very similarly. From the final model we conclude that the time to progression has a prognostic impact on survival after progression. Every progression-free month decreases the mortality by about 2.

If there are no censored observations in your data, put NULL.

This function computes the multivariate Nelson-Aalen estimator of the cumulative transition hazards in multistate models; that is, for each possible transition, it computes an estimate of the cumulative hazard. The Nelson-Aalen estimator is computed as described in Andersen et al.

Returns a list named after the possible transitions, e. Each part contains a data frame. A data frame, with columns from and to, gives the possible transitions.

The variance estimator 4.[...]. Klein recommends the use of the variance estimator of eq. [...].

References

Andersen et al. Statistical models based on counting processes. Springer Series in Statistics.

Klein. Small sample moments of some estimators of the variance of the Kaplan-Meier and Nelson-Aalen estimators. Scandinavian Journal of Statistics.
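As an illustration of what such an estimate looks like for a single transition, the sketch below computes the Nelson-Aalen cumulative hazard in plain Python. It is not the `mvna` implementation and does not reproduce its interface. The variance shown is the running sum of d / Y^2, one commonly used estimator; no claim is made about which of the estimators mentioned above it corresponds to.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate for one transition.

    times: time at which each subject leaves the risk set.
    events: 1 if the transition of interest was observed, 0 if censored.
    Returns a list of (time, cumulative hazard, variance estimate),
    one entry per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum_hazard, variance = 0.0, 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events, n_leaving = 0, 0
        # Collect all subjects leaving the risk set at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            cum_hazard += n_events / at_risk
            variance += n_events / at_risk ** 2
            result.append((t, cum_hazard, variance))
        at_risk -= n_leaving
    return result


# Made-up data: five subjects, two of them censored.
estimate = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Each increment is the number of observed transitions divided by the number still at risk just before that time, so censored subjects contribute to the denominator until they leave.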