Survcompare_application

Package background

The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contributions to the models’ performance. Using repeated nested cross-validation (that is, a system of random data splits into the training and testing sets to evaluate prediction accuracy), the package:

Validates Cox Proportionate Hazards model, or Cox Lasso depending on the user’s input. This step employs
- survival::coxph() (Therneau, 2023);
- glmnet::glmnet(..., family="cox") (Simon, Friedman, Hastie & Tibshirani , 2011).
Validates Survival Random Forest ensembled with the baseline Cox model. To fit the ensemble, CoxPH’s out-of-sample predictions are added to the list of orignial predictors, which are then used to fit SRF, using
- randomForestSRC::rfsrc()(Ishwaran & Kogalur, 2023).
Validates stacked ensemble of the CoxPH and Survival Random Forest, which predicts as a linear combination of the CoxPH and SRF, (1-lambda) x CoxPH + lambda x SRF. Lambda parameter is tuned within the package and the value can indicate if taking a share of SRF predictions could improve the baseline CoxPH’s performance. Lambda = 0 means only CoxPH is used, lambda = 1 means the model is the same as SRF.
GitHit only: deep learning model DeepHit ‘survivalmodels::deephit()’, as well as its sequential and stacked ensembles with CoxPH or Cox-Lasso.
Performs statistical testing of whether the Survival Random Forest [ensemble] outperforms the Cox model.
Does NOT handle missing data at the moment.

Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed:

Discrimination measures: Harrell’s concordance index, time-dependent AUCROC.
Calibration measures: calibration slope, calibration alpha.
Overall fit: Brier score, Scaled Brier score.

Performance metrics’ description and definitions can be found, for example, in Steyerberg & Vergouwe (2014).

References:

Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-7, https://CRAN.R-project.org/package=survival.

Simon N, Friedman J, Tibshirani R, Hastie T (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1–13. doi:10.18637/jss.v039.i05.

Ishwaran H, Kogalur U (2023). Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.2, https://cran.r-project.org/package=randomForestSRC

Steyerberg EW, Vergouwe Y. (2014). Towards better clinical prediction models: seven steps for development and an ABCD for validation. European heart journal, 35(29), 1925-1931 https://doi.org/10.1093/eurheartj/ehu207

What can be inferred from the `survcompare` results?

There are two possible outcomes: 1) “Survival Random Forest ensemble has NOT outperformed CoxPH”, or 2) “Survival Random Forest ensemble has outperformed CoxPH by … in C-index”.

If there is no outperformance, the test can justify the employment of CoxPH model and indicate a negligible advantage of using a more flexible model such as Survival Random Forest.
In the case of the ensemble’s outperformance of the Cox model, a researcher can:
- Employ a more complex or flexible model.
- Look for the interaction and non-linear terms that could be added to the CoxPH and re-run the test.
- Consider using the CoxPH model despite its underperformance if the difference is not large enough to sacrifice model’s interpretability, or negligible in the context of a performed task.

Before interpreting the results, ensure that a sufficient number of repetitions (repeat_cv) has been used. Parameter repeat_cv should be at least 5, but for a heterogeneous data 20-50 may be needed. Otherwise, the results may unreliable and subject to chance.

Why the CoxPH-SRF ensemble and not just SRF?

First, you can use the package to compare the performances of the CoxPH and SRF themselves.

Second, the ensembles aim to single out the predictive value of the non-linearity and other data relationships that could not be captured by the baseline models. In both ensembles, the final models has a direct access to the predictions of the baseline CoxPH, and hence, the outperformance can be fully attributed to such complex relationships.

For example, the sequential ensemble of Cox and SRF takes the predictions of the Cox model and adds to the list of predictors to train SRF. This way, we make sure that linearity is captured by SRF at least as good as in the Cox model, and hence the marginal outperformance of the ensemble over the Cox model can be fully attributed to the qualities of SRF that Cox does not have, that is, data complexity.

Package installation

# For the main version:

# install.packages("devtools")
# devtools::install_github("dianashams/survcompare")

# DeepHit Extension:
# devtools::install_github("dianashams/survcompare", ref = "DeepHit")
library(survcompare)
#> Loading required package: survival

Examples

Example 1. Linear data

The first example takes simulated data that does not contain any complex terms, and CoxPH is expected to perform as good as Survival Random Forest.


mydata <- simulate_linear(200)
mypredictors <- names(mydata)[1:4]
survcompare(mydata, mypredictors, randomseed = 100)
#> [1] "Cross-validating CoxPH using 2 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 2"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 2 / 2"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Time difference of 1.296 secs
#> [1] "Cross-validating Survival Random Forest using 2 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 2"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 2 / 2"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Time difference of 8.369 secs
#> 
#> Internally validated test performance of CoxPH and Survival Random Forest over 2 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#>                            T C_score  AUCROC Calib_slope  sec
#> CoxPH                  8.239  0.7695  0.8150      1.0244 1.30
#> Survival Random Forest 8.239  0.7147  0.7271      0.8964 8.37
#> Diff                   0.000 -0.0547 -0.0879     -0.1280 7.07
#> pvalue                   NaN  0.9803  0.9919      0.7456  NaN
#> 
#> Median performance:
#>                            T C_score  AUCROC Calib_slope  sec
#> CoxPH                  8.239  0.7763  0.8465      1.0337 1.30
#> Survival Random Forest 8.239  0.7285  0.7274      0.8693 8.37
#> Diff                   0.000 -0.0478 -0.1191     -0.1645 7.07
#> pvalue                   NaN  0.9803  0.9919      0.7456  NaN
#> 
#> Survival Random Forest has NOT outperformed CoxPH with the mean c-index difference of -0.0547.
#> The difference is not statistically significant with the p-value = 0.9803. 
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score: 
#>   CoxPH  0.7695(95CI=0.7625-0.7765;SD=0.0104)
#>   Survival Random Forest 0.7147(95CI=0.7129-0.7165;SD=0.0027)
#> Mean AUCROC:
#>   CoxPH  0.815(95CI=0.8091-0.8208;SD=0.0088)
#> Survival Random Forest 0.7271(95CI=0.7245-0.7297;SD=0.0038)
#> 
#> Internally validated test performance of CoxPH and Survival Random Forest over 2 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#>                            T C_score  AUCROC Calib_slope  sec
#> CoxPH                  8.239  0.7695  0.8150      1.0244 1.30
#> Survival Random Forest 8.239  0.7147  0.7271      0.8964 8.37
#> Diff                   0.000 -0.0547 -0.0879     -0.1280 7.07
#> pvalue                   NaN  0.9803  0.9919      0.7456  NaN
#> 
#> Median performance:
#>                            T C_score  AUCROC Calib_slope  sec
#> CoxPH                  8.239  0.7763  0.8465      1.0337 1.30
#> Survival Random Forest 8.239  0.7285  0.7274      0.8693 8.37
#> Diff                   0.000 -0.0478 -0.1191     -0.1645 7.07
#> pvalue                   NaN  0.9803  0.9919      0.7456  NaN
#> 
#> Survival Random Forest has NOT outperformed CoxPH with the mean c-index difference of -0.0547.
#> The difference is not statistically significant with the p-value = 0.9803. 
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score: 
#>   CoxPH  0.7695(95CI=0.7625-0.7765;SD=0.0104)
#>   Survival Random Forest 0.7147(95CI=0.7129-0.7165;SD=0.0027)
#> Mean AUCROC:
#>   CoxPH  0.815(95CI=0.8091-0.8208;SD=0.0088)
#> Survival Random Forest 0.7271(95CI=0.7245-0.7297;SD=0.0038)

NOTE: the same results can be obtained by cross-validating CoxPH and SRF Ensemble separately,and then using survcompare2() function; outer_cv, inner_cv, repeat_cv, and randomseed should be the same.

#cross-validate CoxPH
 cv1<- survcox_cv(mydata, mypredictors, randomseed = 100)
#cross-validate Survival Random Forests ensemble
 cv2<- survsrfens_cv(mydata, mypredictors, randomseed = 100)
#compare the models 
 survcompare2(cv1, cv2)

Example 2. Non-linear data with interaction terms

The second example takes simulated data that contains non-linear and cross-terms, and an outperformance of the tree model is expected. We will increase the default number of cross-validation repetitions to get a robust estimate, choose CoxLasso as our baseline linear model, and SRF as the alternative (not the ensemble).


mydata2 <- simulate_crossterms(200)
mypredictors2 <- names(mydata2)[1:4]

#cross-validate CoxPH
cv1 <- survcox_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3, useCoxLasso = TRUE)
#> [1] "Cross-validating CoxLasso using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Time difference of 0.745 secs
# cross-validate Survival Random Forests ensemble
cv2 <- survsrf_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
#> [1] "Cross-validating Survival Random Forest using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Time difference of 12.93 secs
# compare the models 
compare_nonl <- survcompare2(cv1, cv2)
compare_nonl
#> 
#> Internally validated test performance of CoxLasso and Survival Random Forest over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#>                            T C_score AUCROC Calib_slope   sec
#> CoxLasso               8.315  0.6029 0.5795      1.4382  0.75
#> Survival Random Forest 8.315  0.6270 0.6145      0.4500 12.93
#> Diff                   0.000  0.0242 0.0350     -0.9882 12.18
#> pvalue                   NaN  0.1725 0.1559      0.9733   NaN
#> 
#> Median performance:
#>                            T C_score AUCROC Calib_slope   sec
#> CoxLasso               8.315  0.6165 0.5850      1.1308  0.75
#> Survival Random Forest 8.315  0.6372 0.6137      0.3707 12.93
#> Diff                   0.000  0.0207 0.0286     -0.7602 12.18
#> pvalue                   NaN  0.1725 0.1559      0.9733   NaN
#> 
#> Survival Random Forest has NOT outperformed CoxLasso with the mean c-index difference of 0.0242.
#> The difference is not statistically significant with the p-value = 0.1725. 
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score: 
#>   CoxLasso  0.6029(95CI=0.5767-0.6439;SD=0.0385)
#>   Survival Random Forest 0.627(95CI=0.6016-0.6502;SD=0.0257)
#> Mean AUCROC:
#>   CoxLasso  0.5795(95CI=0.5492-0.6268;SD=0.0444)
#> Survival Random Forest 0.6145(95CI=0.6039-0.6333;SD=0.0177)

Detailed results can be extracted from the output object. For example, test performance for each data split (across all cross-validations and repetitions).

# Main stats 
compare_nonl$main_stats_pooled
#>                                  mean      sd 95CILow 95CIHigh
#> C_score_CoxLasso               0.6029 0.03847  0.5767   0.6439
#> C_score_Survival Random Forest 0.6270 0.02570  0.6016   0.6502
#> AUCROC_CoxLasso                0.5795 0.04436  0.5492   0.6268
#> AUCROC_Survival Random Forest  0.6145 0.01765  0.6039   0.6333

# More information, including Brier Scores, Calibration slope and alphas can be seen in the
# compare_models_1$results_mean. These are averaged performance metrics on the test sets.
compare_nonl$results_mean
#>                            T C_score AUCROC       BS BS_scaled Calib_slope
#> CoxLasso               8.315 0.60288 0.5795 0.176828  -0.01175      1.4382
#> Survival Random Forest 8.315 0.62704 0.6145 0.183562  -0.04883      0.4500
#> Diff                   0.000 0.02416 0.0350 0.006734  -0.03708     -0.9882
#> pvalue                   NaN 0.17254 0.1559       NA   0.90455      0.9733
#>                        Calib_alpha   sec
#> CoxLasso                   -0.1906  0.75
#> Survival Random Forest     -0.2034 12.93
#> Diff                       -0.0128 12.18
#> pvalue                      0.5817   NaN

We can also evaluate the stacked ensemble of Cox-PH and SRF and see the tuned lambda and SRF model parameters:

#cross-validate CoxPH
cv3<- survsrfstack_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
#> [1] "Cross-validating Stacked_SRF_CoxPH using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Time difference of 16.3 secs
 
survcompare2(cv1, cv3)
#> 
#> Internally validated test performance of CoxLasso and Stacked_SRF_CoxPH over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#>                       T C_score AUCROC Calib_slope   sec
#> CoxLasso          8.315  0.6029 0.5795      1.4382  0.75
#> Stacked_SRF_CoxPH 8.315  0.6159 0.5995      0.6466 16.30
#> Diff              0.000  0.0131 0.0201     -0.7915 15.55
#> pvalue              NaN  0.2673 0.2278      0.9834   NaN
#> 
#> Median performance:
#>                       T C_score AUCROC Calib_slope   sec
#> CoxLasso          8.315  0.6165 0.5850      1.1308  0.75
#> Stacked_SRF_CoxPH 8.315  0.6160 0.6137      0.4258 16.30
#> Diff              0.000 -0.0004 0.0286     -0.7050 15.55
#> pvalue              NaN  0.2673 0.2278      0.9834   NaN
#> 
#> Stacked_SRF_CoxPH has NOT outperformed CoxLasso with the mean c-index difference of 0.0131.
#> The difference is not statistically significant with the p-value = 0.2673. 
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Stacked_SRF_CoxPH.
#> Mean C-score: 
#>   CoxLasso  0.6029(95CI=0.5767-0.6439;SD=0.0385)
#>   Stacked_SRF_CoxPH 0.6159(95CI=0.6004-0.628;SD=0.015)
#> Mean AUCROC:
#>   CoxLasso  0.5795(95CI=0.5492-0.6268;SD=0.0444)
#> Stacked_SRF_CoxPH 0.5995(95CI=0.5862-0.6103;SD=0.013)

Display all tuned model params. Lambda shows relative share of the SRF in the linear meta-learner. Lambda = 0 means that the model only uses CoxPH. Lambda = 1 means that only SRF is used.

# all tuned params
cv3$bestparams
#>   mtry nodesize nodedepth allmeans lambda c_score lambda_worst c_score_worst
#> 6    5       25        50    0.674   0.46   0.519            1        0.5007
#> 7    2       25        15   0.6252   0.93  0.5254          0.1         0.506
#> 1    5       15         2    0.553   0.54  0.5873            0        0.5487
#> 3    5       15         2   0.6138      1  0.6078            0        0.5572
#> 9    2       25        15   0.6044   0.99  0.5532         0.03        0.5068
#> 4    3       25        10   0.6767   0.54  0.6895            0        0.5813
#> 2    4       20         7   0.6294   0.97  0.5543            0        0.4651
#> 5    3       20        10   0.5527      0  0.5247            1        0.4851
#> 8    4       20         7   0.6575   0.76  0.6707            0         0.586
#>   C_score_outer
#> 6        0.7398
#> 7        0.6794
#> 1        0.6422
#> 3        0.6407
#> 9        0.6160
#> 4        0.5952
#> 2        0.5771
#> 5        0.5505
#> 8        0.5026

lambdas <- unlist(cv3$bestparams$lambda)
print("Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:") 
#> [1] "Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:"
print(mean(lambdas))
#> [1] 0.6878

plot(lambdas)
abline(h=mean(lambdas), col = "blue")

Example 3. Applying `survcompare` to GBSG data

Now, lets apply the package to a real life data. We will use GBSG data from the survival package (https://rdrr.io/cran/survival/man/gbsg.html).

library(survival)

#prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <-
  c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) #check if any missing data

# run survcompare 
cv1<- survcox_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
cv2<- survsrfens_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
survcompare2(cv1, cv2)

We got the following results (depending on a random seed, it may slightly differ):

Internally validated test performance of CoxPH and SRF_ensemble over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:

Median performance:

SRF_ensemble has outperformed CoxPHby 0.0137 in C-index.
The difference is statistically significant with the p-value 0.012*.
The supplied data may contain non-linear or cross-term dependencies, 
better captured by SRF_ensemble.
Mean C-score: 
  CoxPH  0.6773(95CI=0.675-0.6801;SD=0.0028)
  SRF_ensemble 0.6911(95CI=0.6902-0.6918;SD=9e-04)
Mean AUCROC:
  CoxPH  0.7224(95CI=0.7176-0.728;SD=0.0055)
SRF_ensemble 0.7211(95CI=0.7177-0.7241;SD=0.0034)

This example illustrates a situation when outperformance of the non-linear model is statistically significant, but not large in absolute terms. The ultimate model choice will depend on how critical such an improvement is for the model stakeholders.