The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contributions to the models’ performance. Using repeated nested cross-validation (that is, a system of random data splits into the training and testing sets to evaluate prediction accuracy), the package:
Validates Cox Proportionate Hazards model, or Cox Lasso depending on the user’s input. This step employs
survival::coxph()
(Therneau, 2023);glmnet::glmnet(..., family="cox")
(Simon, Friedman,
Hastie & Tibshirani , 2011).Validates Survival Random Forest ensembled with the baseline Cox model. To fit the ensemble, CoxPH’s out-of-sample predictions are added to the list of orignial predictors, which are then used to fit SRF, using
randomForestSRC::rfsrc()
(Ishwaran & Kogalur,
2023).Validates stacked ensemble of the CoxPH and Survival Random Forest, which predicts as a linear combination of the CoxPH and SRF, (1-lambda) x CoxPH + lambda x SRF. Lambda parameter is tuned within the package and the value can indicate if taking a share of SRF predictions could improve the baseline CoxPH’s performance. Lambda = 0 means only CoxPH is used, lambda = 1 means the model is the same as SRF.
GitHit only: deep learning model DeepHit ‘survivalmodels::deephit()’, as well as its sequential and stacked ensembles with CoxPH or Cox-Lasso.
Performs statistical testing of whether the Survival Random Forest [ensemble] outperforms the Cox model.
Does NOT handle missing data at the moment.
Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed:
Discrimination measures: Harrell’s concordance index, time-dependent AUCROC.
Calibration measures: calibration slope, calibration alpha.
Overall fit: Brier score, Scaled Brier score.
Performance metrics’ description and definitions can be found, for example, in Steyerberg & Vergouwe (2014).
Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-7, https://CRAN.R-project.org/package=survival.
Simon N, Friedman J, Tibshirani R, Hastie T (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1–13. doi:10.18637/jss.v039.i05.
Ishwaran H, Kogalur U (2023). Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.2, https://cran.r-project.org/package=randomForestSRC
Steyerberg EW, Vergouwe Y. (2014). Towards better clinical prediction models: seven steps for development and an ABCD for validation. European heart journal, 35(29), 1925-1931 https://doi.org/10.1093/eurheartj/ehu207
survcompare
results?There are two possible outcomes: 1) “Survival Random Forest ensemble has NOT outperformed CoxPH”, or 2) “Survival Random Forest ensemble has outperformed CoxPH by … in C-index”.
If there is no outperformance, the test can justify the employment of CoxPH model and indicate a negligible advantage of using a more flexible model such as Survival Random Forest.
In the case of the ensemble’s outperformance of the Cox model, a researcher can:
Before interpreting the results, ensure that a sufficient number of repetitions (repeat_cv) has been used. Parameter repeat_cv should be at least 5, but for a heterogeneous data 20-50 may be needed. Otherwise, the results may unreliable and subject to chance.
First, you can use the package to compare the performances of the CoxPH and SRF themselves.
Second, the ensembles aim to single out the predictive value of the non-linearity and other data relationships that could not be captured by the baseline models. In both ensembles, the final models has a direct access to the predictions of the baseline CoxPH, and hence, the outperformance can be fully attributed to such complex relationships.
For example, the sequential ensemble of Cox and SRF takes the predictions of the Cox model and adds to the list of predictors to train SRF. This way, we make sure that linearity is captured by SRF at least as good as in the Cox model, and hence the marginal outperformance of the ensemble over the Cox model can be fully attributed to the qualities of SRF that Cox does not have, that is, data complexity.
The first example takes simulated data that does not contain any complex terms, and CoxPH is expected to perform as good as Survival Random Forest.
mydata <- simulate_linear(200)
mypredictors <- names(mydata)[1:4]
survcompare(mydata, mypredictors, randomseed = 100)
#> [1] "Cross-validating CoxPH using 2 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 2"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 2 / 2"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> Time difference of 1.296 secs
#> [1] "Cross-validating Survival Random Forest using 2 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 2"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 2 / 2"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> Time difference of 8.369 secs
#>
#> Internally validated test performance of CoxPH and Survival Random Forest over 2 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#> T C_score AUCROC Calib_slope sec
#> CoxPH 8.239 0.7695 0.8150 1.0244 1.30
#> Survival Random Forest 8.239 0.7147 0.7271 0.8964 8.37
#> Diff 0.000 -0.0547 -0.0879 -0.1280 7.07
#> pvalue NaN 0.9803 0.9919 0.7456 NaN
#>
#> Median performance:
#> T C_score AUCROC Calib_slope sec
#> CoxPH 8.239 0.7763 0.8465 1.0337 1.30
#> Survival Random Forest 8.239 0.7285 0.7274 0.8693 8.37
#> Diff 0.000 -0.0478 -0.1191 -0.1645 7.07
#> pvalue NaN 0.9803 0.9919 0.7456 NaN
#>
#> Survival Random Forest has NOT outperformed CoxPH with the mean c-index difference of -0.0547.
#> The difference is not statistically significant with the p-value = 0.9803.
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score:
#> CoxPH 0.7695(95CI=0.7625-0.7765;SD=0.0104)
#> Survival Random Forest 0.7147(95CI=0.7129-0.7165;SD=0.0027)
#> Mean AUCROC:
#> CoxPH 0.815(95CI=0.8091-0.8208;SD=0.0088)
#> Survival Random Forest 0.7271(95CI=0.7245-0.7297;SD=0.0038)
#>
#> Internally validated test performance of CoxPH and Survival Random Forest over 2 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#> T C_score AUCROC Calib_slope sec
#> CoxPH 8.239 0.7695 0.8150 1.0244 1.30
#> Survival Random Forest 8.239 0.7147 0.7271 0.8964 8.37
#> Diff 0.000 -0.0547 -0.0879 -0.1280 7.07
#> pvalue NaN 0.9803 0.9919 0.7456 NaN
#>
#> Median performance:
#> T C_score AUCROC Calib_slope sec
#> CoxPH 8.239 0.7763 0.8465 1.0337 1.30
#> Survival Random Forest 8.239 0.7285 0.7274 0.8693 8.37
#> Diff 0.000 -0.0478 -0.1191 -0.1645 7.07
#> pvalue NaN 0.9803 0.9919 0.7456 NaN
#>
#> Survival Random Forest has NOT outperformed CoxPH with the mean c-index difference of -0.0547.
#> The difference is not statistically significant with the p-value = 0.9803.
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score:
#> CoxPH 0.7695(95CI=0.7625-0.7765;SD=0.0104)
#> Survival Random Forest 0.7147(95CI=0.7129-0.7165;SD=0.0027)
#> Mean AUCROC:
#> CoxPH 0.815(95CI=0.8091-0.8208;SD=0.0088)
#> Survival Random Forest 0.7271(95CI=0.7245-0.7297;SD=0.0038)
NOTE: the same results can be obtained by cross-validating CoxPH and SRF Ensemble separately,and then using survcompare2() function; outer_cv, inner_cv, repeat_cv, and randomseed should be the same.
The second example takes simulated data that contains non-linear and cross-terms, and an outperformance of the tree model is expected. We will increase the default number of cross-validation repetitions to get a robust estimate, choose CoxLasso as our baseline linear model, and SRF as the alternative (not the ensemble).
mydata2 <- simulate_crossterms(200)
mypredictors2 <- names(mydata2)[1:4]
#cross-validate CoxPH
cv1 <- survcox_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3, useCoxLasso = TRUE)
#> [1] "Cross-validating CoxLasso using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> Time difference of 0.745 secs
# cross-validate Survival Random Forests ensemble
cv2 <- survsrf_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
#> [1] "Cross-validating Survival Random Forest using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> Time difference of 12.93 secs
# compare the models
compare_nonl <- survcompare2(cv1, cv2)
compare_nonl
#>
#> Internally validated test performance of CoxLasso and Survival Random Forest over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#> T C_score AUCROC Calib_slope sec
#> CoxLasso 8.315 0.6029 0.5795 1.4382 0.75
#> Survival Random Forest 8.315 0.6270 0.6145 0.4500 12.93
#> Diff 0.000 0.0242 0.0350 -0.9882 12.18
#> pvalue NaN 0.1725 0.1559 0.9733 NaN
#>
#> Median performance:
#> T C_score AUCROC Calib_slope sec
#> CoxLasso 8.315 0.6165 0.5850 1.1308 0.75
#> Survival Random Forest 8.315 0.6372 0.6137 0.3707 12.93
#> Diff 0.000 0.0207 0.0286 -0.7602 12.18
#> pvalue NaN 0.1725 0.1559 0.9733 NaN
#>
#> Survival Random Forest has NOT outperformed CoxLasso with the mean c-index difference of 0.0242.
#> The difference is not statistically significant with the p-value = 0.1725.
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Survival Random Forest.
#> Mean C-score:
#> CoxLasso 0.6029(95CI=0.5767-0.6439;SD=0.0385)
#> Survival Random Forest 0.627(95CI=0.6016-0.6502;SD=0.0257)
#> Mean AUCROC:
#> CoxLasso 0.5795(95CI=0.5492-0.6268;SD=0.0444)
#> Survival Random Forest 0.6145(95CI=0.6039-0.6333;SD=0.0177)
Detailed results can be extracted from the output object. For example, test performance for each data split (across all cross-validations and repetitions).
# Main stats
compare_nonl$main_stats_pooled
#> mean sd 95CILow 95CIHigh
#> C_score_CoxLasso 0.6029 0.03847 0.5767 0.6439
#> C_score_Survival Random Forest 0.6270 0.02570 0.6016 0.6502
#> AUCROC_CoxLasso 0.5795 0.04436 0.5492 0.6268
#> AUCROC_Survival Random Forest 0.6145 0.01765 0.6039 0.6333
# More information, including Brier Scores, Calibration slope and alphas can be seen in the
# compare_models_1$results_mean. These are averaged performance metrics on the test sets.
compare_nonl$results_mean
#> T C_score AUCROC BS BS_scaled Calib_slope
#> CoxLasso 8.315 0.60288 0.5795 0.176828 -0.01175 1.4382
#> Survival Random Forest 8.315 0.62704 0.6145 0.183562 -0.04883 0.4500
#> Diff 0.000 0.02416 0.0350 0.006734 -0.03708 -0.9882
#> pvalue NaN 0.17254 0.1559 NA 0.90455 0.9733
#> Calib_alpha sec
#> CoxLasso -0.1906 0.75
#> Survival Random Forest -0.2034 12.93
#> Diff -0.0128 12.18
#> pvalue 0.5817 NaN
We can also evaluate the stacked ensemble of Cox-PH and SRF and see the tuned lambda and SRF model parameters:
#cross-validate CoxPH
cv3<- survsrfstack_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
#> [1] "Cross-validating Stacked_SRF_CoxPH using 3 repeat(s), 3 outer, 3 inner loops)."
#> [1] "Repeated CV 1 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 2 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> [1] "Repeated CV 3 / 3"
#> | | | 0% | |======================= | 33% | |=============================================== | 67% | |======================================================================| 100%
#> Time difference of 16.3 secs
survcompare2(cv1, cv3)
#>
#> Internally validated test performance of CoxLasso and Stacked_SRF_CoxPH over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
#> T C_score AUCROC Calib_slope sec
#> CoxLasso 8.315 0.6029 0.5795 1.4382 0.75
#> Stacked_SRF_CoxPH 8.315 0.6159 0.5995 0.6466 16.30
#> Diff 0.000 0.0131 0.0201 -0.7915 15.55
#> pvalue NaN 0.2673 0.2278 0.9834 NaN
#>
#> Median performance:
#> T C_score AUCROC Calib_slope sec
#> CoxLasso 8.315 0.6165 0.5850 1.1308 0.75
#> Stacked_SRF_CoxPH 8.315 0.6160 0.6137 0.4258 16.30
#> Diff 0.000 -0.0004 0.0286 -0.7050 15.55
#> pvalue NaN 0.2673 0.2278 0.9834 NaN
#>
#> Stacked_SRF_CoxPH has NOT outperformed CoxLasso with the mean c-index difference of 0.0131.
#> The difference is not statistically significant with the p-value = 0.2673.
#> The data may NOT contain considerable non-linear or cross-term dependencies
#> that could be captured by Stacked_SRF_CoxPH.
#> Mean C-score:
#> CoxLasso 0.6029(95CI=0.5767-0.6439;SD=0.0385)
#> Stacked_SRF_CoxPH 0.6159(95CI=0.6004-0.628;SD=0.015)
#> Mean AUCROC:
#> CoxLasso 0.5795(95CI=0.5492-0.6268;SD=0.0444)
#> Stacked_SRF_CoxPH 0.5995(95CI=0.5862-0.6103;SD=0.013)
Display all tuned model params. Lambda shows relative share of the SRF in the linear meta-learner. Lambda = 0 means that the model only uses CoxPH. Lambda = 1 means that only SRF is used.
# all tuned params
cv3$bestparams
#> mtry nodesize nodedepth allmeans lambda c_score lambda_worst c_score_worst
#> 6 5 25 50 0.674 0.46 0.519 1 0.5007
#> 7 2 25 15 0.6252 0.93 0.5254 0.1 0.506
#> 1 5 15 2 0.553 0.54 0.5873 0 0.5487
#> 3 5 15 2 0.6138 1 0.6078 0 0.5572
#> 9 2 25 15 0.6044 0.99 0.5532 0.03 0.5068
#> 4 3 25 10 0.6767 0.54 0.6895 0 0.5813
#> 2 4 20 7 0.6294 0.97 0.5543 0 0.4651
#> 5 3 20 10 0.5527 0 0.5247 1 0.4851
#> 8 4 20 7 0.6575 0.76 0.6707 0 0.586
#> C_score_outer
#> 6 0.7398
#> 7 0.6794
#> 1 0.6422
#> 3 0.6407
#> 9 0.6160
#> 4 0.5952
#> 2 0.5771
#> 5 0.5505
#> 8 0.5026
lambdas <- unlist(cv3$bestparams$lambda)
print("Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:")
#> [1] "Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:"
print(mean(lambdas))
#> [1] 0.6878
plot(lambdas)
abline(h=mean(lambdas), col = "blue")
survcompare
to GBSG dataNow, lets apply the package to a real life data. We will use GBSG data from the survival package (https://rdrr.io/cran/survival/man/gbsg.html).
library(survival)
#prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <-
c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) #check if any missing data
# run survcompare
cv1<- survcox_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
cv2<- survsrfens_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
survcompare2(cv1, cv2)
We got the following results (depending on a random seed, it may slightly differ):
Internally validated test performance of CoxPH and SRF_ensemble over 3 repeated 3 fold cross-validations (inner k = 3 ). Mean performance:
Median performance:
SRF_ensemble has outperformed CoxPHby 0.0137 in C-index.
The difference is statistically significant with the p-value 0.012*.
The supplied data may contain non-linear or cross-term dependencies,
better captured by SRF_ensemble.
Mean C-score:
CoxPH 0.6773(95CI=0.675-0.6801;SD=0.0028)
SRF_ensemble 0.6911(95CI=0.6902-0.6918;SD=9e-04)
Mean AUCROC:
CoxPH 0.7224(95CI=0.7176-0.728;SD=0.0055)
SRF_ensemble 0.7211(95CI=0.7177-0.7241;SD=0.0034)
This example illustrates a situation when outperformance of the non-linear model is statistically significant, but not large in absolute terms. The ultimate model choice will depend on how critical such an improvement is for the model stakeholders.