If you use CrossICC in your published research, please cite this paper:
Unsupervised clustering of high-throughput molecular profiling data is widely adopted for discovering cancer subtypes. However, cancer subtypes derived from a single dataset are usually not applicable across multiple datasets from different platforms. We previously published an iterative clustering algorithm to address this issue (see this paper), but its use was hampered by the lack of an implementation.
In this project, we present CrossICC, an implementation of this method. Many new features were also added to improve the performance of the algorithm. Briefly, CrossICC uses an iterative strategy to derive the optimal gene set and cluster number from the consensus similarity matrix generated by consensus clustering. CrossICC can handle multiple cross-platform datasets and requires no between-dataset normalization. The package also provides many functions to help users visualize the identified subtypes and evaluate subtyping performance. In particular, many cancer-related analysis methods are embedded to facilitate the clinical translation of the identified cancer subtypes.
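To make the term "consensus similarity matrix" concrete, here is a minimal base R sketch of the general idea: cluster many random subsamples of the samples and record how often each pair of samples lands in the same cluster. This is not CrossICC's own code. The toy data, the 80% subsampling fraction, the 50 repeats and the choice of average-linkage hclust() are all assumptions made for illustration.

```r
set.seed(1)

# Toy data (hypothetical): 20 genes x 12 samples in two groups of 6.
# Group 1 has genes 11-20 elevated, group 2 has genes 1-10 elevated.
x <- matrix(rnorm(20 * 12), nrow = 20)
x[11:20, 1:6] <- x[11:20, 1:6] + 4
x[1:10, 7:12] <- x[1:10, 7:12] + 4
colnames(x) <- paste0("S", 1:12)

n <- ncol(x)
k <- 2       # number of clusters to cut the tree into
reps <- 50   # number of subsampling rounds

together <- matrix(0, n, n)  # times a pair fell in the same cluster
sampled  <- matrix(0, n, n)  # times a pair was drawn in the same round

for (r in seq_len(reps)) {
  # Draw 80% of the samples without replacement
  idx <- sort(sample(n, size = round(0.8 * n)))
  # Cluster the subsample using 1 - Pearson correlation as the distance
  d <- as.dist(1 - cor(x[, idx]))
  cl <- cutree(hclust(d, method = "average"), k = k)
  sampled[idx, idx] <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus index: fraction of shared rounds in which a pair co-clustered
consensus <- together / pmax(sampled, 1)
round(consensus[c(1, 2, 7, 8), c(1, 2, 7, 8)], 2)
```

Values close to 1 mark pairs of samples that are grouped together regardless of which subsample is drawn, and values close to 0 mark pairs that are consistently separated.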
To install via Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CrossICC")
The development version is also available for download from GitHub.
Most clustering tools require users to combine all datasets into one, whereas CrossICC only needs a list object in R. We also provide a function, CrossICCInput(), for importing multiple files as a list.
NOTE: CrossICCInput() internally calls data.table::fread(), so you never need to specify a separator.
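The following sketch builds such a list by hand in base R, to show the shape of the input rather than how CrossICCInput() works internally. The two toy files, the gene and sample names, and the helper read.exprs() are hypothetical, and the layout (genes in rows, samples in columns) is an assumption. Note that base R's read.table() has to be told each separator explicitly, which is the chore that fread()'s automatic separator detection removes.

```r
# Two toy expression files with different separators (hypothetical data)
tsv.file <- tempfile(fileext = ".tsv")
csv.file <- tempfile(fileext = ".csv")
writeLines(c("gene\tA1\tA2", "TP53\t5.1\t6.2", "EGFR\t7.3\t8.4"), tsv.file)
writeLines(c("gene,B1,B2,B3", "TP53,50,61,72", "EGFR,83,94,105"), csv.file)

# Hypothetical helper: read one file into a gene x sample numeric matrix,
# using the first column as row names
read.exprs <- function(path, sep) {
  df <- read.table(path, header = TRUE, sep = sep,
                   row.names = 1, check.names = FALSE)
  as.matrix(df)
}

# One matrix per platform, collected in a plain list; the platforms may
# differ in sample count and in value scale
platforms <- list(read.exprs(tsv.file, "\t"), read.exprs(csv.file, ","))
sapply(platforms, dim)
```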
CrossICC is easy to use: simply call the main function with its default parameters. You can run predictor() to calculate the correlation between the predictor centroid and the validation centroid, and you can obtain a GSEA-like ranked matrix from the CrossICC result by running ssGSEA(). We also provide a graphical interface that lets users inspect CrossICC results in an intuitive way.
To run CrossICC:
library(CrossICC)
data(demo.platforms)
# Set use.shiny = TRUE if you want to launch the shiny app once CrossICC has finished
CrossICC.object <- CrossICC(demo.platforms, skip.mfs = TRUE, use.shiny = FALSE, overwrite = TRUE, output.dir = tempdir())
## Merging, filtering and scaling are skipped.
## No study names provided or something goes wrong with your study names. Will use auto-generated study names instead.
## Tue Oct 29 23:50:27 2019 -- start iteration: 1
## 352 genes were engaged in this iteration.
## Tue Oct 29 23:50:39 2019 -- start iteration: 2
## 349 genes were engaged in this iteration.
## Tue Oct 29 23:50:51 2019 -- start iteration: 3
## 346 genes were engaged in this iteration.
## Tue Oct 29 23:51:02 2019 -- start iteration: 4
## 345 genes were engaged in this iteration.
## Tue Oct 29 23:51:12 2019 -- start iteration: 5
## 345 genes were engaged in this iteration.
## A CrossICC.object.rds file will be generated in home directory by default.
## Note that the previous file will be overridden.
## Tue Oct 29 23:51:23 2019 -- Iteration finished! Iteration time for reaching convergence/limit: 4
By default, CrossICC generates an .rds object in your home directory (~/, a.k.a. $HOME on Linux), which records key features (genes), the number of iterations and other information from the analysis in a compressed file format. You can launch the shiny app with this file later.
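As a reminder of how .rds files behave, the sketch below saves and restores an object with base R's saveRDS() and readRDS(). The list used here is only a placeholder with hypothetical fields, not a real CrossICC result, and it is written to a temporary file so that nothing in your home directory is touched.

```r
# Placeholder standing in for a CrossICC result (hypothetical fields)
result <- list(gene.signature = c("TP53", "EGFR"), iterations = 4L)

# Write to a throwaway path; a real run would use its own output location
rds.path <- tempfile(fileext = ".rds")
saveRDS(result, rds.path)

# Later, possibly in a new R session, restore the object from the file
restored <- readRDS(rds.path)
identical(result, restored)
```

The same readRDS() call, pointed at the file written by CrossICC, should give you back the result object without rerunning the analysis.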
To compare samples according to their pathway information, we provide a way to obtain a GSEA-like ranked eigenvalue matrix:
Mcluster <- paste("K", CrossICC.object$clusters$clusters[[1]], sep = "")
CrossICC.ssGSEA <- ssGSEA(x = demo.platforms[[1]], gene.signature = CrossICC.object$gene.signature, geneset2gene = CrossICC.object$unioned.genesets, cluster = Mcluster)
You can also use CrossICC’s result as a model for clustering new samples, simply by calculating the correlation between the predictor centroid and the validation centroid:
## Tue Oct 29 23:51:23 2019 -- Merging multiple probes for one feature
## Tue Oct 29 23:51:23 2019 -- Removing features with no variance
## Tue Oct 29 23:51:23 2019 -- Scaling
new.exprs is your expression matrix, and CrossICC’s result, CrossICC.obj, can be used as the model. The process will take a few minutes.
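The idea of assigning new samples by centroid correlation can be sketched in a few lines of base R. This is an illustration of the general technique only, not the code behind predictor(): the toy matrices, the cluster labels and the use of Pearson correlation are assumptions, and the merging, filtering and scaling steps shown in the log above are left out.

```r
set.seed(42)
genes <- paste0("G", 1:30)

# Toy training data (hypothetical): 30 genes x 10 samples, two clusters.
# K1 samples have genes 16-30 elevated, K2 samples have genes 1-15 elevated.
train <- matrix(rnorm(30 * 10), nrow = 30,
                dimnames = list(genes, paste0("T", 1:10)))
train[16:30, 1:5]  <- train[16:30, 1:5] + 3
train[1:15, 6:10]  <- train[1:15, 6:10] + 3
train.cluster <- rep(c("K1", "K2"), each = 5)

# Centroid of each cluster: per-gene mean over the cluster's samples
centroids <- sapply(split(seq_len(ncol(train)), train.cluster),
                    function(idx) rowMeans(train[, idx, drop = FALSE]))

# Toy new samples: N1 and N2 resemble K1, N3 and N4 resemble K2
new.samples <- matrix(rnorm(30 * 4), nrow = 30,
                      dimnames = list(genes, paste0("N", 1:4)))
new.samples[16:30, 1:2] <- new.samples[16:30, 1:2] + 3
new.samples[1:15, 3:4]  <- new.samples[1:15, 3:4] + 3

# Correlate every new sample with every centroid (samples x clusters),
# then assign each sample to the centroid it correlates with best
cor.mat <- cor(new.samples, centroids)
assigned <- colnames(cor.mat)[max.col(cor.mat, ties.method = "first")]
data.frame(sample = rownames(cor.mat), cluster = assigned)
```

Correlation is used here rather than Euclidean distance because it ignores differences in overall scale between samples, which matters when new samples come from a different platform than the training data.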
If you have issues or questions, please visit the CrossICC homepage (https://github.com/bioinformatist/CrossICC) first. If you think you have found a bug, please post a reproducible example on the GitHub issue tracker.
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] CrossICC_1.0.0 MASS_7.3-51.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 cluster_2.1.0
## [3] knitr_1.25 magrittr_1.5
## [5] ConsensusClusterPlus_1.50.0 BiocGenerics_0.32.0
## [7] tidyselect_0.2.5 lattice_0.20-38
## [9] R6_2.4.0 rlang_0.4.1
## [11] stringr_1.4.0 dplyr_0.8.3
## [13] tools_3.6.1 parallel_3.6.1
## [15] grid_3.6.1 Biobase_2.46.0
## [17] data.table_1.12.6 xfun_0.10
## [19] htmltools_0.4.0 yaml_2.2.0
## [21] digest_0.6.22 assertthat_0.2.1
## [23] tibble_2.1.3 crayon_1.3.4
## [25] Matrix_1.2-17 purrr_0.3.3
## [27] MergeMaid_2.58.0 glue_1.3.1
## [29] evaluate_0.14 rmarkdown_1.16
## [31] limma_3.42.0 stringi_1.4.3
## [33] pillar_1.4.2 compiler_3.6.1
## [35] pkgconfig_2.0.3