if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("curatedTCGAData")
Load packages:
library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)
Checking available cancer codes and assays in TCGA data:
curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)
## Please see the list below for available cohorts and assays
## Available Cancer codes:
## ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH
## KIRC KIRP LAML LGG LIHC LUAD LUSC MESO OV PAAD PCPG
## PRAD READ SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM
## Available Data Types:
## CNACGH CNACGH_CGH_hg_244a
## CNACGH_CGH_hg_415k_g4124a CNASNP CNASeq
## CNVSNP GISTIC_AllByGene GISTIC_Peaks
## GISTIC_ThresholdedByGene Methylation
## Methylation_methyl27 Methylation_methyl450
## Mutation RNASeq2GeneNorm RNASeqGene RPPAArray
## mRNAArray mRNAArray_TX_g4502a
## mRNAArray_TX_g4502a_1
## mRNAArray_TX_ht_hg_u133a mRNAArray_huex
## miRNAArray miRNASeqGene
Check potential files to be downloaded:
curatedTCGAData(diseaseCode = "COAD", assays = "RPPA*", dry.run = TRUE)
## Title DispatchClass
## 96 COAD_RPPAArray-20160128 Rda
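The assays argument accepts shell-style wildcards such as "RPPA*". As a rough
illustration only, and not necessarily how curatedTCGAData resolves patterns
internally, base R's glob2rx() can turn such a pattern into a regular
expression and match it against a few of the data type names listed above.
The helper name match_types is hypothetical.

```r
# A few data type names copied from the dry-run listing above
data_types <- c("CNASNP", "CNASeq", "CNVSNP", "Mutation", "RPPAArray")

# Hypothetical helper: glob2rx() (utils package) converts a wildcard
# pattern into a regular expression, which grep() then matches
match_types <- function(pattern, choices = data_types) {
    grep(utils::glob2rx(pattern), choices, value = TRUE)
}

match_types("RPPA*")
match_types("CN*")
```

Under these assumptions, "RPPA*" selects only RPPAArray, while "CN*" selects
every copy number data type in the vector.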
Set dry.run = FALSE to download the data as a MultiAssayExperiment:
(accmae <- curatedTCGAData("ACC", c("CN*", "Mutation"), dry.run = FALSE))
## A MultiAssayExperiment object of 3 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 3:
## [1] ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 180 columns
## [2] ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 180 columns
## [3] ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
Note. For more on how to use a MultiAssayExperiment, please see the
MultiAssayExperiment vignette.
Some cancer datasets contain associated subtype information within the
clinical datasets provided. This subtype information is included in the
metadata of the colData of the MultiAssayExperiment object. To obtain these
variable names, use the getSubtypeMap function from TCGAutils:
head(getSubtypeMap(accmae))
## ACC_annotations ACC_subtype
## 1 Patient_ID patientID
## 2 histological_subtypes Histology
## 3 mrna_subtypes C1A/C1B
## 4 mrna_subtypes mRNA_K4
## 5 cimp MethyLevel
## 6 microrna_subtypes miRNA cluster
Another helper function provided by TCGAutils allows users to obtain a set
of consistent clinical variable names across several cancer types.
Use the getClinicalNames function to obtain a character vector of common
clinical variables such as vital status, years to birth, days to death, etc.
head(getClinicalNames("ACC"))
## [1] "years_to_birth" "vital_status" "days_to_death"
## [4] "days_to_last_followup" "tumor_tissue_site" "pathologic_stage"
colData(accmae)[, getClinicalNames("ACC")][1:5, 1:5]
## DataFrame with 5 rows and 5 columns
## years_to_birth vital_status days_to_death days_to_last_followup
## <integer> <integer> <integer> <integer>
## TCGA-OR-A5J1 58 1 1355 NA
## TCGA-OR-A5J2 44 1 1677 NA
## TCGA-OR-A5J3 23 0 NA 2091
## TCGA-OR-A5J4 23 1 423 NA
## TCGA-OR-A5J5 30 1 365 NA
## tumor_tissue_site
## <character>
## TCGA-OR-A5J1 adrenal
## TCGA-OR-A5J2 adrenal
## TCGA-OR-A5J3 adrenal
## TCGA-OR-A5J4 adrenal
## TCGA-OR-A5J5 adrenal
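As a small sketch of how these harmonized variables might be used, the block
below rebuilds the five rows shown above as a plain data.frame and derives a
single follow-up time per patient. It assumes that vital_status == 1 marks a
death and 0 a censored patient, which matches the pattern of missing values
in the output but is not stated by the package here. The column name
followup_days is hypothetical.

```r
# First five patients, copied from the colData output above
clin <- data.frame(
    vital_status = c(1L, 1L, 0L, 1L, 1L),
    days_to_death = c(1355L, 1677L, NA, 423L, 365L),
    days_to_last_followup = c(NA, NA, 2091L, NA, NA),
    row.names = paste0("TCGA-OR-A5J", 1:5)
)

# Assumption: vital_status == 1 means deceased, 0 means censored.
# Use days_to_death for deaths and days_to_last_followup otherwise.
clin$followup_days <- ifelse(
    clin$vital_status == 1L,
    clin$days_to_death,
    clin$days_to_last_followup
)

clin[, c("vital_status", "followup_days")]
```

With the real object, the same columns could be taken from colData(accmae)
instead of being typed in by hand.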
The sampleTables function gives an overview of sample types / codes
present in the data:
sampleTables(accmae)
## $`ACC_CNASNP-20160128`
##
## 01 10 11
## 90 85 5
##
## $`ACC_CNVSNP-20160128`
##
## 01 10 11
## 90 85 5
##
## $`ACC_Mutation-20160128`
##
## 01
## 90
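The two-digit codes in these tables correspond to the sample type field of
the TCGA barcode (characters 14 and 15). The following base R sketch shows
how such a tabulation could be produced by hand. The barcodes are
hypothetical examples, and this is an approximation for illustration rather
than the TCGAutils implementation.

```r
# Hypothetical barcodes for illustration; the sample type code is the
# two-digit field at characters 14-15 of a TCGA barcode
barcodes <- c(
    "TCGA-OR-A5J1-01A-11D-A29H-01",
    "TCGA-OR-A5J1-10A-01D-A29K-01",
    "TCGA-OR-A5J2-01A-11D-A29H-01",
    "TCGA-OR-A5J2-11A-01D-A29K-01"
)

# Extract the sample type code and count how often each one occurs
sample_codes <- substr(barcodes, 14, 15)
table(sample_codes)
```

For real data, the barcodes would come from the column names of each
experiment rather than a hand-written vector.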
Often, an analysis compares two groups of samples to each other.
To facilitate the separation of samples, the splitAssays function from
TCGAutils identifies all sample types in the assays and moves each into its
own experiment. By default, every discoverable sample type is split out.
In this case we request only primary solid tumor and blood derived normal
samples, whose codes are listed in the sampleTypes reference dataset.
Because the mutation assay contains only primary tumor samples (code 01),
splitAssays warns that some sample codes were not found:
sampleTypes[sampleTypes[["Code"]] %in% c("01", "10"), ]
## Code Definition Short.Letter.Code
## 1 01 Primary Solid Tumor TP
## 10 10 Blood Derived Normal NB
splitAssays(accmae, c("01", "10"))
## Warning: Some 'sampleCodes' not found in assays
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] 01_ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 90 columns
## [2] 10_ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 85 columns
## [3] 01_ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 90 columns
## [4] 10_ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 85 columns
## [5] 01_ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices