cytofast 1.0.0
Cytofast is a R-package aimed at quickly visualizing and analyzing the output of Cytosplore; software used to characterizes and cluster immune subsets from flow or mass cytometry (CyTOF) data. Cytosplore is mainly used for its fine implementation of t-sne (and HSNE) algorithm to identify and cluster distinct populations of cells. However, further downstream analysis is unsupported, such as relating these population of cells to a phenotype and/or ascribing any experimental conditions. Therefore, we introduce the cytofast package, which provides a quick visualization and quantification of clusters (or combination of clusters) playing a key role in your study.
In this vignette will guide the users through the basic workflow designed around the package. As an example we will use the mass cytometry dataset generated by Spitzer (2017). Here we work with a small example to explain the functionality of the package, for the complete workfow and a more thorough analysis we refer to our peer reviewed paper (Beyrend et al. 2018).
The workflow is designed around several steps, which are addressed per section. In each section we showcase some functions, which may or may not be of your interest. A summary of the most used functions is as follows:
readCytosploreFCS
: reading .fcs files from Cytosplore into RcytoHeatmaps
: generating two aligned heatmapsmsiPlot
: generating histograms of the signal intensity for each marker, per group or clustercytoBoxPlots
: creates boxplots (percentage of total given cells) and representing each cluster as a percentage of the total analyzed immune cellscytottest
and cytogt
, functions to quantify clusters in respect to experimental conditions. The latter function
is still under development.To install cytofast run the following code in the console:
if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
BiocManager::install("cytofast")
Before cytofast can be applied, the data should be clustered. We recommend using Cytosplore for this task, however it is certainly possible to use any other clustering algorithm. As an example, we will also show how data clustered by FlowSOM can be parsed for usage with cytofast.
After downloading Cytosplore, hosted at www.cytosplore.org, upload the FCS files by clicking on File then Open FCS File(s). Be sure to add unique sample tag as channel when prompted and select a cofactor for hyperbolic arcsinh transformation (default is 5).
Click on Run H-SNE and wait for the map being generated. This step may require some time depending on the number of cells analyzed.
Once the tSNE map is generated, we can save the clusters defined by Cytosplore by right-clicking on the tSNE map and save the clusters as “Save Clusters …”“. We can choose the directory of the output files as prompted by Cytosplore; make sure to note this location, because we will use this directory to load the .fcs files into R.
We advise using a simple name (characters only) when renaming the output files, this makes identification and further handling easier. After saving, we will have a folder with the created .fcs files, with each file corresponding to your identified clusters in Cytosplore. The next step will be loading the files into R with the help of cytofast
.
Let’s start with loading the package.
library(cytofast)
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
To follow this vignette we added some pre-clustered (by Cytosplore) .fcs files to the package. These are ten downsampled clusters from the Spitzer study and should be treated as a toy-example to discover the package. A complete analysis of the Spitzer data is reported and published in Beyrend et al., CSBJ, 2018.
To import the .fcs files created by Cytosplore, we use the function readCytosploreFCS
. Simply give the directory where you saved your .fcs files and the function will store the data in a cfList
(cytofast list). When not using Cytosplore, check the function cfList
to create such a list yourself. It is an S4 class object with several slots, a data frame expr
containing the expression of the markers for all measured cells, counts
a data frame with cell counts per cluster per sample and samples
a data frame with all the meta information.
## An object of class cfList
##
## @samples
## sampleID
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 10 rows and 1 columns
##
## @expr
## sampleID clusterID
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 10 rows and 2 columns
Note that the expr
slot needs both a column named “sampleID” and “clusterID”, which are used to identify the origin of each individual cell. When using readCytosploreFCS
this is all done automatically.
From here you can choose to use your own data, or it’s possible to follow along with the added data from Spitzer. So, let’s load the data into R:
dirFCS <- system.file("extdata", package="cytofast")
cfData <- readCytosploreFCS(dir = dirFCS, colNames = "description")
## Reading .FCS cluster: 1
## Reading .FCS cluster: 2
## Reading .FCS cluster: 3
## Reading .FCS cluster: 4
## Reading .FCS cluster: 5
## Reading .FCS cluster: 6
## Reading .FCS cluster: 7
## Reading .FCS cluster: 8
## Reading .FCS cluster: 9
## Reading .FCS cluster: 10
In total there were 10 .fcs files (also meaning 10 clusters) and are now stored into a cfList
. We can acces the list and have a quick look at the marker expression of the cells.
cfData@expr[1:5, 1:5]
## clusterID sampleID Time Event_length BC1
## 1 cytofastData1.fcs 16 6912180.0 36 581.915466
## 2 cytofastData1.fcs 20 778687.5 31 9.465774
## 3 cytofastData1.fcs 0 2145997.8 41 1.146241
## 4 cytofastData1.fcs 9 893060.8 45 14.375663
## 5 cytofastData1.fcs 15 7234341.0 34 347.519806
As shown, both clusterID and sampleID are added to the expression data. We also see that there are some ‘markers’ that we don’t want in our further analysis. Therefore, in the next section we will clean-up the data and add some meta information to the cfList
.
Some channels are not needed (e.g. DNA1, DNA2, barcoding, etc…). We can remove those and only keep the markers that we are actually intrested in.
cfData@expr <- cfData@expr[,-c(3:10, 13:16, 55:59, 61:63)]
Next we will add some meta data to the cfList
. Make sure that the sampleID corresponds with the sampleID of the meta data! When using Cytosplore, the sampleID is based on the sample tag (CSPLR_ST).
data(spitzer)
meta <- spitzer[match(row.names(cfData@samples), spitzer[,"CSPLR_ST"]),] # match sampleID
cfData@samples <- cbind(cfData@samples, meta[,2:3])
Lastly, we will make some small changes to the labeling of the data. This helps with the visualization of the heatmaps and discovering relations of clusters.
levels(cfData@expr[,"clusterID"]) <- gsub("[^0-9]", "", levels(cfData@expr[,"clusterID"]))
Before we can create the heatmaps, we first address the last slot counts
, which is still empty. To fill this slot we can call the function cellCounts
. The default function will simply add the raw cell counts per cluster per sample into the cfList.
cfData <- cellCounts(cfData)
head(cfData@counts)
In many cases the abundance of the cells could be biased in respect to the sample size of the donors, therefore it is better to look at the frequency of the clusters in respect to the total amount of cells per sample. We will also standardize the data (mean zero, unit variance) for a better comparison between clusters.
cfData <- cellCounts(cfData, frequency = TRUE, scale = TRUE)
head(cfData@counts)
## 1 10 2 3 4 5
## 0 0.2123060 2.1764833 -1.2802619 -1.1578036 -0.79710186 -0.508726298
## 1 0.3233091 0.8532170 0.2047896 -1.3588553 -0.93108986 -0.009465912
## 2 -1.0578183 -1.4106237 -0.1400055 -1.5386779 -0.02643354 -0.434072346
## 3 -0.6210715 -1.0437467 -1.0236987 -0.2207599 0.44452716 -0.229890594
## 4 -1.1419325 -0.6061463 -1.2022531 -1.1693452 -0.10806673 0.104783253
## 5 -1.1419325 -0.5380984 0.2152418 0.1136000 -0.28046313 -0.665685717
## 6 7 8 9
## 0 0.8491796 -1.1515787 -0.5534802 0.4825108
## 1 0.4231830 -0.9064829 -0.3232428 0.6899801
## 2 2.6414025 -1.1571346 -0.9826655 1.3671319
## 3 0.8322060 -0.9975814 -1.8546953 1.9479917
## 4 2.1375120 -0.9031520 -0.9826655 1.3671319
## 5 1.1849822 -0.7009684 -1.3517980 1.5746208
Now everything is ready to visualize the data. One of the main functions of this package is cytoHeatmaps
, which is used to visualize both the phenotype of the created clusters and their heterogeneity in respect to the samples.
cytoHeatmaps(cfData, group="group", legend=TRUE)
The function will plot two heatmaps:
+ on the upper panel the phenotype (based on the markers) of each cluster
+ on the lower panel the relative abundance of the clusters in respect to the samples
A hierarchical clustering was performed on subset frequencies using the Spearmans rank correlation, resulting in a dendogram displayed on the side of the sample abundancy heatmap. This will group samples sharing similarities.
Groups can pre-defined as an input to ‘cytoHeatmaps’.
We can also represents the data obtained in a quantitative way by calling the function cytoBoxplots
. The output of this function represents the proportion of each sample for every cluster.
cytoBoxplots(cfData, group = "group")
According to the Spitzer et al., Cell, 2017 study, an effective therapy lead to an upregulation of MHC-II. This actually can be seen by drawing the median intensity plots for this marker, as shown below together with CD45 and CD4.
msiPlot(cfData, markers = c("MHC.II", "CD45", "CD4"), byGroup='group')
## Picking joint bandwidth of 0.288
## Picking joint bandwidth of 0.151
## Picking joint bandwidth of 0.336
On this MSI plot, it indeed appears that the MHC-II signal is higher in the Effective therapy day 3 and day 8 compared to the Ineffective therapy day 3 and day 8.
cfData@samples$effect <- gsub("_D\\d", "", spitzer$group) # difference between effective and ineffective
cfData <- cytottest(cfData, group="effect", adjustMethod = "bonferroni")
It is also possible to use other clustering algorithms
library(FlowSOM)
fSOM <- FlowSOM(input = dirFCS,
transform = FALSE,
scale = FALSE,
colsToUse = c(9:11, 15:52),
nClus = 10, # We simply choose 10 clusters here
seed = 123)
From the flowSOM object we can extract the mapping and relevel each individual cell to its meta cluster.
clusterID_FS <- as.factor(fSOM$FlowSOM$map$mapping[,1])
levels(clusterID_FS) <- fSOM$metaclustering
From here it is of course possible to construct a new cfList and go through the whole workflow, but as simplification we just swap clusterID
with clusterID_FS
and only show the heatmap.
cfData@expr$clusterID <- clusterID_FS
cfData <- cellCounts(cfData) # Update cellCounts with new clusters
cytoHeatmaps(cfData, group='group', legend=TRUE)
Beyrend, Guillaume, Koen Stam, Thomas Höllt, Ferry Ossendrop, and Ramon Arens. 2018. “Cytofast: An R-based roadmap for quantitative analysis of flow and mass cytometry data to discover immune cell subsets and correlations between groups.” Manuscript Submitted for Publication.
Spitzer, Matthew H, Yaron Carmi, Nathan E Reticker-Flynn, Serena S Kwek, Deepthi Madhireddy, Maria M Martins, Pier Federico Gherardini, et al. 2017. “Systemic Immunity is Required for Effective Cancer Immunotherapy HHS Public Access.” Cell 26 (168(3)):487–502. https://doi.org/10.1016/j.cell.2016.12.022.