Cardinal 2 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.
Before statistical analysis, it is important to identify the statistical goal of the experiment:
Unsupervised analysis. The data has no class labels or conditions, and we are interested in exploratory analysis to discover regions of interest in the data.
Supervised analysis. The data has class labels and we want to train a statistical or machine learning model to predict the class labels of new data.
Class comparison. The data has class labels or conditions, and we want to test whether the abundance of the mass features is different between conditions.
CardinalWorkflows provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.
Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.
set.seed(2020)
mse <- simulateImage(preset=2, npeaks=10, dim=c(20,20), sdnoise=0.5,
peakheight=c(2,4), representation="centroid")
design <- makeFactor(circle=mse$circle, square=mse$square,
bg=!(mse$circle | mse$square))
image(mse, design ~ x * y, key=TRUE)
image(mse, feature=c(1,4,7), layout=c(1,3))
Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of “principal components” that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.
Use PCA()
to perform PCA on a MSImagingExperiment
.
pca <- PCA(mse, ncomp=3)
summary(pca)
## Principal components analysis:
##
## Component Standard deviation
## 1 1 4.7061494
## 2 2 2.6134145
## 3 3 0.6136734
We can see that the first two principal components explain most of the variation in the data.
image(pca, values="scores", superpose=FALSE, layout=c(1,3))
The loadings of the components show how each mass feature contributes to each component.
plot(pca, values="loadings", superpose=FALSE, layout=c(1,3), lwd=2)
Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.
pca_scores <- DataFrame(resultData(pca, 1, "scores"))
plot(pca_scores, PC1 ~ PC2, groups=design, pch=20)
Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.
Use colocalize()
to find mass features that are colocalized with another image.
coloc <- colocalized(mse, mz=1023)
coloc
## Colocalized features:
## mz circle square correlation M1 M2
## 1 1023.7081 2.011661 4.063644 1.0000000 1.000 1.000
## 2 1135.9335 2.434873 3.985370 0.9430259 0.875 0.875
## 3 1200.4653 2.219637 4.166854 0.9292093 0.865 0.865
## 4 1361.2682 0.000000 4.259568 0.6712111 0.710 0.710
## 5 1227.9380 0.000000 4.039750 0.6671688 0.675 0.675
## 6 1453.5096 0.000000 4.187344 0.6657311 0.695 0.695
## 7 1858.8985 0.000000 3.970513 0.6620943 0.705 0.705
## 8 781.2367 1.392247 0.000000 0.3891237 0.650 0.650
## 9 473.9206 2.340799 0.000000 0.3632409 0.600 0.600
## 10 788.8633 1.542205 0.000000 0.3378016 0.605 0.605
By default, Pearson correlation is used to rank the colocalized features. Manders’ colocalization coefficients (M1 and M2) are also provided.
image(mse, mz=coloc$mz[1:3], layout=c(1,3))
Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.
Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.
A smoothing radius r
, initial number of clusters k
, and sparsity parameters s
must be provided.
The larger the sparsity parameter s
, the fewer mass features will contribute to the segmentation.
Spatial shrunken centroids may result in fewer clusters than the initial number of clusters k
, so it is recommended to use a value for k
that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.
ssc <- spatialShrunkenCentroids(mse, r=1, k=5, s=c(0,3,6,9))
summary(ssc)
## Spatially-aware nearest shrunken centroids:
##
## Segmentation / clustering
## Method = gaussian
## Distance = chebyshev
##
## Radius (r) Init (k) Shrinkage (s) Classes Features/Class
## 1 1 5 0 4 10.00
## 2 1 5 3 3 10.00
## 3 1 5 6 3 8.67
## 4 1 5 9 3 7.33
Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.
image(ssc, model=list(s=9), values="probability")
Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.
plot(ssc, model=list(s=9), values="statistic", lwd=2)
Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.
Use topFeatures()
to rank mass features by t-statistic.
ssc_top <- topFeatures(ssc, model=list(s=9), class == 1)
ssc_top
## Top-ranked features:
## mz circle square r k s class centers statistic
## 1 473.9206 2.340799 0.000000 1 5 9 1 2.3475172 11.9165555
## 2 1135.9335 2.434873 3.985370 1 5 9 1 4.3294593 4.0936001
## 3 788.8633 1.542205 0.000000 1 5 9 1 0.7122964 0.9371928
## 4 781.2367 1.392247 0.000000 1 5 9 1 0.6067728 0.3683360
## 5 1023.7081 2.011661 4.063644 1 5 9 1 2.8075020 0.0000000
## 6 1200.4653 2.219637 4.166854 1 5 9 1 2.4490113 0.0000000
## 7 1858.8985 0.000000 3.970513 1 5 9 1 1.2245057 0.0000000
## 8 1361.2682 0.000000 4.259568 1 5 9 1 1.3563276 -0.3583295
## 9 1453.5096 0.000000 4.187344 1 5 9 1 1.3431417 -1.3316029
## 10 1227.9380 0.000000 4.039750 1 5 9 1 1.2913549 -2.2072382
Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.
This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.
dgmm <- spatialDGMM(mse, r=1, k=5, method="adaptive")
summary(dgmm)
## Spatially-aware Dirichlet Gaussian mixture models:
##
## Segmentation on 1 group: run0
## Method = adaptive
## Distance = chebyshev
##
## Radius (r) Init (k) Feature Classes/Group
## 1 1 5 1 2
## 2 1 5 2 4
## 3 1 5 3 1
## 4 1 5 4 3
## 5 1 5 5 3
## 6 1 5 6 4
## 7 1 5 7 2
## 8 1 5 8 2
## 9 1 5 9 2
## 10 1 5 10 2
A different segmentation is fit for each mass feature.
image(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))
Each image is modeled as a mixture of Gaussian distributions.
plot(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))
Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.
When applied to a SpatialDGMM
object, colocalize()
is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.
coloc2 <- colocalized(dgmm, mse$square)
select(coloc2, -r, -k, -group)
## mz circle square feature class Mscore M1 M2
## 1 1227.9380 0.000000 4.039750 7 2 0.9808917 0.9871795 0.9935484
## 2 1453.5096 0.000000 4.187344 9 2 0.9559748 0.9743590 0.9806452
## 3 1361.2682 0.000000 4.259568 8 2 0.9430380 0.9551282 0.9867550
## 4 1858.8985 0.000000 3.970513 10 2 0.9113924 0.9230769 0.9863014
## 5 1023.7081 2.011661 4.063644 4 3 0.5185185 0.6282051 0.7480916
## 6 473.9206 2.340799 0.000000 1 1 0.4851485 0.9423077 0.5000000
## 7 1135.9335 2.434873 3.985370 5 3 0.3940887 0.5128205 0.6299213
## 8 788.8633 1.542205 0.000000 3 1 0.3900000 1.0000000 0.3900000
## 9 781.2367 1.392247 0.000000 2 1 0.3821429 0.6858974 0.4632035
## 10 1135.9335 2.434873 3.985370 5 2 0.3671498 0.4871795 0.5984252
Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.
set.seed(2020)
mse2 <- simulateImage(preset=7, npeaks=10, dim=c(10,10), sdnoise=0.5,
nruns=3, peakdiff=2, representation="centroid")
class <- makeFactor(A=mse2$circleA, B=mse2$circleB)
image(mse2, class ~ x * y, key=TRUE, layout=c(1,3))
image(mse2, feature=1, layout=c(1,3))
When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.
We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.
Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.
cv_pls <- crossValidate(mse2, .y=class, .fun=PLS, ncomp=1:5, .fold=run(mse2))
summary(cv_pls)
## Cross validation:
##
## Classification on 2 classes: A B
## Summarized 3 folds: run0 run1 run2
##
## ncomp Accuracy Sensitivity Specificity
## 1 1 0.6608485 0.0000000 1.0000000
## 2 2 0.8100534 0.4811594 0.9690476
## 3 3 0.9200094 0.8405797 0.9776557
## 4 4 0.9132067 0.8550725 0.9553114
## 5 5 0.9088528 0.7884058 0.9648352
We can see that using 3 PLS components produces the best cross-validated accuracy.
pls <- PLS(mse2, y=class, ncomp=3)
summary(pls)
## Projection to latent components:
##
## Classification on 2 classes: A B
## Method = pls
##
## Number of Components Accuracy Sensitivity Specificity
## 1 3 0.9 0.7647059 0.9775281
We can plot the fitted values to visualize the prediction.
image(pls, values="fitted", layout=c(1,3))
The PLS regression coefficients can be used to select influential features.
plot(pls, values="coefficients", lwd=2)
Like PCA, it can be useful to plot the PLS scores against each other to visualize the separation between classes.
pls_scores <- DataFrame(resultData(pls, 1, "scores"))
plot(pls_scores, C1 ~ C2, groups=class, pch=20)
Note that orthgonal PLS (O-PLS) is also available via method="opls"
or by using the separate OPLS()
method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.
Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.
Like in the clustering case of spatial shrunken centroids, a smoothing radius r
must be provided along with sparsity parameters s
.
cv_ssc <- crossValidate(mse2, .y=class,
.fun=spatialShrunkenCentroids,
r=1, s=c(0,3,6,9), .fold=run(mse2))
summary(cv_ssc)
## Cross validation:
##
## Classification on 2 classes: A B
## Summarized 3 folds: run0 run1 run2
##
## r s Accuracy Sensitivity Specificity
## 1 1 0 0.9661925 1.0000000 0.9505495
## 2 1 3 0.9429979 0.8840580 0.9871795
## 3 1 6 0.8593713 0.6666667 1.0000000
## 4 1 9 0.7955665 0.5217391 1.0000000
We can see that in this case, the fully dense model (s=0
) that uses all mass features has the best cross-validated accuracy for the data.
ssc2 <- spatialShrunkenCentroids(mse2, y=class, r=1, s=0)
summary(ssc2)
## Spatially-aware nearest shrunken centroids:
##
## Classification on 2 classes: A B
## Method = gaussian
## Distance = chebyshev
##
## Radius (r) Shrinkage (s) Features/Class Accuracy Sensitivity Specificity
## 1 1 0 10 0.9714286 1 0.9550562
Plotting the predicted class probabilities produces a more easily interpretable visualization than PLS in this case.
image(ssc2, layout=c(1,3))
Plotting t-statistics shows the first three features have much higher abundance in condition “B”.
plot(ssc2, values="statistic", lwd=2)
topFeatures(ssc2, class=="B") %>% select(-diff, -r, -k, -s)
## mz ref circleA circleB class centers statistic
## 1 788.8633 0.0000000 0.8374991 2.8374991 B 4.3684041 24.463624
## 2 781.2367 0.0000000 1.0220864 3.0220864 B 4.1350547 21.296975
## 3 473.9206 0.0000000 0.9705134 2.9705134 B 2.9378242 16.611108
## 4 1200.4653 0.0000000 1.4870747 1.4870747 B 1.4592569 6.055987
## 5 1135.9335 0.0000000 1.2190690 1.2190690 B 1.4859448 5.847812
## 6 1023.7081 0.0000000 0.8512596 0.8512596 B 1.0217795 2.975441
## 7 1453.5096 0.9428803 0.0000000 0.0000000 B 1.1996578 -2.614496
## 8 1227.9380 1.0776237 0.0000000 0.0000000 B 0.5602308 -3.047349
## 9 1858.8985 1.0152029 0.0000000 0.0000000 B 1.6275417 -3.328770
## 10 1361.2682 1.0581255 0.0000000 0.0000000 B 1.8852220 -3.383986
Statistical hypothesis testing is used to determine whether the abundance of a feature is different between two or more conditions.
In order to account for additional factors like the effect of experimental runs, subject-to-subject variability, etc., this is often done most appropriately using linear models.
This example uses a simple experiment with two conditions “A” and “B”, with three replicates in each condition.
set.seed(2020)
mse3 <- simulateImage(preset=4, npeaks=10, dim=c(10,10), sdnoise=0.3,
nruns=3, peakdiff=1, representation="centroid")
trt <- makeFactor(A=mse3$circleA, B=mse3$circleB)
image(mse3, trt ~ x * y, key=TRUE, layout=c(2,3))
image(mse3, feature=1, layout=c(2,3))
We know from the design of the simulation that the first 5 (of 10) mass features differ between the conditions.
featureData(mse3)
## MassDataFrame with 10 rows and 3 columns
## :mz: circleA circleB diff
## <numeric> <numeric> <numeric> <logical>
## 1 473.920555889073 0.970513361233341 1.97051336123334 TRUE
## 2 781.236740634594 1.02208639892803 2.02208639892803 TRUE
## 3 788.863300202366 0.837499067219899 1.8374990672199 TRUE
## 4 1023.7080688987 0.85125956644625 1.85125956644625 TRUE
## 5 1135.93347410709 1.21906901436148 2.21906901436148 TRUE
## 6 1200.46530187298 1.48707474151019 1.48707474151019 FALSE
## 7 1227.93802289832 1.07762369335342 1.07762369335342 FALSE
## 8 1361.26820721663 1.05812553405903 1.05812553405903 FALSE
## 9 1453.50957409616 0.942880342583213 0.942880342583213 FALSE
## 10 1858.89853179664 1.01520294356747 1.01520294356747 FALSE
Use meansTest()
to fit linear models with the most basic summarization. The groups
indicating the observational units must be provided. Each group is summarized by its mean, and then a linear model is fit to the summaries. The number of groups is the effective sample size.
Here, we specify condition
as the sole fixed effect. Internally, the model will call either lm()
or lme()
depending on whether any random effects are provided.
mtest <- meansTest(mse3, ~ condition, groups=run(mse3))
summary(mtest)
## Means-summarized linear model testing:
##
## Fixed effects: ~condition
##
## Summarized 6 groups: runA1 runA2 ... runB2 runB3
##
## Likelihood ratio test for fixed effects:
##
## Feature LR PValue FDR
## 1 1 6.8325 0.008951300 0.0447565
## 2 2 2.2660 0.132239496 0.3305987
## 3 3 1.9045 0.167580606 0.3351612
## 4 4 7.0631 0.007868697 0.0447565
## 5 5 4.1586 0.041422884 0.1380763
## 6 6 0.0025 0.960241024 0.9654641
## 7 7 0.0019 0.965464107 0.9654641
## 8 8 0.0317 0.858660385 0.9654641
## 9 9 0.0684 0.793731073 0.9654641
## 10 10 0.1113 0.738668115 0.9654641
By default, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).
Box-and-whisker plots can be used to visualize the differences (if any) between the conditions.
plot(mtest, layout=c(2,5), ylab="intensity")
Use topFeatures()
to extract the significant results.
topFeatures(mtest, p.adjust="fdr", AdjP < .05)
## Top-ranked tests: ~condition vs ~1
## mz circleA circleB diff feature LR PValue AdjP
## 1 473.9206 0.9705134 1.970513 TRUE 1 6.8325 0.008951300 0.0447565
## 2 1023.7081 0.8512596 1.851260 TRUE 4 7.0631 0.007868697 0.0447565
Testing of SpatialDGMM
objects is implemented by segmentationTest()
. The key idea here is that spatial-DGMM segmentation captures within-sample heterogeneity, so testing between spatial-DGMM segments is more sensitive that simply summarizing a whole sample by its mean.
First, we must segment the data with spatialDGMM()
, while making sure that each observational unit is segmented within a different group (as specified by groups
).
The number of groups is the effective sample size.
dgmm2 <- spatialDGMM(mse3, r=1, k=5, groups=run(mse3))
Now use segmentationTest()
to fit the models.
In order to fit the models, a representative spatial-DGMM segment must be selected for each group. There are two automated ways to do this via classControl
: "Ymax"
means use the segment with the highest mean, and "Mscore"
means use the segment with the highest match score with the fixed effects.
(A list of character vectors giving the explicit mapping between group and representative spatial-DGMM segment can also be given to classControl
.)
stest <- segmentationTest(dgmm2, ~ condition, classControl="Ymax")
summary(stest)
## Segmentation-based linear model testing:
##
## Fixed effects: ~condition
##
## Summarized 6 groups: runA1 runA2 ... runB2 runB3
##
## Likelihood ratio test for fixed effects:
##
## Radius (r) Init (k) Feature LR PValue FDR
## 1 1 5 1 7.5289 6.071628e-03 1.517907e-02
## 2 1 5 2 1.9810 1.592807e-01 2.722838e-01
## 3 1 5 3 16.1447 5.868291e-05 2.934145e-04
## 4 1 5 4 14.8106 1.188625e-04 3.962082e-04
## 5 1 5 5 38.0854 6.771587e-10 6.771587e-09
## 6 1 5 6 0.0725 7.877654e-01 9.234024e-01
## 7 1 5 7 0.0302 8.620901e-01 9.234024e-01
## 8 1 5 8 1.9427 1.633703e-01 2.722838e-01
## 9 1 5 9 0.0092 9.234024e-01 9.234024e-01
## 10 1 5 10 1.1957 2.741756e-01 3.916794e-01
As with meansTest()
, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).
Box-and-whisker plots can be used to visually compare the conditions.
plot(stest, layout=c(2,5), ylab="intensity")
If an automated method for classControl
was used, it can be helpful to plot the mapping to see what segments were used to represent each group.
image(stest, model=list(feature=3), values="mapping")
In this case, segmentationTest()
finds two more significant mass features compared to meansTest()
.
topFeatures(stest, p.adjust="fdr", AdjP < .05) %>% select(-diff, -r, -k)
## mz circleA circleB feature LR PValue AdjP
## 1 1135.9335 1.2190690 2.219069 5 38.0854 6.771587e-10 6.771587e-09
## 2 788.8633 0.8374991 1.837499 3 16.1447 5.868291e-05 2.934145e-04
## 3 1023.7081 0.8512596 1.851260 4 14.8106 1.188625e-04 3.962082e-04
## 4 473.9206 0.9705134 1.970513 1 7.5289 6.071628e-03 1.517907e-02
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Cardinal_2.4.0 ProtGenerics_1.18.0 S4Vectors_0.24.0
## [4] EBImage_4.28.0 BiocParallel_1.20.0 BiocGenerics_0.32.0
## [7] BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] biglm_0.9-1 locfit_1.5-9.1 tidyselect_0.2.5
## [4] xfun_0.10 purrr_0.3.3 lattice_0.20-38
## [7] htmltools_0.4.0 viridisLite_0.3.0 yaml_2.2.0
## [10] rlang_0.4.1 pillar_1.4.2 glue_1.3.1
## [13] DBI_1.0.0 sp_1.3-1 jpeg_0.1-8.1
## [16] stringr_1.4.0 htmlwidgets_1.5.1 evaluate_0.14
## [19] Biobase_2.46.0 knitr_1.25 irlba_2.3.3
## [22] Rcpp_1.0.2 BiocManager_1.30.9 abind_1.4-5
## [25] png_0.1-7 digest_0.6.22 stringi_1.4.3
## [28] tiff_0.1-5 bookdown_0.14 dplyr_0.8.3
## [31] grid_3.6.1 tools_3.6.1 bitops_1.0-6
## [34] magrittr_1.5 RCurl_1.95-4.12 tibble_2.1.3
## [37] crayon_1.3.4 pkgconfig_2.0.3 MASS_7.3-51.4
## [40] Matrix_1.2-17 matter_1.12.0 assertthat_0.2.1
## [43] rmarkdown_1.16 R6_2.4.0 fftwtools_0.9-8
## [46] mclust_5.4.5 signal_0.7-6 nlme_3.1-141
## [49] compiler_3.6.1