MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is comprehensive life science vocabulary. MeSH has 19 categories and MeSH.db
contains 16 of them. That is:
Abbreviation | Category |
---|---|
A | Anatomy |
B | Organisms |
C | Diseases |
D | Chemicals and Drugs |
E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
F | Psychiatry and Psychology |
G | Phenomena and Processes |
H | Disciplines and Occupations |
I | Anthropology, Education, Sociology and Social Phenomena |
J | Technology and Food and Beverages |
K | Humanities |
L | Information Science |
M | Persons |
N | Health Care |
V | Publication Type |
Z | Geographical Locations |
MeSH terms were associated with Entrez Gene ID by three methods, gendoo
, gene2pubmed
and RBBH
(Reciprocal Blast Best Hit).
Method | Way of corresponding Entrez Gene IDs and MeSH IDs |
---|---|
Gendoo | Text-mining |
gene2pubmed | Manual curation by NCBI teams |
RBBH | sequence homology with BLASTP search (E-value<10-50) |
meshes
supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of gene list or whole expression profile using MeSH annotation. Data source from gendoo
, gene2pubmed
and RBBH
are all supported. User can selecte interesting category to test. All 16 categories are supported. The analysis supports >70 species listed in MeSHDb BiocView.
For algorithm details, please refer to the vignettes of DOSE1 package.
library(meshes)
data(geneList, package="DOSE")
de <- names(geneList)[1:100]
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C')
head(x)
## ID Description GeneRatio BgRatio pvalue
## D043171 D043171 Chromosomal Instability 16/96 198/16528 2.794765e-14
## D000782 D000782 Aneuploidy 17/96 320/16528 3.866830e-12
## D042822 D042822 Genomic Instability 16/96 312/16528 3.007419e-11
## D012595 D012595 Scleroderma, Systemic 11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms 11/96 314/16528 2.049315e-06
## D019698 D019698 Hepatitis C, Chronic 11/96 317/16528 2.246856e-06
## p.adjust qvalue
## D043171 2.459394e-11 1.815127e-11
## D000782 1.701405e-09 1.255702e-09
## D042822 8.821761e-09 6.510798e-09
## D012595 1.418854e-04 1.047168e-04
## D009303 3.295389e-04 2.432123e-04
## D019698 3.295389e-04 2.432123e-04
## geneID
## D043171 4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
## Count
## D043171 16
## D000782 17
## D042822 16
## D012595 11
## D009303 11
## D019698 11
In the over-representation analysis, we use data source from gendoo
and C
(Diseases) category.
In the following example, we use data source from gene2pubmed
and test category G
(Phenomena and Processes) using GSEA.
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
head(y)
## ID Description setSize enrichmentScore NES
## D009929 D009929 Organ Size 489 -0.3508778 -1.574653
## D006339 D006339 Heart Rate 321 -0.3697848 -1.600728
## D001846 D001846 Bone Development 314 -0.3751600 -1.619238
## D055105 D055105 Waist Circumference 252 -0.4073259 -1.721104
## D020257 D020257 Ventricular Remodeling 255 -0.3972260 -1.680688
## D009767 D009767 Obesity, Morbid 231 -0.3750171 -1.575273
## pvalue p.adjust qvalues rank
## D009929 0.001225490 0.03100751 0.02288401 2309
## D006339 0.001300390 0.03100751 0.02288401 2405
## D001846 0.001305483 0.03100751 0.02288401 2100
## D055105 0.001345895 0.03100751 0.02288401 1625
## D020257 0.001349528 0.03100751 0.02288401 2202
## D009767 0.001353180 0.03100751 0.02288401 1625
## leading_edge
## D009929 tags=27%, list=18%, signal=23%
## D006339 tags=29%, list=19%, signal=24%
## D001846 tags=27%, list=17%, signal=23%
## D055105 tags=24%, list=13%, signal=21%
## D020257 tags=31%, list=18%, signal=26%
## D009767 tags=23%, list=13%, signal=20%
## core_enrichment
## D009929 154/9846/3315/6716/9732/5139/7337/5530/23001/4086/80114/6532/6416/1499/8945/7157/627/2252/22891/2908/8654/4088/27445/22846/4057/860/23286/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/2308/79789/1907/7048/1831/4060/23387/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/6678/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/64084/3479/10451/9370/10699/125/4857/1308/2167/652/57502/4137/8614/5241
## D006339 4985/7139/8929/3784/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/1901/2034/4179/4804/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/5138/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/9607/3572/150/3479/2006/55259/9370/125/652/55351
## D001846 8945/7157/57798/79048/627/6500/8038/860/2752/4882/3371/2915/63971/54455/3791/819/57045/596/2034/54808/80781/1280/64388/2261/4054/11059/3483/9900/26234/4734/9452/4208/4322/253461/1278/7048/51280/10903/7869/1277/3953/10516/10411/8835/79776/11167/2317/3485/3952/5274/54681/4488/10486/1009/2202/91851/2099/5764/23327/3339/8817/83716/6678/4915/633/658/54361/5744/165/5654/10631/3487/367/4982/3667/79971/1634/3479/114899/9370/652/8614/4969
## D055105 7466/181/2078/2169/948/4887/3931/4018/4212/9208/6403/3087/84059/5919/253461/79789/5176/3953/5950/2166/6414/5243/5468/1012/4023/582/4214/3485/7350/3952/5577/126/585/79068/2202/4313/9369/2621/2099/2690/1363/4306/5167/23245/4886/185/9863/80206/3667/9607/1489/3479/9370/23704/2167/5346/7021/79689/9
## D020257 1238/5595/5228/1499/7157/4776/25970/408/8654/4811/3910/3371/6548/3082/5914/1291/947/80781/5592/1490/1306/4314/80070/10272/4881/7042/3912/1511/2934/4060/283/1277/7078/5549/22795/1293/2247/50507/1281/5136/11167/4319/1513/6310/4313/2199/1294/2099/6387/3339/7079/1462/1292/775/3908/4306/4035/633/5167/5350/11188/10418/10631/9429/1805/2487/2697/185/7043/6863/3913/4982/1634/7060/3479/7373/9370/2167/652
## D009767 7474/4925/80781/2169/948/4887/4929/22982/4692/2246/63924/5919/56605/5176/23410/3953/5950/2166/5243/5468/5166/4023/90865/7350/3952/3778/79068/8864/901/9369/6720/2099/1471/3991/6678/2819/1363/4035/32/5167/8639/185/5327/3667/9607/150/563/3479/9370/2167/5346/79689
User can use visualization methods implemented in DOSE (i.e.barplot
, dotplot
, cnetplot
, enrichMap
, upsetplot
and gseaplot
) to visualize these enrichment results. With these visualization methods, it’s much easier to interpret enriched results.
dotplot(x)
gseaplot(y, y[1,1], title=y[1,2])
meshes
implemented four IC-based methods (i.e. Resnik2, Jiang3, Lin4 and Schlicker5) and one graph-structure based method (i.e. Wang6). For algorithm details, please refer to the vignette of GOSemSim package7
meshSim
function is designed to measure semantic similarity between two MeSH term vectors.
library(meshes)
## hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=T, database="gendoo")
data(hsamd)
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
## [1] 0.521396
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
## [1] 0.4914785
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
## [1] 0.5557103
meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang")
## D017629 D002890 D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
geneSim
function is designed to measure semantic similarity among two gene vectors.
geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA")
## [1] 0.487
geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA")
## 835 5261 241 994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
Here is the output of sessionInfo()
on the system on which this document was compiled:
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] meshes_1.2.0 DOSE_3.2.0 MeSH.db_1.8.0
## [4] MeSH.Hsa.eg.db_1.8.0 MeSHDbi_1.12.0 BiocGenerics_0.22.0
## [7] BiocStyle_2.4.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 plyr_1.8.4 compiler_3.4.0
## [4] tools_3.4.0 digest_0.6.12 tibble_1.3.0
## [7] RSQLite_1.1-2 evaluate_0.10 memoise_1.1.0
## [10] gtable_0.2.0 igraph_1.0.1 fastmatch_1.1-0
## [13] DBI_0.6-1 rvcheck_0.0.8 yaml_2.1.14
## [16] gridExtra_2.2.1 fgsea_1.2.1 stringr_1.2.0
## [19] knitr_1.15.1 S4Vectors_0.14.0 IRanges_2.10.0
## [22] stats4_3.4.0 rprojroot_1.2 grid_3.4.0
## [25] qvalue_2.8.0 Biobase_2.36.0 data.table_1.10.4
## [28] AnnotationDbi_1.38.0 BiocParallel_1.10.0 GOSemSim_2.2.0
## [31] rmarkdown_1.4 reshape2_1.4.2 GO.db_3.4.1
## [34] DO.db_2.9 ggplot2_2.2.1 magrittr_1.5
## [37] splines_3.4.0 backports_1.0.5 scales_0.4.1
## [40] htmltools_0.3.5 colorspace_1.3-2 labeling_0.3
## [43] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3
1. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An r/bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).
2. Philip, R. Semantic similarity in a taxonomy: An Information-Based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95–130 (1999).
3. Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of 10th International Conference on Research In Computational Linguistics (1997). at <http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008>
4. Lin, D. An Information-Theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning 296—304 (1998). doi:10.1.1.55.1832
5. Schlicker, A., Domingues, F. S., Rahnenfuhrer, J. & Lengauer, T. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics 7, 302 (2006).
6. Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of go terms. Bioinformatics (Oxford, England) 23, 1274–81 (2007).
7. Yu, G. et al. GOSemSim: An r package for measuring semantic similarity among go terms and gene products. Bioinformatics 26, 976–978 (2010).
8. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an r package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).
9. Yu, G. & He, Q.-Y. ReactomePA: An r/bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).