The MgDb class in the metagenomeFeatures package holds the sequences and taxonomic information for a 16S rRNA database. This vignette demonstrates the class methods for exploring and subsetting an MgDb-class object, using the gg85 example database included in the metagenomeFeatures package. MgDb-class objects for full databases are provided in separate packages such as the greengenes13.5MgDb package.
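As a minimal sketch of how the gg85 example object could be created and inspected: the accessor name `get_gg13.8_85MgDb()` is an assumption based on the metagenomeFeatures 2.0 interface and is not shown in this text, so check it against the installed version.

```r
# Load the package; assumes metagenomeFeatures is installed from Bioconductor.
library(metagenomeFeatures)

# Assumed accessor for the bundled Greengenes 13.8 85% OTU example database.
gg85 <- get_gg13.8_85MgDb()

# Printing the object summarises its metadata, sequence, taxonomy, and tree data.
gg85
```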
MgDb-class Object
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, append,
## as.data.frame, basename, cbind, colMeans, colSums, colnames,
## dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
## intersect, is.unsorted, lapply, lengths, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
## rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Warning: replacing previous import 'lazyeval::is_formula' by
## 'purrr::is_formula' when loading 'metagenomeFeatures'
## Warning: replacing previous import 'lazyeval::is_atomic' by
## 'purrr::is_atomic' when loading 'metagenomeFeatures'
## MgDb object:[1] "Metadata"
## |ACCESSION_DATE: Mon Apr 2 13:30:09 2018
## |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
## |DB_TYPE_NAME: GreenGenes
## |DB_VERSION: 13.8 85% OTUS
## |DB_TYPE_VALUE: MgDb
## |DB_SCHEMA_VERSION: 2.0
## [1] "Sequence Data:"
## [1] "DECIPHER formatted seqDB"
## [1] "Taxonomy Data:"
## # Source: table<Seqs> [?? x 11]
## # Database: sqlite 3.22.0
## # [/tmp/Rtmp2N7cJP/Rinst4b4c48229f79/metagenomeFeatures/extdata/gg13.8_85.sqlite]
## row_names identifier description Keys Kingdom Phylum Class Ord
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 MgDb 1111561 11115… k__Bact… p__Pro… c__Gam… o__Le…
## 2 2 MgDb 1111421 11114… k__Bact… p__Pro… c__Alp… o__Rh…
## 3 3 MgDb 1111090 11110… k__Bact… p__Act… c__Nit… o__Ni…
## 4 4 MgDb 1110893 11108… k__Bact… p__Bac… c__[Sa… o__[S…
## 5 5 MgDb 1110814 11108… k__Bact… p__BRC1 c__ o__
## 6 6 MgDb 1110088 11100… k__Bact… p__Pro… c__Gam… o__
## 7 7 MgDb 1109993 11099… k__Bact… p__Chl… c__Deh… o__
## 8 8 MgDb 1109948 11099… k__Bact… p__Pla… c__[Br… o__Br…
## 9 9 MgDb 1109493 11094… k__Bact… p__Pla… c__vad… o__
## 10 10 MgDb 1109328 11093… k__Bact… p__Chl… c__Ana… o__S0…
## # ... with more rows, and 3 more variables: Family <chr>, Genus <chr>,
## # Species <chr>
## [1] "Tree Data:"
##
## Phylogenetic tree with 5088 tips and 5087 internal nodes.
##
## Tip labels:
## 4479984, 540377, 811993, 823988, 4397176, 4446470, ...
##
## Rooted; includes branch lengths.
taxa_keytypes
## [1] "row_names" "identifier" "description" "Keys" "Kingdom"
## [6] "Phylum" "Class" "Ord" "Family" "Genus"
## [11] "Species"
## [1] "Keys" "Kingdom" "Phylum" "Class" "Ord" "Family" "Genus"
## [8] "Species"
## # A tibble: 6 x 1
## Kingdom
## <chr>
## 1 k__Bacteria
## 2 k__Bacteria
## 3 k__Bacteria
## 4 k__Bacteria
## 5 k__Bacteria
## 6 k__Bacteria
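The three outputs above look like results from the taxonomy accessor methods. The calls below are a sketch of how such output could be produced. Which call produced which output is an assumption, as are the `get_gg13.8_85MgDb()` accessor and the use of `head()` to limit the key listing to six rows.

```r
library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()  # assumed accessor for the example database

# Names accepted as keytypes for the taxonomy data.
taxa_keytypes(gg85)

# Column names of the taxonomy data.
taxa_columns(gg85)

# First few values stored under a single keytype, here Kingdom.
head(taxa_keys(gg85, keytype = "Kingdom"))
```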
mgDb_select retrieves database entries for a specified taxonomic group or list of ids. It can return taxonomic information, sequence information, or both.
mgDb_select(gg85, type = "taxa",
keys = c("Vibrionaceae", "Enterobacteriaceae"),
keytype = "Family")
## # A tibble: 27 x 8
## Keys Kingdom Phylum Class Ord Family Genus Species
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1047956 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 2 818108 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 3 651366 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 4 592303 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__P… s__
## 5 575794 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 6 559954 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 7 368586 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__ s__
## 8 289174 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__P… s__shi…
## 9 268585 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__C… s__
## 10 232927 k__Bacteria p__Proteobacteria c__Ga… o__V… f__Vi… g__ s__
## # ... with 17 more rows
## A DNAStringSet instance of length 27
## width seq names
## [1] 1366 ATTGAACGCTGGCGGCAGGC...GTGAATACGTTCCCGGGCCT 1047956
## [2] 1410 ACGGTACACAGAGAGCTTGC...TTCGGGAGGGCGCTTACCAC 818108
## [3] 1421 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 651366
## [4] 1453 AGTCGAGCGGTAACAGTGGG...CATGACTGGGGGAAGTCGTA 592303
## [5] 1419 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 575794
## ... ... ...
## [23] 1383 TGGGAAACTGCCTGATGGAG...AACCTTCGGGAGGGCGGTTT 4336809
## [24] 1443 GGGTGAGTAATGTCTGGGAA...GGTTGCAAAAGAAGTAGGTA 656881
## [25] 1563 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4371215
## [26] 1392 GCGGCGGACGGGTGAGTAAT...TGGGTAGTTTAACCTTCGGG 4375861
## [27] 1389 TCGTGCGGTAATAGAGGAAC...AGCAAGTAGTTTAACCTAAA 4443068
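The DNAStringSet above holds 27 sequences whose names match the Keys of the taxonomy query, which suggests the same selection with sequences requested instead of taxonomy. The call is not shown in the source; this sketch assumes `type = "seq"` is the value that requests sequence data and that `get_gg13.8_85MgDb()` loads the example database.

```r
library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()  # assumed accessor for the example database

# Same keys and keytype as the taxonomy query, but asking for sequences.
fam_seqs <- mgDb_select(gg85, type = "seq",
                        keys = c("Vibrionaceae", "Enterobacteriaceae"),
                        keytype = "Family")
fam_seqs
```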
## $taxa
## # A tibble: 2 x 8
## Keys Kingdom Phylum Class Ord Family Genus Species
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 661785 k__Bacteria p__Proteobacteria c__Ga… o__Vi… f__Vi… g__V… s__
## 2 4375861 k__Bacteria p__Proteobacteria c__Ga… o__Vi… f__Vi… g__V… s__
##
## $seq
## A DNAStringSet instance of length 2
## width seq names
## [1] 1420 AGAGTTTGATCATGGCTCAGA...TTCATGACTGGGGTGAAGTC 661785
## [2] 1392 GCGGCGGACGGGTGAGTAATG...TGGGTAGTTTAACCTTCGGG 4375861
##
## $tree
##
## Phylogenetic tree with 2 tips and 1 internal nodes.
##
## Tip labels:
## [1] "661785" "4375861"
##
## Rooted; includes branch lengths.
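The list output above, with `$taxa`, `$seq`, and `$tree` elements, is the shape of a combined selection. The keys behind that two-entry result are not shown in the source, so this sketch reuses the Family keys from the taxonomy query and will return more entries. It also assumes `type = "all"` requests every data type and that `get_gg13.8_85MgDb()` loads the example database.

```r
library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()  # assumed accessor for the example database

# Request taxonomy, sequences, and the pruned tree in a single call.
fam_all <- mgDb_select(gg85, type = "all",
                       keys = c("Vibrionaceae", "Enterobacteriaceae"),
                       keytype = "Family")

# Inspect which components came back, then look at one of them.
names(fam_all)
fam_all$taxa
```

Returning all components together keeps the taxonomy rows, sequences, and tree tips aligned on the same set of Keys, which is convenient when the subset feeds a downstream analysis.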
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] metagenomeFeatures_2.0.0 Biobase_2.40.0
## [3] BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 XVector_0.20.0 compiler_3.5.0
## [4] pillar_1.2.2 dbplyr_1.2.1 bindr_0.1.1
## [7] zlibbioc_1.26.0 tools_3.5.0 digest_0.6.15
## [10] bit_1.1-12 nlme_3.1-137 RSQLite_2.1.0
## [13] evaluate_0.10.1 memoise_1.1.0 tibble_1.4.2
## [16] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.2.0
## [19] cli_1.0.0 DBI_0.8 yaml_2.1.18
## [22] bindrcpp_0.2.2 stringr_1.3.0 dplyr_0.7.4
## [25] knitr_1.20 Biostrings_2.48.0 S4Vectors_0.18.0
## [28] IRanges_2.14.0 tidyselect_0.2.4 stats4_3.5.0
## [31] rprojroot_1.3-2 bit64_0.9-7 grid_3.5.0
## [34] glue_1.2.0 R6_2.2.2 rmarkdown_1.9
## [37] DECIPHER_2.8.0 purrr_0.2.4 blob_1.1.1
## [40] magrittr_1.5 backports_1.1.2 htmltools_0.3.6
## [43] assertthat_0.2.0 ape_5.1 utf8_1.1.3
## [46] stringi_1.1.7 lazyeval_0.2.1 crayon_1.3.4