ensembleTax is an R package that allows incorporation of information from multiple taxonomic assignment algorithms and/or reference databases to compute ensemble taxonomic assignments for ASVs/OTUs generated by common marker gene sequence analyses.
Please note that this is a simple vignette to demonstrate the functionality of the package. For more detailed discussion and example uses, see here: https://github.com/dcat4/ensembleTax/blob/master/README.md
Taxonomic assignment of marker gene sequences is a critical step of marker gene workflows as it imparts ecological significance and understanding to genetic data.
Many algorithms have been proposed to assign taxonomy to marker gene sequences (or OTUs/ASVs). Similarly, analysts are often forced to choose one of several reference databases containing representative marker gene sequences with known taxonomic identities. The “best” assignment algorithm and/or reference database for a particular scientific question is often not obvious. To complicate things further, different reference databases generally do not share consistent taxonomic naming or ranking conventions.
ensembleTax solves this problem by providing flexible algorithms that synthesize information from multiple taxonomic assignment algorithm/reference database combinations and compute a single ensemble taxonomic assignment for each ASV/OTU in a marker gene data set.
The core algorithms employed by ensembleTax are taxmapper and ensembleTax. taxmapper maps, or 'translates', one taxonomic nomenclature onto another by exact name matching. taxmapper is rank-agnostic, meaning it does not consider the hierarchical structure of a taxonomy and assumes that a taxonomic name means the same thing regardless of which reference database employs it.
ensembleTax computes ensemble taxonomic assignments based on assignments determined by any number of individual taxonomic assignment algorithm/reference database combinations. Several parameters allow the user to control trade-offs in the accuracy vs. resolution of taxonomic assignments.
Additional functions are included for pre-processing taxonomic assignments generated by specific taxonomic assignment algorithms and reference databases. These functions are designed to conveniently plug in downstream of the dada2 pipeline, but other pipelines may be used if the data is formatted properly for use with taxmapper and/or ensembleTax.
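As a rough guide for users of other pipelines, the pre-processed tables shown later in this vignette are data frames with an svN column, an ASV column, and then one column per taxonomic rank, sorted by svN. The sketch below builds a table with that layout from made-up data; the object names, sequences, and rank values are hypothetical, and we are assuming this layout is what taxmapper and ensembleTax expect.

```r
# Hypothetical output of some other pipeline: one row per ASV, one column per rank.
other_tax <- data.frame(
  kingdom    = c("Eukaryota", "Eukaryota"),
  supergroup = c("Stramenopiles", NA),
  stringsAsFactors = FALSE
)

# Toy ASV sequences, in the same order as the rows of other_tax.
asv_seqs <- c("ACGTACGT", "GGCATTAC")

# Assemble the assumed layout: svN, ASV, then the rank columns.
tax_df <- data.frame(
  svN = paste0("sv", seq_along(asv_seqs)),
  ASV = asv_seqs,
  other_tax,
  stringsAsFactors = FALSE
)

# Sort by svN, as the pre-processed tables in this vignette are.
tax_df <- tax_df[order(tax_df$svN), ]
tax_df
```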
The taxonomic assignment algorithms explicitly supported by ensembleTax are dada2's assignTaxonomy and DECIPHER's idtaxa.
Supported reference databases include pr2 and Silva.
Note that other databases may still be used with ensembleTax, but they must be mapped onto the taxonomic nomenclatures employed by Silva and/or pr2 using taxmapper, or they must be re-formatted appropriately for use with taxmapper. Follow the link above for vignettes demonstrating how to incorporate custom reference databases into your ensembleTax workflow.
Here we step through a simple example of an ensembleTax workflow to compute ensemble taxonomic assignments for a small set of 18S-V9 protist ASVs.
First, load some data included with the ensembleTax package. These are outputs of dada2's assignTaxonomy implemented against pr2, and of DECIPHER's idtaxa implemented against both pr2 and Silva. rubric.sample is an example of a “rubric”, which ensembleTax uses to track ASV-identifying information. The rubric is a DNAStringSet object (see the Biostrings package) produced by extracting ASV sequences from the seqtab used by dada2 and giving them arbitrary names (like sv1, sv2, etc.).
library("ensembleTax")
library("Biostrings")
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
data("idtax.pr2.sample")
data("idtax.silva.sample")
data("bayes.sample")
data("rubric.sample")
head(idtax.pr2.sample)
## [[1]]
## [[1]]$taxon
## [1] "Root" "Eukaryota"
## [3] "Stramenopiles" "Ochrophyta"
## [5] "Bacillariophyta" "Bacillariophyta_X"
## [7] "Polar-centric-Mediophyceae" "Chaetoceros"
## [9] "Chaetoceros_debilis_2"
##
## [[1]]$confidence
## [1] 82.20536 82.20536 81.01873 81.01873 81.01873 81.01873 81.01873 81.01873
## [9] 30.03599
##
##
## [[2]]
## [[2]]$taxon
## [1] "Root" "Eukaryota"
## [3] "Stramenopiles" "Ochrophyta"
## [5] "Bacillariophyta" "Bacillariophyta_X"
## [7] "Polar-centric-Mediophyceae" "Thalassiosira"
## [9] "Thalassiosira_hispida"
##
## [[2]]$confidence
## [1] 71.33810 71.33810 66.29660 64.95401 64.95401 64.95401 59.40574 19.26610
## [9] 16.38823
##
##
## [[3]]
## [[3]]$taxon
## [1] "Root" "Eukaryota" "Stramenopiles" "Ochrophyta"
## [5] "MOCH-2" "MOCH-2_X" "MOCH-2_XX" "MOCH-2_XXX"
## [9] "MOCH-2_XXX_sp."
##
## [[3]]$confidence
## [1] 67.78686 67.78686 61.88273 58.94742 49.33213 49.33213 49.33213 49.33213
## [9] 49.33213
##
##
## [[4]]
## [[4]]$taxon
## [1] "Root" "Eukaryota" "Archaeplastida" "Streptophyta"
## [5] "Embryophyceae" "Embryophyceae_X" "Embryophyceae_XX" "Taxus"
## [9] "Taxus_baccata"
##
## [[4]]$confidence
## [1] 45.398229 45.398229 18.583773 18.157256 12.643442 12.643442 12.643442
## [8] 8.117989 8.117989
##
##
## [[5]]
## [[5]]$taxon
## [1] "Root" "Eukaryota" "Stramenopiles" "Opalozoa"
## [5] "MAST-3" "MAST-3B" "MAST-3B_X" "MAST-3B_XX"
## [9] "MAST-3B_XX_sp."
##
## [[5]]$confidence
## [1] 92.47240 92.47240 90.81644 47.09622 46.51236 45.64512 45.64512 45.64512
## [9] 45.64512
head(idtax.silva.sample)
## [[1]]
## [[1]]$taxon
## [1] "Root" "Eukaryota" "SAR"
## [4] "Stramenopiles" "Ochrophyta" "Diatomea"
## [7] "Bacillariophytina" "Mediophyceae" "Chaetoceros"
##
## [[1]]$confidence
## [1] 60.26929 55.47228 50.50453 48.89802 47.20018 46.61874 46.15582 45.57871
## [9] 43.57134
##
## [[1]]$rank
## [1] "rootrank" "domain" "major_clade" "kingdom" "superphylum"
## [6] "phylum" "subphylum" "class" "genus"
##
##
## [[2]]
## [[2]]$taxon
## [1] "Root" "Eukaryota" "SAR"
## [4] "Stramenopiles" "Ochrophyta" "Diatomea"
## [7] "Bacillariophytina" "Mediophyceae" "Thalassiosira"
##
## [[2]]$confidence
## [1] 64.40876 62.34590 55.16122 48.52020 48.52020 48.52020 45.61851 38.14735
## [9] 27.39857
##
## [[2]]$rank
## [1] "rootrank" "domain" "major_clade" "kingdom" "superphylum"
## [6] "phylum" "subphylum" "class" "genus"
##
##
## [[3]]
## [[3]]$taxon
## [1] "Root" "Eukaryota" "SAR" "Stramenopiles"
## [5] "Ochrophyta" "MOCH-2"
##
## [[3]]$confidence
## [1] 70.91750 67.31802 64.52419 63.72148 62.63950 61.55201
##
## [[3]]$rank
## [1] "rootrank" "domain" "major_clade" "kingdom" "superphylum"
## [6] "class"
##
##
## [[4]]
## [[4]]$taxon
## [1] "Root" "Eukaryota" "Archaeplastida"
## [4] "Chloroplastida" "Charophyta" "Phragmoplastophyta"
## [7] "Streptophyta" "Embryophyta" "Tracheophyta"
## [10] "Spermatophyta" "Pinophyta"
##
## [[4]]$confidence
## [1] 55.95076 50.88138 37.40822 37.40822 37.40822 37.40822 37.40822 31.93052
## [9] 31.93052 31.15123 29.84479
##
## [[4]]$rank
## [1] "rootrank" "domain" "major_clade" "kingdom" "subkingdom"
## [6] "phylum" "subphylum" "class" "subclass" "infraclass"
## [11] "superorder"
##
##
## [[5]]
## [[5]]$taxon
## [1] "Root" "Eukaryota" "SAR" "Stramenopiles"
## [5] "MAST-3" "MAST-3B"
##
## [[5]]$confidence
## [1] 92.92780 92.36003 91.05291 91.05291 91.05291 88.60307
##
## [[5]]$rank
## [1] "rootrank" "domain" "major_clade" "kingdom" "phylum"
## [6] "class"
head(bayes.sample)
## $tax
## Kingdom
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Eukaryota"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Eukaryota"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Eukaryota"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Eukaryota"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Eukaryota"
## Supergroup
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Archaeplastida"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Stramenopiles"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Stramenopiles"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Stramenopiles"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Stramenopiles"
## Division
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Streptophyta"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Opalozoa"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Ochrophyta"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "Ochrophyta"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Ochrophyta"
## Class
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MAST-3"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Bacillariophyta"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MOCH-2"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Bacillariophyta"
## Order
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae_X"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MAST-3B"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Bacillariophyta_X"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MOCH-2_X"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Bacillariophyta_X"
## Family
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae_XX"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MAST-3B_X"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Polar-centric-Mediophyceae"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MOCH-2_XX"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Polar-centric-Mediophyceae"
## Genus
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Pinus"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MAST-3B_XX"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Minidiscus"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MOCH-2_XXX"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Chaetoceros"
## Species
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Pinus_wallichiana"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MAST-3B_XX_sp."
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC "Minidiscus_sp."
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC "MOCH-2_XXX_sp."
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT "Chaetoceros_debilis_2"
##
## $boot
## Kingdom
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 100
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 100
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 100
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 100
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 100
## Supergroup
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 79
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 97
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 66
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 99
## Division
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 74
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 95
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 65
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 99
## Class
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 98
## Order
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 98
## Family
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 86
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 98
## Genus
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 37
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 59
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 97
## Species
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC 22
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC 59
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC 58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT 48
head(rubric.sample)
## DNAStringSet object of length 5:
## width seq names
## [1] 125 GCACCCACCGATTGAAAAGCCCG...GGTGAAGTCGTAACAAGGTCTCT sv20747
## [2] 126 ACACCTACCAATTGAATGGTCCG...GGTGAAGTTGTAACAAGGTTTCC sv14136
## [3] 127 GCACCTACCGATTGAACCATACG...GGTGAAGTCGTAACAAGGTTTCC sv17278
## [4] 130 ACTCCTACCAATTGAATGATCCA...GGAGAAGTCATAACAAGGTTACC sv3579
## [5] 124 GCACCTACCGATTGAATGGTCCG...GGTGAAGTCGTAACAAGGTTTCC sv4298
We see from the above that the data structures returned by our two taxonomic assignment algorithms are different. It is critically important that the order of sequences in the rubric matches the order of the assignments in the Taxa object returned by idtaxa. idtaxa does not return sequence names, but if you did not alter the order of your sequences between supplying them to idtaxa and creating your rubric with DNAStringSet, the ordering should be preserved and you should be good to go.
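As a minimal sketch of how a rubric might be built, the snippet below uses a toy sequence table in place of a real dada2 seqtab. The matrix, its sequences, and the sample names are made up for illustration, and we are assuming (as in dada2) that the ASV sequences are stored as the column names of the sequence table.

```r
library("Biostrings")

# Toy stand-in for a dada2 sequence table: samples in rows, ASV sequences as column names.
seqtab <- matrix(c(10L, 0L, 3L,
                    5L, 7L, 0L),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ACGTACGT", "GGCATTAC", "TTGACCGA")))

# Extract the sequences in their original order and give them arbitrary names.
rubric <- DNAStringSet(colnames(seqtab))
names(rubric) <- paste0("sv", seq_along(rubric))

# Sanity check: the rubric preserves the column order of the sequence table.
stopifnot(identical(unname(as.character(rubric)), colnames(seqtab)))
rubric
```

If the same, unreordered sequences are passed to idtaxa, the assignments it returns should line up element by element with this rubric.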
Here we'll run these tables through ensembleTax's pre-processing functions. Supplying a rubric allows ensembleTax to give each taxonomy table the same ASV-identifying information and to better track and organize your data.
idtax.pr2.pretty <- idtax2df(idtax.pr2.sample,
                             db = "pr2",
                             ranks = NULL,
                             boot = 50,
                             rubric = rubric.sample,
                             return.conf = FALSE)
idtax.silva.pretty <- idtax2df(idtax.silva.sample,
                               db = "silva",
                               ranks = NULL,
                               boot = 50,
                               rubric = rubric.sample,
                               return.conf = FALSE)
bayes.pr2.pretty <- bayestax2df(bayes.sample,
                                db = "pr2",
                                ranks = NULL,
                                boot = 50,
                                rubric = rubric.sample,
                                return.conf = FALSE)
head(idtax.pr2.pretty)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta <NA> <NA>
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 Eukaryota Stramenopiles <NA> <NA> <NA>
## family genus species
## 1 Polar-centric-Mediophyceae <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 Polar-centric-Mediophyceae Chaetoceros <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
head(idtax.silva.pretty)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## domain phylum class order family genus
## 1 Eukaryota <NA> <NA> <NA> <NA> <NA>
## 2 Eukaryota Eukaryota_ph MOCH-2 <NA> <NA> <NA>
## 3 Eukaryota <NA> <NA> <NA> <NA> <NA>
## 4 Eukaryota <NA> <NA> <NA> <NA> <NA>
## 5 Eukaryota MAST-3 MAST-3B <NA> <NA> <NA>
head(bayes.pr2.pretty)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2 MOCH-2_X
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta Embryophyceae Embryophyceae_X
## 5 Eukaryota Stramenopiles Opalozoa MAST-3 MAST-3B
## family genus species
## 1 Polar-centric-Mediophyceae Minidiscus Minidiscus_sp.
## 2 MOCH-2_XX MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros <NA>
## 4 Embryophyceae_XX <NA> <NA>
## 5 MAST-3B_X MAST-3B_XX MAST-3B_XX_sp.
We see that each taxonomy table is now a dataframe sorted by the column “svN”.
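The pattern of NAs in the idtax-pr2 table is consistent with boot = 50 acting as a confidence threshold: names are kept down to the last rank whose confidence is at least 50, and everything below is set to NA. The sketch below illustrates that reading on the first element of idtax.pr2.sample. apply_boot is a hypothetical helper written for this example, not a function from the package, and whether the package compares with >= or > is an assumption.

```r
# One idtaxa-style assignment, copied from the first element of idtax.pr2.sample above.
one_asv <- list(
  taxon = c("Root", "Eukaryota", "Stramenopiles", "Ochrophyta", "Bacillariophyta",
            "Bacillariophyta_X", "Polar-centric-Mediophyceae", "Chaetoceros",
            "Chaetoceros_debilis_2"),
  confidence = c(82.20536, 82.20536, 81.01873, 81.01873, 81.01873, 81.01873,
                 81.01873, 81.01873, 30.03599)
)
pr2_ranks <- c("kingdom", "supergroup", "division", "class",
               "order", "family", "genus", "species")

# Hypothetical helper: truncate a lineage where confidence first drops below boot.
apply_boot <- function(x, ranks, boot = 50) {
  taxon <- x$taxon[-1]         # drop the "Root" entry
  conf  <- x$confidence[-1]
  # Number of leading ranks whose confidence stays at or above the threshold.
  n_keep <- sum(cumprod(conf >= boot))
  out <- rep(NA_character_, length(ranks))
  out[seq_len(n_keep)] <- taxon[seq_len(n_keep)]
  names(out) <- ranks
  out
}

res <- apply_boot(one_asv, pr2_ranks, boot = 50)
res
```

Here the species-level confidence (about 30) falls below the threshold, so the lineage is kept through genus (Chaetoceros) and species is NA, matching the sv20747 row of idtax.pr2.pretty.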
After pre-processing our taxonomic assignment data sets above, we still can't make apples-to-apples comparisons between the “idtax-silva” table and the other two because they employ different ranking and (though this may not be as obvious) naming conventions. taxmapper was created to solve this problem.
Here we'll use taxmapper to 'translate' the idtax-silva taxonomic assignments onto the same taxonomic nomenclature as the other two tables.
idtax.silva.mapped2pr2 <- taxmapper(idtax.silva.pretty,
                                    tt.ranks = colnames(idtax.silva.pretty)[3:ncol(idtax.silva.pretty)],
                                    tax2map2 = "pr2",
                                    exceptions = c("Archaea", "Bacteria"),
                                    ignore.format = TRUE,
                                    synonym.file = "default",
                                    streamline = TRUE,
                                    outfilez = NULL)
head(idtax.silva.mapped2pr2)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order family genus species
## 1 Eukaryota <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2 <NA> <NA> <NA> <NA>
## 3 Eukaryota <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Eukaryota <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 Eukaryota Stramenopiles Opalozoa MAST-3 MAST-3B <NA> <NA> <NA>
Inspection of the mapped taxonomy table shows that it now mirrors the naming and ranking conventions of the other two taxonomy tables.
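To illustrate the idea of rank-agnostic, exact name matching, here is a simplified sketch that maps a Silva-style lineage onto a toy pr2-style table. map_by_name and pr2_toy are hypothetical and written only for this example; the real taxmapper has more options (synonyms, exceptions, handling of names found in several lineages) that are not reproduced here. The sketch looks for the lowest-rank name anywhere in the target table, regardless of column, and returns the target lineage down to the matched name.

```r
# Toy target nomenclature: two pr2 lineages taken from the tables shown above.
pr2_toy <- data.frame(
  kingdom    = c("Eukaryota", "Eukaryota"),
  supergroup = c("Stramenopiles", "Stramenopiles"),
  division   = c("Opalozoa", "Ochrophyta"),
  class      = c("MAST-3", "MOCH-2"),
  order      = c("MAST-3B", "MOCH-2_X"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map one lineage onto the target table by exact name matching.
map_by_name <- function(lineage, target) {
  mat <- as.matrix(target)
  out <- setNames(rep(NA_character_, ncol(mat)), colnames(mat))
  # Try the lowest-rank name first, then move up until a name is found.
  for (nm in rev(lineage[!is.na(lineage)])) {
    hit <- which(mat == nm, arr.ind = TRUE)   # search every column (rank-agnostic)
    if (nrow(hit) > 0) {
      k <- hit[1, "col"]
      out[seq_len(k)] <- mat[hit[1, "row"], seq_len(k)]
      break
    }
  }
  out
}

mapped_mast <- map_by_name(c(domain = "Eukaryota", phylum = "MAST-3", class = "MAST-3B"),
                           pr2_toy)
mapped_moch <- map_by_name(c(domain = "Eukaryota", phylum = "Eukaryota_ph", class = "MOCH-2"),
                           pr2_toy)
mapped_mast
mapped_moch
```

Note that MAST-3B is a class in the Silva-style input but sits in the order column of the toy pr2 table; because the lookup ignores rank, it is still found and the full pr2 lineage above it is returned.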
Now we have three different taxonomy tables with independent taxonomic assignments for each ASV in our example data set. From these we can compute ensemble taxonomic assignments with the ensembleTax algorithm.
We'll do a few runs to show the range of outcomes that can be obtained by running the algorithm with different parameter settings.
Here's a run with the default parameters:
xx <- list(idtax.pr2.pretty, idtax.silva.mapped2pr2, bayes.pr2.pretty)
names(xx) <- c("idtax-pr2", "idtax-silva", "bayes-pr2")
eTax1 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = "none",
                     count.na = TRUE,
                     assign.threshold = 0,
                     weights = rep(1, length(xx)))
head(eTax1)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2 <NA>
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota <NA> <NA> <NA> <NA>
## 5 Eukaryota Stramenopiles Opalozoa MAST-3 MAST-3B
## family genus species
## 1 Polar-centric-Mediophyceae <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 Polar-centric-Mediophyceae Chaetoceros <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
Just as an example of what is possible, we might tell ensembleTax not to count non-assignments (NAs) by setting count.na = FALSE. Further, we might give the assignments in the idtax-pr2 table twice the weight of the other two (weights = c(2,1,1)) if we suspect that these are the most robust assignments for our ASV data set. Here's what that looks like:
eTax2 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = "none",
                     count.na = FALSE,
                     assign.threshold = 0,
                     weights = c(2, 1, 1))
head(eTax2)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2 MOCH-2_X
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta Embryophyceae Embryophyceae_X
## 5 Eukaryota Stramenopiles Opalozoa MAST-3 MAST-3B
## family genus species
## 1 Polar-centric-Mediophyceae Minidiscus Minidiscus_sp.
## 2 MOCH-2_XX MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros <NA>
## 4 Embryophyceae_XX <NA> <NA>
## 5 MAST-3B_X MAST-3B_XX MAST-3B_XX_sp.
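Here's one more run, this time making use of the tiebreakz argument to favor the bayes-pr2 table, which is also given double weight (weights = c(1,1,2)):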
eTax3 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = c("bayes-pr2"),
                     count.na = TRUE,
                     assign.threshold = 0,
                     weights = c(1, 1, 2))
head(eTax3)
## svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4 sv3579
## 5 sv4298
## ASV
## 1 ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2 GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3 GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5 GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## kingdom supergroup division class order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2 MOCH-2_X
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta Embryophyceae Embryophyceae_X
## 5 Eukaryota Stramenopiles Opalozoa MAST-3 MAST-3B
## family genus species
## 1 Polar-centric-Mediophyceae Minidiscus Minidiscus_sp.
## 2 MOCH-2_XX MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros <NA>
## 4 Embryophyceae_XX <NA> <NA>
## 5 MAST-3B_X MAST-3B_XX MAST-3B_XX_sp.
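To make the effect of count.na, weights, and tiebreakz more concrete, here is a simplified sketch of a weighted vote for a single ASV at a single rank. vote_rank is a hypothetical helper written for this illustration, not the package's implementation: the real ensembleTax function works across all ranks of a table and may differ in detail, and returning NA on an unbroken tie is an assumption made here. The input is the genus assigned to sv14136 in each of the three tables above.

```r
# Genus assigned to sv14136 by each table (copied from the tables shown above).
genus <- c("idtax-pr2" = NA, "idtax-silva" = NA, "bayes-pr2" = "Minidiscus")

# Hypothetical helper: weighted plurality vote for one ASV at one rank.
vote_rank <- function(assignments, weights = rep(1, length(assignments)),
                      count_na = TRUE, tiebreak = NULL) {
  labels <- assignments
  labels[is.na(labels)] <- ".unassigned"     # treat a non-assignment as its own vote
  keep <- if (count_na) rep(TRUE, length(labels)) else labels != ".unassigned"
  if (!any(keep)) return(NA_character_)      # nothing left to vote on
  totals <- tapply(weights[keep], labels[keep], sum)
  winners <- names(totals)[totals == max(totals)]
  if (length(winners) > 1) {
    # Tie: defer to the named table if it backs one of the tied options (assumption).
    pick <- if (!is.null(tiebreak)) labels[[tiebreak]] else NA_character_
    winners <- if (!is.na(pick) && pick %in% winners) pick else ".unassigned"
  }
  if (winners == ".unassigned") NA_character_ else winners
}

v1 <- vote_rank(genus)                                                # like eTax1
v2 <- vote_rank(genus, weights = c(2, 1, 1), count_na = FALSE)        # like eTax2
v3 <- vote_rank(genus, weights = c(1, 1, 2), tiebreak = "bayes-pr2")  # like eTax3
c(v1, v2, v3)
```

Under these assumptions the three calls give NA, Minidiscus, and Minidiscus, in line with the genus reported for sv14136 in eTax1, eTax2, and eTax3. In the first call the two NAs outvote the single assignment; in the second they are ignored; in the third the doubled weight creates a tie that is resolved in favor of bayes-pr2.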