ensembleTax overview

ensembleTax is an R package that allows incorporation of information from multiple taxonomic assignment algorithms and/or reference databases to compute ensemble taxonomic assignments for ASVs/OTUs generated by common marker gene sequence analyses.

Please note that this is a simple vignette to demonstrate the functionality of the package. For more detailed discussion and further examples, see the README: https://github.com/dcat4/ensembleTax/blob/master/README.md

The problem

Taxonomic assignment of marker gene sequences is a critical step in marker gene workflows because it gives genetic data ecological meaning.

Many taxonomic assignment algorithms have been proposed to assign taxonomy to marker gene sequences (or OTUs/ASVs). Similarly, analysts must often choose one of several reference databases containing representative marker gene sequences with known taxonomic identities. The “best” assignment algorithm and/or reference database for a particular scientific question is often not obvious. To complicate things further, different reference databases generally do not share consistent taxonomic naming or ranking conventions.

ensembleTax solves this problem by providing flexible algorithms that synthesize information from multiple taxonomic assignment algorithm/reference database combinations and compute a single ensemble taxonomic assignment for each ASV/OTU in a marker gene data set.

ensembleTax algorithms

The core algorithms employed by ensembleTax are taxmapper and ensembleTax. taxmapper maps, or 'translates', one taxonomic nomenclature onto another by exact name matching. taxmapper is rank-agnostic, meaning it does not consider the hierarchical structure of a taxonomy and assumes that a taxonomic name means the same thing regardless of which reference database employs it.

ensembleTax computes ensemble taxonomic assignments from the assignments produced by any number of individual taxonomic assignment algorithm/reference database combinations. Several parameters let the user control the trade-off between the accuracy and the resolution of the ensemble assignments.

Additional functions are included for pre-processing taxonomic assignments generated by specific taxonomic assignment algorithms and reference databases. These functions are designed to conveniently plug in downstream of the dada2 pipeline, but other pipelines may be used if the data is formatted properly for use with taxmapper and/or ensembleTax.
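The pre-processed tables printed later in this vignette suggest that taxmapper and ensembleTax work on data frames with an svN column, an ASV column, and one character column per taxonomic rank, with NA marking unassigned ranks. The sketch below assumes that layout is what "formatted properly" means. It converts semicolon-delimited lineage strings into such a data frame. Several other pipelines use this lineage format (QIIME 2 taxonomy tables, for example). lineage2df() is a hypothetical helper written for this illustration and is not an ensembleTax function. Real lineage strings may also carry rank prefixes (such as "k__") that would need stripping.

```r
# Hypothetical helper (NOT an ensembleTax function): turns semicolon-delimited
# lineage strings into an svN / ASV / one-column-per-rank data frame, the layout
# the pre-processed tables in this vignette appear to use. Missing ranks become NA.
lineage2df <- function(svN, asv, lineage, ranks) {
  parts <- strsplit(lineage, ";", fixed = TRUE)
  tax <- vapply(parts, function(p) {
    p <- trimws(p)
    p <- p[p != ""]             # drop empty fields, e.g. from a trailing ";"
    length(p) <- length(ranks)  # pad with NA (or truncate) to one value per rank
    p
  }, character(length(ranks)))
  # vapply gives one column per ASV; refill row-wise so each ASV is one row
  tax <- matrix(tax, ncol = length(ranks), byrow = TRUE)
  colnames(tax) <- ranks
  data.frame(svN = svN, ASV = asv, tax,
             stringsAsFactors = FALSE, check.names = FALSE)
}

df <- lineage2df(
  svN     = c("sv1", "sv2"),
  asv     = c("ACGTACGTAC", "TTGACCATGA"),   # made-up sequences
  lineage = c("Eukaryota;Stramenopiles;Ochrophyta", "Eukaryota;Archaeplastida"),
  ranks   = c("kingdom", "supergroup", "division")
)
df
```

The second ASV has only two names in its lineage, so its division column is NA.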

The taxonomic assignment algorithms explicitly supported by ensembleTax are:

  1. the naive Bayesian classifier, as implemented in dada2's assignTaxonomy.
  2. the IDTAXA algorithm, as implemented in DECIPHER's IdTaxa.

Supported reference databases include:

  1. Silva SSU NR reference database (silva).
  2. Protistan Ribosomal Reference database (pr2).
  3. RDP train set v16.
  4. Greengenes v13.8 clustered at 97% similarity.

Note that other databases may still be used with ensembleTax, but they must be mapped onto the taxonomic nomenclatures employed by Silva and/or pr2 using taxmapper, or they must be re-formatted appropriately for use with taxmapper. Follow the link above for vignettes demonstrating how to incorporate custom reference databases into your ensembleTax workflow.

ensembleTax 'pipeline' demonstration

Here we step through a simple example of an ensembleTax workflow to compute ensemble taxonomic assignments for a small set of 18S-V9 protist ASVs.

First, load some example data included with the ensembleTax package. These are the outputs of dada2's assignTaxonomy run against pr2, and of DECIPHER's idtaxa run against both pr2 and silva. rubric.sample is an example of a “rubric”, which ensembleTax uses to track ASV-identifying information. A rubric is a DNAStringSet object (see the Biostrings package) produced by extracting the ASV sequences from the seqtab used by dada2 and giving them arbitrary names (like sv1, sv2, etc.).
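A rubric like rubric.sample can be built directly from a dada2 sequence table, whose column names are the ASV sequences. The sketch below uses a tiny toy matrix with made-up sequences in place of a real seqtab, and it uses only Biostrings' DNAStringSet(). The sv1, sv2, ... names follow the convention described above. Presumably any set of unique names would work.

```r
library(Biostrings)

# Toy stand-in for a dada2 sequence table: samples in rows, ASV sequences as
# column names. A real seqtab would come from dada2's makeSequenceTable().
seqtab <- matrix(c(10L, 0L, 3L, 7L), nrow = 2,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ACGTACGTAC", "TTGACCATGA")))

# Keep the sequences in seqtab's column order so that classifier output can
# later be matched to the rubric by position
rubric <- DNAStringSet(colnames(seqtab))
names(rubric) <- paste0("sv", seq_along(rubric))   # arbitrary IDs: sv1, sv2, ...
rubric
```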

library("ensembleTax")
library("Biostrings")
data("idtax.pr2.sample")
data("idtax.silva.sample")
data("bayes.sample")
data("rubric.sample")

head(idtax.pr2.sample)
## [[1]]
## [[1]]$taxon
## [1] "Root"                       "Eukaryota"                 
## [3] "Stramenopiles"              "Ochrophyta"                
## [5] "Bacillariophyta"            "Bacillariophyta_X"         
## [7] "Polar-centric-Mediophyceae" "Chaetoceros"               
## [9] "Chaetoceros_debilis_2"     
## 
## [[1]]$confidence
## [1] 82.20536 82.20536 81.01873 81.01873 81.01873 81.01873 81.01873 81.01873
## [9] 30.03599
## 
## 
## [[2]]
## [[2]]$taxon
## [1] "Root"                       "Eukaryota"                 
## [3] "Stramenopiles"              "Ochrophyta"                
## [5] "Bacillariophyta"            "Bacillariophyta_X"         
## [7] "Polar-centric-Mediophyceae" "Thalassiosira"             
## [9] "Thalassiosira_hispida"     
## 
## [[2]]$confidence
## [1] 71.33810 71.33810 66.29660 64.95401 64.95401 64.95401 59.40574 19.26610
## [9] 16.38823
## 
## 
## [[3]]
## [[3]]$taxon
## [1] "Root"           "Eukaryota"      "Stramenopiles"  "Ochrophyta"    
## [5] "MOCH-2"         "MOCH-2_X"       "MOCH-2_XX"      "MOCH-2_XXX"    
## [9] "MOCH-2_XXX_sp."
## 
## [[3]]$confidence
## [1] 67.78686 67.78686 61.88273 58.94742 49.33213 49.33213 49.33213 49.33213
## [9] 49.33213
## 
## 
## [[4]]
## [[4]]$taxon
## [1] "Root"             "Eukaryota"        "Archaeplastida"   "Streptophyta"    
## [5] "Embryophyceae"    "Embryophyceae_X"  "Embryophyceae_XX" "Taxus"           
## [9] "Taxus_baccata"   
## 
## [[4]]$confidence
## [1] 45.398229 45.398229 18.583773 18.157256 12.643442 12.643442 12.643442
## [8]  8.117989  8.117989
## 
## 
## [[5]]
## [[5]]$taxon
## [1] "Root"           "Eukaryota"      "Stramenopiles"  "Opalozoa"      
## [5] "MAST-3"         "MAST-3B"        "MAST-3B_X"      "MAST-3B_XX"    
## [9] "MAST-3B_XX_sp."
## 
## [[5]]$confidence
## [1] 92.47240 92.47240 90.81644 47.09622 46.51236 45.64512 45.64512 45.64512
## [9] 45.64512
head(idtax.silva.sample)
## [[1]]
## [[1]]$taxon
## [1] "Root"              "Eukaryota"         "SAR"              
## [4] "Stramenopiles"     "Ochrophyta"        "Diatomea"         
## [7] "Bacillariophytina" "Mediophyceae"      "Chaetoceros"      
## 
## [[1]]$confidence
## [1] 60.26929 55.47228 50.50453 48.89802 47.20018 46.61874 46.15582 45.57871
## [9] 43.57134
## 
## [[1]]$rank
## [1] "rootrank"    "domain"      "major_clade" "kingdom"     "superphylum"
## [6] "phylum"      "subphylum"   "class"       "genus"      
## 
## 
## [[2]]
## [[2]]$taxon
## [1] "Root"              "Eukaryota"         "SAR"              
## [4] "Stramenopiles"     "Ochrophyta"        "Diatomea"         
## [7] "Bacillariophytina" "Mediophyceae"      "Thalassiosira"    
## 
## [[2]]$confidence
## [1] 64.40876 62.34590 55.16122 48.52020 48.52020 48.52020 45.61851 38.14735
## [9] 27.39857
## 
## [[2]]$rank
## [1] "rootrank"    "domain"      "major_clade" "kingdom"     "superphylum"
## [6] "phylum"      "subphylum"   "class"       "genus"      
## 
## 
## [[3]]
## [[3]]$taxon
## [1] "Root"          "Eukaryota"     "SAR"           "Stramenopiles"
## [5] "Ochrophyta"    "MOCH-2"       
## 
## [[3]]$confidence
## [1] 70.91750 67.31802 64.52419 63.72148 62.63950 61.55201
## 
## [[3]]$rank
## [1] "rootrank"    "domain"      "major_clade" "kingdom"     "superphylum"
## [6] "class"      
## 
## 
## [[4]]
## [[4]]$taxon
##  [1] "Root"               "Eukaryota"          "Archaeplastida"    
##  [4] "Chloroplastida"     "Charophyta"         "Phragmoplastophyta"
##  [7] "Streptophyta"       "Embryophyta"        "Tracheophyta"      
## [10] "Spermatophyta"      "Pinophyta"         
## 
## [[4]]$confidence
##  [1] 55.95076 50.88138 37.40822 37.40822 37.40822 37.40822 37.40822 31.93052
##  [9] 31.93052 31.15123 29.84479
## 
## [[4]]$rank
##  [1] "rootrank"    "domain"      "major_clade" "kingdom"     "subkingdom" 
##  [6] "phylum"      "subphylum"   "class"       "subclass"    "infraclass" 
## [11] "superorder" 
## 
## 
## [[5]]
## [[5]]$taxon
## [1] "Root"          "Eukaryota"     "SAR"           "Stramenopiles"
## [5] "MAST-3"        "MAST-3B"      
## 
## [[5]]$confidence
## [1] 92.92780 92.36003 91.05291 91.05291 91.05291 88.60307
## 
## [[5]]$rank
## [1] "rootrank"    "domain"      "major_clade" "kingdom"     "phylum"     
## [6] "class"
head(bayes.sample)
## $tax
##                                                                                                                                    Kingdom    
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Eukaryota"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "Eukaryota"
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Eukaryota"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "Eukaryota"
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Eukaryota"
##                                                                                                                                    Supergroup      
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Archaeplastida"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "Stramenopiles" 
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Stramenopiles" 
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "Stramenopiles" 
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Stramenopiles" 
##                                                                                                                                    Division      
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Streptophyta"
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "Opalozoa"    
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Ochrophyta"  
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "Ochrophyta"  
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Ochrophyta"  
##                                                                                                                                    Class            
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae"  
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "MAST-3"         
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Bacillariophyta"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "MOCH-2"         
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Bacillariophyta"
##                                                                                                                                    Order              
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae_X"  
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "MAST-3B"          
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Bacillariophyta_X"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "MOCH-2_X"         
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Bacillariophyta_X"
##                                                                                                                                    Family                      
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Embryophyceae_XX"          
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "MAST-3B_X"                 
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Polar-centric-Mediophyceae"
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "MOCH-2_XX"                 
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Polar-centric-Mediophyceae"
##                                                                                                                                    Genus        
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Pinus"      
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "MAST-3B_XX" 
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Minidiscus" 
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "MOCH-2_XXX" 
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Chaetoceros"
##                                                                                                                                    Species                
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC "Pinus_wallichiana"    
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       "MAST-3B_XX_sp."       
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC     "Minidiscus_sp."       
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC    "MOCH-2_XXX_sp."       
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT      "Chaetoceros_debilis_2"
## 
## $boot
##                                                                                                                                    Kingdom
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC     100
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC           100
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC         100
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC        100
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT          100
##                                                                                                                                    Supergroup
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC         79
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC               97
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC             93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC            66
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT              99
##                                                                                                                                    Division
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC       74
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC             95
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC           93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC          65
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT            99
##                                                                                                                                    Class
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC    59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC          93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC        93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT         98
##                                                                                                                                    Order
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC    59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC          93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC        93
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT         98
##                                                                                                                                    Family
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC     59
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC           93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC         86
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC        58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT          98
##                                                                                                                                    Genus
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC    37
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC          93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC        59
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC       58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT         97
##                                                                                                                                    Species
## ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC      22
## GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC            93
## ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC          59
## GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC         58
## GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT           48
head(rubric.sample)
## DNAStringSet object of length 5:
##     width seq                                               names               
## [1]   125 GCACCCACCGATTGAAAAGCCCG...GGTGAAGTCGTAACAAGGTCTCT sv20747
## [2]   126 ACACCTACCAATTGAATGGTCCG...GGTGAAGTTGTAACAAGGTTTCC sv14136
## [3]   127 GCACCTACCGATTGAACCATACG...GGTGAAGTCGTAACAAGGTTTCC sv17278
## [4]   130 ACTCCTACCAATTGAATGATCCA...GGAGAAGTCATAACAAGGTTACC sv3579
## [5]   124 GCACCTACCGATTGAATGGTCCG...GGTGAAGTCGTAACAAGGTTTCC sv4298

ensembleTax pre-processing

The output above shows that the two taxonomic assignment algorithms return different data structures. It is critically important that the sequences appear in the same order in the rubric and in the object returned by idtaxa. idtaxa does not return sequence names, so the two can only be matched by position. As long as you did not reorder your sequences between supplying them to idtaxa and to DNAStringSet when creating your rubric, the ordering is preserved and you should be good to go.

Here we'll run these tables through ensembleTax's pre-processing functions. Supplying a rubric lets ensembleTax give each taxonomy table the same ASV-identifying information, which makes your data easier to track and organize.
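The boot = 50 argument in the calls below acts as a confidence threshold. Compare the raw idtaxa output printed above with the pre-processed idtax.pr2.pretty table printed below. The comparison suggests that every rank whose confidence falls below boot becomes NA, and that the "Root" level is dropped. The base R sketch below reproduces that behaviour for the first idtaxa result, the Chaetoceros ASV (sv20747). truncate_by_conf() is a hypothetical helper written for this illustration, not ensembleTax's code.

```r
# Hypothetical helper (NOT ensembleTax code): mimics the apparent effect of
# `boot` on a single idtaxa result. Every rank at or below the first
# low-confidence rank is set to NA, and then the "Root" placeholder is dropped.
truncate_by_conf <- function(taxon, confidence, boot = 50) {
  keep <- cumprod(confidence >= boot) == 1   # TRUE until the first rank below boot
  out <- ifelse(keep, taxon, NA_character_)
  out[taxon != "Root"]
}

# First element of idtax.pr2.sample (sv20747), copied from the output above
taxon <- c("Root", "Eukaryota", "Stramenopiles", "Ochrophyta", "Bacillariophyta",
           "Bacillariophyta_X", "Polar-centric-Mediophyceae", "Chaetoceros",
           "Chaetoceros_debilis_2")
confidence <- c(82.20536, 82.20536, 81.01873, 81.01873, 81.01873,
                81.01873, 81.01873, 81.01873, 30.03599)

truncate_by_conf(taxon, confidence, boot = 50)
# the species name (confidence ~30) becomes NA; the rest matches row sv20747
# of idtax.pr2.pretty
```

In the sample output, idtaxa confidences never increase toward lower ranks. Truncating at the first low-confidence rank therefore gives the same result as masking each rank independently. The cumprod() guard just makes that assumption explicit.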

# boot = 50: ranks with confidence/bootstrap support below 50 are returned as NA
idtax.pr2.pretty <- idtax2df(idtax.pr2.sample,
                             db = "pr2",
                             ranks = NULL,
                             boot = 50,
                             rubric = rubric.sample,
                             return.conf = FALSE)
idtax.silva.pretty <- idtax2df(idtax.silva.sample,
                               db = "silva",
                               ranks = NULL,
                               boot = 50,
                               rubric = rubric.sample,
                               return.conf = FALSE)
bayes.pr2.pretty <- bayestax2df(bayes.sample,
                                db = "pr2",
                                ranks = NULL,
                                boot = 50,
                                rubric = rubric.sample,
                                return.conf = FALSE)

head(idtax.pr2.pretty)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom    supergroup   division           class             order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta            <NA>              <NA>
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4      <NA>          <NA>       <NA>            <NA>              <NA>
## 5 Eukaryota Stramenopiles       <NA>            <NA>              <NA>
##                       family       genus species
## 1 Polar-centric-Mediophyceae        <NA>    <NA>
## 2                       <NA>        <NA>    <NA>
## 3 Polar-centric-Mediophyceae Chaetoceros    <NA>
## 4                       <NA>        <NA>    <NA>
## 5                       <NA>        <NA>    <NA>
head(idtax.silva.pretty)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##      domain       phylum   class order family genus
## 1 Eukaryota         <NA>    <NA>  <NA>   <NA>  <NA>
## 2 Eukaryota Eukaryota_ph  MOCH-2  <NA>   <NA>  <NA>
## 3 Eukaryota         <NA>    <NA>  <NA>   <NA>  <NA>
## 4 Eukaryota         <NA>    <NA>  <NA>   <NA>  <NA>
## 5 Eukaryota       MAST-3 MAST-3B  <NA>   <NA>  <NA>
head(bayes.pr2.pretty)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom     supergroup     division           class             order
## 1 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota  Stramenopiles   Ochrophyta          MOCH-2          MOCH-2_X
## 3 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta   Embryophyceae   Embryophyceae_X
## 5 Eukaryota  Stramenopiles     Opalozoa          MAST-3           MAST-3B
##                       family       genus        species
## 1 Polar-centric-Mediophyceae  Minidiscus Minidiscus_sp.
## 2                  MOCH-2_XX  MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros           <NA>
## 4           Embryophyceae_XX        <NA>           <NA>
## 5                  MAST-3B_X  MAST-3B_XX MAST-3B_XX_sp.

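Each taxonomy table is now a data frame sorted by the svN column, with the ASV sequence in the second column and one column per taxonomic rank after that. Ranks whose confidence (idtaxa) or bootstrap support (bayesian classifier) fell below boot = 50 are NA.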
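svN is sorted as text, not numerically, which is why sv3579 follows sv20747. Before combining tables, it may be worth confirming that they all list the ASVs in the same order. The base R sketch below shows one way to check this. The helper same_asv_order() and the toy tables are hypothetical and are not part of ensembleTax.

```r
# Hypothetical helper (NOT part of ensembleTax): TRUE if every table lists the
# same ASV identifiers in the same row order
same_asv_order <- function(tables) {
  ids <- lapply(tables, function(tt) as.character(tt$svN))
  all(vapply(ids, identical, logical(1), ids[[1]]))
}

# svN values are character strings, so sort() orders them lexicographically
svn <- c("sv20747", "sv14136", "sv17278", "sv3579", "sv4298")
sort(svn)
# "sv14136" "sv17278" "sv20747" "sv3579" "sv4298"

t1 <- data.frame(svN = sort(svn), genus = NA_character_)
t2 <- data.frame(svN = sort(svn), genus = "Chaetoceros")
t3 <- data.frame(svN = svn, genus = NA_character_)   # original, unsorted order
same_asv_order(list(t1, t2))  # TRUE
same_asv_order(list(t1, t3))  # FALSE
```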

The taxmapper algorithm

After pre-processing, we still can't make apples-to-apples comparisons between the “idtax-silva” table and the other two, because they employ different ranking and (though this may be less obvious) naming conventions. taxmapper was created to solve this problem.

Here we'll use taxmapper to 'translate' the idtax-silva taxonomic assignments onto the same taxonomic nomenclature as the other two tables.

# tt.ranks: the rank columns of the input table (everything after svN and ASV)
idtax.silva.mapped2pr2 <- taxmapper(idtax.silva.pretty,
                                    tt.ranks = colnames(idtax.silva.pretty)[3:ncol(idtax.silva.pretty)],
                                    tax2map2 = "pr2",
                                    exceptions = c("Archaea", "Bacteria"),
                                    ignore.format = TRUE,
                                    synonym.file = "default",
                                    streamline = TRUE,
                                    outfilez = NULL)
head(idtax.silva.mapped2pr2)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom    supergroup   division  class   order family genus species
## 1 Eukaryota          <NA>       <NA>   <NA>    <NA>   <NA>  <NA>    <NA>
## 2 Eukaryota Stramenopiles Ochrophyta MOCH-2    <NA>   <NA>  <NA>    <NA>
## 3 Eukaryota          <NA>       <NA>   <NA>    <NA>   <NA>  <NA>    <NA>
## 4 Eukaryota          <NA>       <NA>   <NA>    <NA>   <NA>  <NA>    <NA>
## 5 Eukaryota Stramenopiles   Opalozoa MAST-3 MAST-3B   <NA>  <NA>    <NA>

The mapped taxonomy table now follows the naming and ranking conventions of the other two taxonomy tables: the same eight ranks, kingdom through species, populated with PR2 names.
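As described in the overview, taxmapper matches names exactly and ignores rank: a name is recognized by its spelling alone, wherever it sits in the source table. The toy below illustrates only that idea. It is not taxmapper's implementation, and it ignores arguments such as synonym.file, ignore.format and exceptions. The lookup table, the source row and the helper `map_row()` are all hypothetical. The PR2 rank for each name is taken from the mapped output above.

```r
# Hypothetical PR2-style lookup: name -> PR2 rank (ranks as in the mapped table)
pr2_lookup <- c("Eukaryota" = "kingdom", "Stramenopiles" = "supergroup",
                "Opalozoa" = "division", "MAST-3" = "class", "MAST-3B" = "order")
pr2_ranks <- c("kingdom", "supergroup", "division", "class", "order")

# Place each exactly matching name in its PR2 rank column; the rank the name
# held in the source table is ignored, and unmatched names are dropped
map_row <- function(names_in, lookup, target_ranks) {
  out <- setNames(rep(NA_character_, length(target_ranks)), target_ranks)
  for (nm in names_in[!is.na(names_in)]) {
    if (nm %in% names(lookup)) out[[lookup[[nm]]]] <- nm
  }
  out
}

# Hypothetical source assignment with different rank positions and a name
# ("SAR") that has no entry in the lookup
source_row <- c("Eukaryota", "SAR", "Stramenopiles", "MAST-3", "MAST-3B")
mapped <- map_row(source_row, pr2_lookup, pr2_ranks)
print(mapped)
```

"Stramenopiles" is third in the source row but lands in the supergroup column, because only the name is matched. Division stays NA, since the toy does not fill in missing parent ranks.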

The ensembleTax algorithm

Now we have three different taxonomy tables with independent taxonomic assignments for each ASV in our example data set. From these we can compute ensemble taxonomic assignments with the ensembleTax algorithm.

We'll do a few runs to show the range of outcomes that different parameter settings can produce.

Here's a run with the default parameters:

xx <- list(idtax.pr2.pretty, idtax.silva.mapped2pr2, bayes.pr2.pretty)
names(xx) <- c("idtax-pr2", "idtax-silva", "bayes-pr2")
eTax1 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = "none",
                     count.na = TRUE,               # NAs count as votes
                     assign.threshold = 0,
                     weights = rep(1, length(xx)))  # equal weight for every table
head(eTax1)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom    supergroup   division           class             order
## 1 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota Stramenopiles Ochrophyta          MOCH-2              <NA>
## 3 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota          <NA>       <NA>            <NA>              <NA>
## 5 Eukaryota Stramenopiles   Opalozoa          MAST-3           MAST-3B
##                       family       genus species
## 1 Polar-centric-Mediophyceae        <NA>    <NA>
## 2                       <NA>        <NA>    <NA>
## 3 Polar-centric-Mediophyceae Chaetoceros    <NA>
## 4                       <NA>        <NA>    <NA>
## 5                       <NA>        <NA>    <NA>
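One way to picture these settings is as a plurality vote at each rank. The base-R toy below is a sketch under stated assumptions, not ensembleTax's implementation. It treats a missing assignment as one more candidate (as count.na = TRUE suggests), gives every table equal weight, assumes an unresolved tie yields NA, and ignores assign.threshold. The helper `vote_count_na()` and the taxon names are hypothetical.

```r
# Toy plurality vote at one rank with equal weights. Assumptions (not taken from
# the package source): NA is counted as a candidate, mirroring count.na = TRUE,
# and a tie for the top count returns NA.
vote_count_na <- function(votes) {
  key <- ifelse(is.na(votes), "(none)", votes)  # make NA a countable label
  tally <- table(key)                           # votes per candidate
  best <- names(tally)[tally == max(tally)]
  if (length(best) > 1 || best == "(none)") NA_character_ else best
}

# Hypothetical votes from three tables for a single ASV
vote_count_na(c("taxon_A", NA, NA))         # NA: two abstentions outvote one name
vote_count_na(c("taxon_A", "taxon_A", NA))  # "taxon_A"
```

Under these assumptions, counting NAs makes the ensemble conservative: a name must outpoll the tables that made no assignment at that rank.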

As an example of what is possible, we might tell ensembleTax not to count non-assignments (NAs) by setting count.na = FALSE. We might also give the idtax-pr2 assignments twice the weight of the other two tables (weights = c(2,1,1)) if we suspect they are the most robust assignments for our ASV data set. Here's what that looks like:

eTax2 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = "none",
                     count.na = FALSE,        # do not count NAs (non-assignments)
                     assign.threshold = 0,
                     weights = c(2, 1, 1))    # idtax-pr2 counts double
head(eTax2)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom     supergroup     division           class             order
## 1 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota  Stramenopiles   Ochrophyta          MOCH-2          MOCH-2_X
## 3 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta   Embryophyceae   Embryophyceae_X
## 5 Eukaryota  Stramenopiles     Opalozoa          MAST-3           MAST-3B
##                       family       genus        species
## 1 Polar-centric-Mediophyceae  Minidiscus Minidiscus_sp.
## 2                  MOCH-2_XX  MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros           <NA>
## 4           Embryophyceae_XX        <NA>           <NA>
## 5                  MAST-3B_X  MAST-3B_XX MAST-3B_XX_sp.
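Dropping NAs and weighting tables changes the arithmetic of the vote. The sketch below is again a hypothetical base-R toy rather than the package's code. It removes NAs before counting (as count.na = FALSE suggests) and sums the weights of the tables that agree on each name. The tie rule and the omission of assign.threshold are assumptions, and `weighted_vote()` and the taxon names are hypothetical.

```r
# Toy weighted vote at one rank. Assumptions (not the package source): NAs are
# dropped before voting, mirroring count.na = FALSE; each table's weight is
# summed per name; a tie for the top score returns NA.
weighted_vote <- function(votes, weights) {
  keep <- !is.na(votes)
  if (!any(keep)) return(NA_character_)              # every table abstained
  score <- tapply(weights[keep], votes[keep], sum)   # total weight per name
  best <- names(score)[score == max(score)]
  if (length(best) > 1) NA_character_ else best
}

# Hypothetical votes, ordered as idtax-pr2, idtax-silva, bayes-pr2
votes <- c("taxon_A", "taxon_B", NA)
weighted_vote(votes, c(2, 1, 1))   # "taxon_A": weight 2 beats weight 1
weighted_vote(votes, c(1, 1, 1))   # NA: equal weights leave a 1-1 tie
```

With NAs dropped, a single table's assignment can carry a rank on its own. That is one way an ensemble can reach deeper ranks than under the default settings; compare the species column of eTax1 and eTax2 above.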

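One more run, this time using the tiebreakz argument to prefer the bayes-pr2 table when votes are tied, and giving that table double weight (weights = c(1,1,2)):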

eTax3 <- ensembleTax(xx,
                     tablenames = names(xx),
                     ranknames = c("kingdom", "supergroup", "division", "class",
                                   "order", "family", "genus", "species"),
                     tiebreakz = c("bayes-pr2"),  # prefer bayes-pr2 in ties
                     count.na = TRUE,
                     assign.threshold = 0,
                     weights = c(1, 1, 2))        # bayes-pr2 counts double
head(eTax3)
##       svN
## 1 sv14136
## 2 sv17278
## 3 sv20747
## 4  sv3579
## 5  sv4298
##                                                                                                                                  ASV
## 1     ACACCTACCAATTGAATGGTCCGGTGAGGACTCGGATTGTGGTTTAGCTCCTTCATTGGGGCCTGACTGCAAGAACTTGTCCGAACCTTATCATTTAGAGGAAGGTGAAGTTGTAACAAGGTTTCC
## 2    GCACCTACCGATTGAACCATACGGTGAGGTCCTCGGATTTCATGAATCGACCTTCACTGGGAGATTCGTGAGAGAAGTTGCCCAAACCTCGTGGTTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
## 3      GCACCCACCGATTGAAAAGCCCGGTGAAGAATCGGGATTGTAGCGTTGTCCTTCATTGGACATTGCCGTGAGAACCTTTCTGAACCTTGTTTTTTAGAGGAAGGTGAAGTCGTAACAAGGTCTCT
## 4 ACTCCTACCAATTGAATGATCCATGAAGTGTTTGGATTACATTGAAGATGGTGGTTTGCCGCTGTCGACGTCATGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCATAACAAGGTTACC
## 5       GCACCTACCGATTGAATGGTCCGGTGAGATCTTCGGACTGCAGCGAAAGTCAGCAATGAGTTAGTCGCGGAAAGTTGATCAAACCTTACCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCC
##     kingdom     supergroup     division           class             order
## 1 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 2 Eukaryota  Stramenopiles   Ochrophyta          MOCH-2          MOCH-2_X
## 3 Eukaryota  Stramenopiles   Ochrophyta Bacillariophyta Bacillariophyta_X
## 4 Eukaryota Archaeplastida Streptophyta   Embryophyceae   Embryophyceae_X
## 5 Eukaryota  Stramenopiles     Opalozoa          MAST-3           MAST-3B
##                       family       genus        species
## 1 Polar-centric-Mediophyceae  Minidiscus Minidiscus_sp.
## 2                  MOCH-2_XX  MOCH-2_XXX MOCH-2_XXX_sp.
## 3 Polar-centric-Mediophyceae Chaetoceros           <NA>
## 4           Embryophyceae_XX        <NA>           <NA>
## 5                  MAST-3B_X  MAST-3B_XX MAST-3B_XX_sp.
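For these five ASVs, eTax3 happens to match eTax2. The toy below sketches one plausible reading of a tie-break preference: when candidates share the top weighted score, the name chosen by the preferred table wins. It counts NAs, as count.na = TRUE does in this run. It is not ensembleTax's implementation, and `tiebreak_vote()` and the taxon names are hypothetical.

```r
# Toy weighted vote with a tie-break. Assumptions (not the package source): NA is
# a countable candidate (count.na = TRUE); ties go to the preferred table's vote.
tiebreak_vote <- function(votes, weights, tables, prefer) {
  key <- ifelse(is.na(votes), "(none)", votes)   # make NA a countable label
  score <- tapply(weights, key, sum)             # total weight per candidate
  best <- names(score)[score == max(score)]
  if (length(best) > 1) {                        # tie: defer to preferred table
    pref <- key[tables == prefer]
    best <- if (length(pref) == 1 && pref %in% best) pref else NA_character_
  }
  if (is.na(best) || best == "(none)") NA_character_ else best
}

tables <- c("idtax-pr2", "idtax-silva", "bayes-pr2")
w <- c(1, 1, 2)                                  # as in the eTax3 run
votes <- c("taxon_A", "taxon_A", "taxon_B")      # hypothetical: 2 vs 2 tie
tiebreak_vote(votes, w, tables, prefer = "bayes-pr2")   # "taxon_B"
tiebreak_vote(votes, w, tables, prefer = "idtax-pr2")   # "taxon_A"
```

In this toy, bayes-pr2 also carries double weight, so a 2-to-2 tie arises whenever the other two tables agree against it. That is exactly the case the tie-break settles.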