Results. This paper introduces Mantis, a space-efficient data structure that can be used to index thousands of raw-read experiments and facilitate large-scale sequence searches on those experiments. Mantis uses counting quotient filters instead of Bloom filters, enabling rapid index builds and queries, small indexes, and exact results, i.e., no false positives or negatives. Furthermore, Mantis is also a colored de Bruijn graph representation, so it supports fast graph traversal and other topological analyses in addition to large-scale sequence-level searches.
In our performance evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6×–108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2652 human blood, breast, and brain RNA-seq experiments in one hour and 22 minutes; SBT took close to 4 days and AllSomeSBT took about eight hours.
Mantis is written in C++11 and is available at https://github.com/splatlab/mantis.