Tools for indexing and searching terabytes of genomic and transcriptomic data.


Scientists need tools for quickly indexing, searching, and analyzing terabytes, or even petabytes, of raw genomic and transcriptomic data. This project is developing multiple tools for efficiently representing biological data so that huge data sets can be processed in RAM, providing orders-of-magnitude increases in scalability and performance. This project has produced several open-source tools:
  • Squeakr, a fast and compact k-mer counter for computational biology applications, built on the counting quotient filter (CQF).
  • deBGR, a compact, nearly exact representation of de Bruijn graphs of k-mers.
  • Mantis, an indexing system for searching for sequences in large-scale databases of genomic and transcriptomic data.
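To make the terms above concrete, the following is a minimal Python sketch of what a k-mer counter computes and how a de Bruijn graph can be navigated from a set of k-mers. It is an illustration under stated assumptions, not code from Squeakr, deBGR, or Mantis: it uses an ordinary exact `collections.Counter` as a stand-in for the counting quotient filter, and it ignores details such as canonical (reverse-complement) k-mers. The function names `count_kmers` and `successors` are hypothetical.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq.

    Hypothetical helper: an exact in-memory hash table stands in for the
    compact counting quotient filter a tool like Squeakr would use.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def successors(kmer: str, kmers: Counter) -> list[str]:
    """Return the out-neighbors of kmer in a node-centric de Bruijn graph.

    Edges are not stored: a neighbor is any k-mer present in the set that
    overlaps kmer by k-1 characters, so we query the four possible
    one-base extensions (hypothetical helper, illustration only).
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in kmers]


if __name__ == "__main__":
    counts = count_kmers("ACGTACGA", 3)
    print(counts["ACG"])              # ACG occurs twice
    print(successors("ACG", counts))  # extensions of "CG" present in the set
```

Recovering edges through membership queries is one common way to represent a de Bruijn graph implicitly, which is why a compact structure that answers k-mer membership and count queries can also serve as a graph representation.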


External Researchers

  • Fatemeh Almodaresi
  • Michael A. Bender
  • Mike Ferdman
  • Prashant Pandey
  • Rob Patro