Cqf diagram

Introduction

Theory and practice of Bloom-filter-like data structures, including counting quotient filters

Summary

Approximate Membership Query (AMQ) data structures, such as the Bloom filter, quotient filter, and cuckoo filter, have found numerous applications in databases, storage systems, networks, computational biology, and other domains. However, many applications must work around limitations in the capabilities or performance of current AMQs, making these applications more complex and less performant. For example, many current AMQs cannot delete or count the number of occurrences of each input item, take up large amounts of space, are slow, cannot be resized or merged, or have poor locality of reference and hence perform poorly when stored on SSD or disk.

This project is investigating alternatives to the Bloom filter, such as quotient filters, counting quotient filters, and cuckoo filters, that overcome these limitations. The project includes both theoretical analyses and practical applications of these data structures.

Several open-source tools based on this work are available on github:

  • Counting quotient filter (CQF), a Bloom-filter replacement that is faster than a Bloom filter and supports counting and deletion in less space than a Bloom filter.
  • Squeakr, a fast and compact CQF-based k-mer counter for computational biology applications.
  • deBGR, a compact nearly exact representation of de Bruijn graphs of k-mers.

Researchers

External Researchers

  • Martin Farach-Colton
  • Michael A. Bender
  • Rob Patro