A statistical method for sampling unstructured logs.


Log management systems like Log Insight are an essential part of an IT manager’s toolset. They serve as a first line of defense in diagnosing both performance and behavioral errors. However, the sheer volume of log entries that they manage often precludes interactive performance. Our work introduces methods for sampling log entries in which a small fraction of ingested messages is kept in a separate sublog. By restricting queries to run only over this sublog, we are able to quickly produce an approximation of the correct result. The main challenge with this approach is picking a sublog that is both small enough to provide performance improvements and rich enough that the resulting relative error is sufficiently small. We develop effective sampling strategies for doing so and provide a formal statistical analysis of our approach.
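
To make the sublog idea concrete, here is a minimal sketch of the simplest possible baseline: uniform (Bernoulli) sampling, where each ingested message is kept independently with a fixed probability and a count query over the sublog is scaled back up by the inverse of that probability. This is only an illustrative assumption, not the sampling strategy developed in this work. The function names, the synthetic log, and the 5% rate are all hypothetical.

```python
import random


def build_sublog(log, rate, seed=0):
    """Keep each entry independently with probability `rate` (assumed uniform sampling)."""
    rng = random.Random(seed)
    return [entry for entry in log if rng.random() < rate]


def estimate_count(sublog, predicate, rate):
    """Estimate the full-log match count by scaling the sublog count by 1/rate."""
    matches = sum(1 for entry in sublog if predicate(entry))
    return matches / rate


def relative_error(estimate, exact):
    """Relative error of an estimate against the exact (non-zero) answer."""
    return abs(estimate - exact) / exact


def is_error(entry):
    """Hypothetical query: does this message report an error?"""
    return entry.startswith("ERROR")


# Synthetic log (hypothetical data): every tenth message is an error.
log = [f"{'ERROR' if i % 10 == 0 else 'INFO'} request {i}" for i in range(100_000)]

rate = 0.05  # fraction of messages kept in the sublog (assumed value)
sublog = build_sublog(log, rate)

exact = sum(1 for entry in log if is_error(entry))  # full scan
approx = estimate_count(sublog, is_error, rate)     # scan of the sublog only

print(f"sublog size: {len(sublog)} of {len(log)}")
print(f"exact={exact} approx={approx:.0f} rel_err={relative_error(approx, exact):.3f}")
```

The trade-off described above shows up directly in `rate`: a smaller value shrinks the sublog and speeds up queries, but leaves fewer matching messages in the sample, so the relative error grows, especially for queries that match only a few messages.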


External Researchers

  • Mayank Agrawal
  • Nicholas Kushmerick
  • Ramses Morales