A statistical method for sampling unstructured logs.
Log management systems like Log Insight are an essential part of
an IT manager’s toolset. They serve as a first line of defense in
diagnosing both performance and behavioral errors. However, the sheer
volume of log entries that they manage often precludes interactive querying.
Our work introduces methods for sampling log entries
in which a small fraction of ingested messages are kept in a separate
sublog. By restricting queries to run only over this sublog, we
are able to quickly produce an approximation of the correct result. The
main challenge with this approach is picking a sublog that is both
small enough to provide performance improvements and rich
enough that the resulting relative error is sufficiently small. We
develop effective sampling strategies for doing so and
provide a formal statistical analysis of our approach.
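
As a minimal illustration of the sublog idea, the sketch below assumes the simplest possible strategy: uniform random sampling at a fixed rate, with a count query answered by scaling the sublog count by the inverse of that rate. This is an assumption made for illustration and is not necessarily one of the sampling strategies developed in this work. The function names, the synthetic log, and the 5% rate are all hypothetical.

```python
import random


def build_sublog(entries, rate, seed=0):
    """Keep each entry independently with probability `rate` (uniform sampling, assumed)."""
    rng = random.Random(seed)
    return [entry for entry in entries if rng.random() < rate]


def estimate_count(sublog, predicate, rate):
    """Estimate the full-log match count by scaling the sublog count by 1 / rate."""
    hits = sum(1 for entry in sublog if predicate(entry))
    return hits / rate


def relative_error(estimate, truth):
    """Relative error of an estimate against a nonzero true value."""
    return abs(estimate - truth) / truth


def is_error(entry):
    """Hypothetical query predicate: does the message start with ERROR?"""
    return entry.startswith("ERROR")


# Synthetic log (hypothetical data): every tenth message is an error.
log = [f"{'ERROR' if i % 10 == 0 else 'INFO'} request {i}" for i in range(100_000)]

rate = 0.05                                   # fraction of messages kept in the sublog
sublog = build_sublog(log, rate)

truth = sum(1 for entry in log if is_error(entry))   # exact answer over the full log
estimate = estimate_count(sublog, is_error, rate)    # approximate answer over the sublog

print(f"sublog size: {len(sublog)} of {len(log)}")
print(f"true count: {truth}, estimate: {estimate:.0f}")
print(f"relative error: {relative_error(estimate, truth):.3f}")
```

Raising `rate` makes the sublog larger and the estimate tighter, while lowering it makes queries cheaper but noisier, which is the size-versus-error trade-off described above.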
- Mayank Agrawal
- Nicholas Kushmerick
- Ramses Morales