Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. Overlook enables fast computation of private synopses using the idea of a "virtual synopsis," which represents a potentially large private synopsis implicitly using a pseudorandom function that is evaluated on demand. Overlook uses a synopsis that is simple but fast and has accuracy comparable to other state-of-the-art synopses, requiring 0.5 seconds or less to generate a histogram over 60 GB of data -- only 2.5 × slower than the equivalent public histogram. Together, these allow Overlook to provide a rich, interactive, and fast visual query interface that allows users to explore private data with minimal storage overhead -- tens of bytes -- on the server



November, 2020


Related projects

Research Areas

  • big data




Theory and Practice of Differential Privacy (TPDP 2020)