VMware Research | Overlook: Differentially Private Exploratory Visualization for Big Data

Abstract

Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. Overlook enables fast computation of private synopses using the idea of a "virtual synopsis," which represents a potentially large private synopsis implicitly using a pseudorandom function that is evaluated on demand. Overlook uses a synopsis that is simple but fast and has accuracy comparable to other state-of-the-art synopses, requiring 0.5 seconds or less to generate a histogram over 60 GB of data -- only 2.5 × slower than the equivalent public histogram. Together, these allow Overlook to provide a rich, interactive, and fast visual query interface that allows users to explore private data with minimal storage overhead -- tens of bytes -- on the server

Files

tpdp2020.pdf

Date

November, 2020

Authors

Pratiksha Thaker
Mihai Budiu
Parikshit Gopalan
Udi Wieder
Matei Zaharia

Related projects

Hillview

Research Areas

big data

Type

Inproceedings

Booktitle

Theory and Practice of Differential Privacy (TPDP 2020)