Unlock the Value of Sensitive Data with Differential Privacy

The Snowflake AI Data Cloud has democratized data for thousands of customers, removing data silos and powering data sharing and collaboration use cases. Many customers have been able to unlock enormous value from their data with Snowflake, including safely collaborating on sensitive data using Snowflake Data Clean Rooms and Data Governance features. However, some highly sensitive data has remained off-limits due to regulatory requirements and privacy concerns — until now.

To address these challenges and democratize even highly sensitive data, we are excited to announce the general availability of differential privacy policies in Snowflake. These policies are built on differential privacy, a technique widely regarded in the academic literature as the gold standard for private data analytics. Differential privacy brings mathematical rigor to privacy protection, enabling customers to use previously inaccessible data. Once that data is unlocked, Snowflake customers can use it to power collaboration use cases such as cross-organization and cross-geography data sharing, and even to create new revenue streams through data monetization.

What is differential privacy?

Differential privacy is a privacy-enhancing technology that helps minimize the risk of sensitive information leakage by protecting the identities of individual entities in a dataset, such as people, organizations, or locations. It has been used in many high-profile applications involving sensitive data, including the 2020 U.S. Census data release and Apple’s user data collection, and it is highlighted in the 2023 AI Executive Order.

With differential privacy, data consumers can run analytical queries on the full dataset, but they can neither see the row-level data nor reverse engineer sensitive information from the results. It complements data privacy methods that protect data at rest, in motion, and in use.

To do this, Snowflake differential privacy policies dynamically add noise to query results. The amount of noise depends on how sensitive the query is, as determined by the mathematical techniques of differential privacy. For example, if a query calculates broad aggregates, the noise will be relatively small, potentially negligible. If a query concerns a small group or even a single individual, the noise will be large enough to obscure their identities and protect your most sensitive data against privacy attacks.
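The relationship between noise and query size can be sketched with the textbook Laplace mechanism. This is an illustration under stated assumptions, not Snowflake's implementation: the mechanism, the `epsilon` value of 0.5, and all helper names are hypothetical. For a counting query, adding or removing one person changes the result by at most 1, so the noise has the same absolute scale for both queries below; what differs is how large that noise is relative to the true answer.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Textbook Laplace mechanism for a counting query (hypothetical helper).

    One person changes a count by at most 1, so the sensitivity is 1
    and the noise scale is sensitivity / epsilon.
    """
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def average_relative_error(true_count: int, epsilon: float,
                           trials: int = 1000, seed: int = 0) -> float:
    """Mean of |noisy - true| / true over repeated releases, for illustration only."""
    rng = random.Random(seed)
    total = sum(abs(noisy_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials / true_count


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy parameter
    for label, count in [("broad aggregate", 100_000), ("single individual", 1)]:
        print(f"{label}: average relative error "
              f"{average_relative_error(count, epsilon):.4%}")
```

Under these assumptions, the noise is a tiny fraction of a count of 100,000 but is on the order of the answer itself for a count of 1, which is what obscures whether that individual is present.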

Implementing a differentially private system typically requires significant investment and expertise. Open source differential privacy libraries are not differentially private end to end, and they may also lack features that make differential privacy useful for real-world use cases. Snowflake differential privacy policies, by contrast, are ready to use out of the box, without these downsides.

Differential privacy unlocks rather than devalues data

Differential privacy is a significant improvement over existing approaches, which devalue data far more. To illustrate, let’s examine a healthcare use case built on a dataset of patient visits to healthcare providers. In this example, the data provider must protect patient identities to comply with privacy regulations.

For the basic version of this use case, let’s assume a dataset in which each row represents one visit between a patient and a healthcare provider. Without differential privacy, the data provider would typically mask fields that could be used to identify the patient, such as the date of the visit. These fields would be generalized to a coarser level of granularity, for example by removing the month and day and keeping only the year. Although this approach may seem sensible from a privacy standpoint, it drastically reduces the value of the data. For example, because year-level dates no longer reveal how far apart visits are, data consumers can no longer ask questions like, “On average, how long is drug X taken for condition Y?”
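A minimal sketch of why masking dates to the year rules out duration questions. The visit dates and function names are hypothetical: with exact dates the gap between two visits is directly computable, but once both dates are generalized to the year, the only remaining information is that they fall in the same year.

```python
from datetime import date


def mask_to_year(d: date) -> int:
    """Generalize a visit date to its year (the masking approach described above)."""
    return d.year


def duration_days(start: date, end: date) -> int:
    """Number of days between two visits; requires day-level dates."""
    return (end - start).days


# Hypothetical first and last visits for one patient taking drug X for condition Y.
first_visit = date(2023, 3, 1)
last_visit = date(2023, 5, 30)

print(duration_days(first_visit, last_visit))               # 90
print(mask_to_year(first_visit), mask_to_year(last_visit))  # 2023 2023
# After masking, the duration is only known to be somewhere under a year.
```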

With differential privacy, the data provider does not need to mask or remove any fields, enabling data consumers to ask these kinds of detailed questions and receive useful answers.
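One way a differentially private system could answer the duration question is sketched below, assuming each patient contributes a single duration value and that a public upper bound of 365 days is acceptable. It uses a common textbook construction, a noisy sum divided by a noisy count, with the privacy budget split evenly between the two; Snowflake's actual mechanism is not described here, and the data, bound, `epsilon`, and helper names are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_mean(values: list[float], upper: float, epsilon: float,
            rng: random.Random) -> float:
    """Differentially private mean via a noisy sum divided by a noisy count.

    Assumes each individual contributes exactly one value. Values are clamped
    to [0, upper], so one person changes the sum by at most `upper` and the
    count by at most 1. Each release uses epsilon / 2, for epsilon in total.
    """
    clamped = [min(max(v, 0.0), upper) for v in values]
    half = epsilon / 2
    noisy_sum = sum(clamped) + laplace_noise(upper / half, rng)
    noisy_count = len(clamped) + laplace_noise(1.0 / half, rng)
    # Post-processing: avoid dividing by a tiny or negative count and keep
    # the result in the valid range. This does not affect the privacy guarantee.
    return min(max(noisy_sum / max(noisy_count, 1.0), 0.0), upper)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic durations (days) that drug X was taken for condition Y.
    durations = [data_rng.uniform(30, 120) for _ in range(5000)]
    true_mean = sum(durations) / len(durations)
    private_mean = dp_mean(durations, upper=365.0, epsilon=1.0,
                           rng=random.Random(7))
    print(f"true mean: {true_mean:.1f} days, DP mean: {private_mean:.1f} days")
```

Clamping is the key design choice here: without a bound on each value, a single outlier patient could shift the sum arbitrarily, and no finite amount of noise would hide their contribution.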