Securing sensitive & PII data on Snowflake

Four data security services available for securing data on Snowflake.

Richie Bachala
Mar 15, 2021

Sensitive data needs to be organized, protected, and used. Enterprises that need additional user-level security for governing sensitive PII data within Snowflake's Data Cloud must either resort to plenty of scripting to manage roles, users, and policies within Snowflake or use one of the many vendor services available on the market.

· Data Governance Policy — Gives mandate and authority for Roles/Responsibilities, Data Quality Standards and Dimensions, Processes and Scope.

· Processes — Data Quality reporting, issue management, MDM processes, and update Data Governance framework.

· Roles and Responsibilities — Data Owners, Data Stewards, Data Producers, Data Consumers, Data Custodians, and Data Governance leads.

Altr

Altr DSaaS (Data Security as a Service) is a no-code, cloud-based platform that analyzes organizational data usage, uses policies to govern data consumption, and delivers data to the users who need it. Altr can publish external queries for auditing how Snowflake roles consume data, allow legitimate consumption by a role, and lock out queries on a group of columns through a group policy. Altr must first tokenize sensitive data before it reaches cloud storage, and it uses external functions on Snowflake to de-tokenize the data.
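The tokenize-before-storage, de-tokenize-at-query-time flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `TokenVault`, its methods, and the role names are hypothetical and are not ALTR's API. In a real deployment the de-tokenize step would sit behind a Snowflake external function that calls the vendor's service.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault: tokenizes values and gates de-tokenization by role."""

    def __init__(self, allowed_roles):
        self._allowed_roles = set(allowed_roles)
        self._token_to_value = {}
        self._value_to_token = {}
        self.audit_log = []  # (role, token, allowed) tuples, one per de-tokenize attempt

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal values still join and group consistently.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str, role: str) -> str:
        allowed = role in self._allowed_roles
        self.audit_log.append((role, token, allowed))  # every attempt is recorded
        if not allowed:
            raise PermissionError(f"role {role!r} may not de-tokenize this column")
        return self._token_to_value[token]


vault = TokenVault(allowed_roles={"PII_ANALYST"})
stored = vault.tokenize("123-45-6789")           # only this token is written to storage
clear = vault.detokenize(stored, "PII_ANALYST")  # permitted role gets cleartext back
```

The warehouse only ever holds the token, so a role without de-tokenize rights sees a meaningless string, and the audit log captures who asked for what.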

https://www.altr.com/

Qlik

Qlik Replicate embeds SecuPi technology that masks PII, using Snowflake UDFs and key-based encryption to obfuscate data. Qlik Data Catalog inventories Snowflake to search and profile metadata, lineage, and connection information, and can obfuscate sensitive data based on policy. A policy defines access, security, and use. Collaboration features consume policies and show what data is being used and by whom. The catalog's shopping cart shows the lineage and correlation of a dataset, with a “Request Access” option for sensitive data. Data loads identify the quality of records.
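To make policy-based masking concrete, here is a minimal sketch of the idea: a policy names a sensitive column and the roles allowed to see it in cleartext, and everyone else gets a masked value. The names (`ColumnPolicy`, `apply_policies`, `mask_keep_last`) and the roles are assumptions for illustration, not Qlik or SecuPi APIs. In Snowflake the same logic would typically live in a UDF or a masking policy.

```python
from dataclasses import dataclass


def mask_keep_last(value: str, visible: int = 4, fill: str = "*") -> str:
    """Replace all but the last `visible` characters with `fill`."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


@dataclass(frozen=True)
class ColumnPolicy:
    column: str                 # sensitive column this policy covers
    cleartext_roles: frozenset  # roles allowed to read the raw value


def apply_policies(row: dict, role: str, policies) -> dict:
    """Return a copy of `row` with policy-covered columns masked for `role`."""
    by_column = {p.column: p for p in policies}
    result = {}
    for column, value in row.items():
        policy = by_column.get(column)
        if policy is None or role in policy.cleartext_roles:
            result[column] = value  # not sensitive, or role is entitled
        else:
            result[column] = mask_keep_last(str(value))
    return result


policies = [ColumnPolicy("ssn", frozenset({"HR_ADMIN"}))]
row = {"name": "Ada", "ssn": "123-45-6789"}
masked_row = apply_policies(row, "ANALYST", policies)
```

Here `masked_row["ssn"]` keeps only the last four characters, while the same call with the `HR_ADMIN` role returns the row unchanged.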

Qlik Sense monitors, secures, and controls information via Security Rules, Roles, and Groups. Administrators see all apps, tasks, streams, users, security, etc., but can also be locked out of PII or PHI. Administrators can be limited to teams, departments, objects, or tasks based on admin role-based security.

https://www.qlik.com/us/products/data-integration-products?ga-link=bjp-hp-tab

Informatica Axon Data Marketplace

An enterprise data governance solution that can be used on-premises or in the cloud. It includes cataloging, governance, data quality, and privacy, built on an AI-powered, metadata-driven platform. Users can search for and request curated, reliable data.

Producers (data owners) can publish and revoke access to data set collections for other groups via delivery methods such as EDC-Provisioned Cloud Data (XML and JSON to web locations) or others based on policies. Business owners receive tasks to approve or revoke business requests, and technical owners need to assure delivery of the data.

Consumers (data users) can search by category and owner, and for selected data sets they can review the quality of the data, the delivery method, and the policy that applies. Consumers submit orders with a business justification, a delivery request, and a usage guide/terms-of-use user agreement.
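The producer/consumer flow described above (publish, request with a justification, approve or revoke) is essentially a small state machine. The sketch below models it under assumed names: `Marketplace`, `AccessRequest`, and `Status` are hypothetical and do not correspond to Informatica's API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    REVOKED = "revoked"


@dataclass
class AccessRequest:
    consumer: str
    dataset: str
    justification: str
    delivery_method: str
    status: Status = Status.PENDING


class Marketplace:
    """Hypothetical catalog: producers publish, consumers order, owners decide."""

    def __init__(self):
        self._owners = {}   # dataset name -> owning producer
        self.requests = []  # every order ever submitted

    def publish(self, dataset, owner):
        self._owners[dataset] = owner

    def request_access(self, consumer, dataset, justification, delivery_method):
        if dataset not in self._owners:
            raise KeyError(f"{dataset} is not published")
        if not justification.strip():
            raise ValueError("a business justification is required")
        request = AccessRequest(consumer, dataset, justification, delivery_method)
        self.requests.append(request)
        return request

    def _check_owner(self, request, owner):
        if self._owners[request.dataset] != owner:
            raise PermissionError("only the data owner can decide on this request")

    def decide(self, request, owner, approve):
        self._check_owner(request, owner)
        if request.status is not Status.PENDING:
            raise ValueError("request has already been decided")
        request.status = Status.APPROVED if approve else Status.REJECTED

    def revoke(self, request, owner):
        self._check_owner(request, owner)
        if request.status is not Status.APPROVED:
            raise ValueError("only approved requests can be revoked")
        request.status = Status.REVOKED

    def can_read(self, consumer, dataset):
        return any(
            r.consumer == consumer and r.dataset == dataset and r.status is Status.APPROVED
            for r in self.requests
        )
```

A consumer can read a data set only while one of their requests for it is in the approved state, so revoking a request removes access immediately.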

Protegrity Cloud Protect

With the Data Protection Platform, customers can replace information fields with artificial identifiers, or pseudonyms. To pseudonymize data, the platform can either encrypt the data, using mathematical algorithms and cryptographic keys to change it into binary ciphertext, or apply Protegrity’s vaultless tokenization method (PVT), which converts cleartext data into a random string of characters.
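To give a feel for what "vaultless" means, the sketch below turns a digit string into another digit string of the same length using only a secret key, with no lookup table to store. It is not Protegrity's PVT, whose internals are proprietary. It is a toy keyed Feistel-style transform chosen as a stand-in, the function names are hypothetical, and it has not been vetted for production security.

```python
import hashlib
import hmac

ROUNDS = 8  # even, so both halves end up back in their original positions


def _round_value(key: bytes, round_index: int, half: int, width: int) -> int:
    """Keyed pseudo-random integer derived from one half of the value."""
    message = f"{round_index}:{width}:{half}".encode()
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")


def _split(digits: str):
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two decimal digits")
    u = len(digits) // 2
    return u, len(digits) - u


def tokenize_digits(key: bytes, digits: str) -> str:
    """Map a digit string to a same-length digit string, reversible with the key."""
    u, v = _split(digits)
    a, b = int(digits[:u]), int(digits[u:])
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        c = (a + _round_value(key, i, b, m)) % (10 ** m)
        a, b = b, c
    return f"{a:0{u}d}{b:0{v}d}"


def detokenize_digits(key: bytes, token: str) -> str:
    """Invert tokenize_digits by running the rounds backwards."""
    u, v = _split(token)
    a, b = int(token[:u]), int(token[u:])
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        previous_b = a
        previous_a = (b - _round_value(key, i, previous_b, m)) % (10 ** m)
        a, b = previous_a, previous_b
    return f"{a:0{u}d}{b:0{v}d}"


key = b"demo-key-not-for-production"
token = tokenize_digits(key, "123456789")
original = detokenize_digits(key, token)
```

Because the mapping is computed from the key rather than looked up, there is no token vault to replicate or protect. The trade-off is that anyone holding the key can reverse every token.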

The Enterprise Security Administration (ESA) creates Data Sensitive Policies that are used in conjunction with the Policy Agent to encrypt data using Protect Functions, which travel through the API Gateway to reach Snowflake for Producers, as indicated in the diagram below.

https://www.protegrity.com/

Early stage evaluation thoughts …

I am still evaluating the above solutions, so these are early-stage thoughts. The vendors' documentation exposes some similarities and differences that need further explanation.

The similarity is that both Qlik and Informatica Axon Data Marketplace have a cataloging UI that lets Producers identify sensitive data in group policies, protect the data they publish to a user group through a delivery method embedded in the vendor package, and then approve Consumer requests. Consumers, on the other hand, search for and request published data by a delivery method but have to wait for approval.

The difference is that ALTR works at the database level rather than through a UI: it uses external API functions to tokenize the data before it lands in the database and to de-tokenize it at query time, without relying on a delivery system. Thus, from a security point of view, integrating the technology directly at the data source appears superior to integrating it at the delivery system where user interaction takes place, but that remains to be tested.

https://twitter.com/richiebachala

Richie Bachala

Distributed SQL, Data Engineering Leader @ Yugabyte | past @ Sherwin-Williams, Hitachi, Oracle