Anyone who has used Snowflake as a data warehouse understands how groundbreaking and different it truly is. First, its architecture not only separates compute from storage, it also puts control of that compute in the hands of the consumer. While BigQuery also claims an architecture where the compute and storage tiers are disaggregated, it’s hard to tell from the outside, and you can’t really control what’s running where.
Having the ability to create a warehouse whenever you want, for whatever and whoever you want, means that you get to choose whether your whole organization uses the same warehouse (and competes for resources) or each power user has their own warehouse. You can change your mind at any time! It’s also priced on demand, so you can choose what’s best for your business in the moment and not have to pre-provision for worst-case scenarios.
Amazingly, you can have multiple warehouses running at the same time accessing the same data. For example, you can have an ETL task running on one warehouse updating the data that another warehouse is using to respond to queries. This major magic trick is described in their docs as follows: “Snowflake ensures that transactions executed on different virtual warehouses are globally ACID, even when the same data is accessed concurrently.”
Because of this unique separation of storage and compute, Snowflake was able to deliver a truly special feature called Snowflake Secure Data Sharing. In this blog post, we’ll cover why this matters and how you can make the most of it.
Data for Everyone
People have lots of reasons to share data. Perhaps you want to share with an external customer or a partner, or perhaps monetizing data is your whole business. In any case, there are some key points around sharing that matter:
1) Live Data vs. Snapshot Data – Yes, you could send Excel spreadsheets or CSV files around, but those are typically stale the second you send them. And with the volumes of data people need to share, the old-fashioned file sharing approach no longer scales. Snowflake Secure Data Sharing allows you to share live data with your consumers (i.e. partners and customers). The same magic trick that keeps two warehouses in your account coherent as data changes also lets a customer run their own warehouse against your shared data, which stays up to date and coherent as the data changes.
2) Security Model – You could give someone access to your Snowflake account by creating a username and password for them. But then you’d have to be sure to give them just the right access, so that they can run all their queries but not modify any data. Snowflake data sharing is always read-only. And Snowflake prevents your data from being shared with anyone else and disallows consumers from making additional copies.
3) Bring Your Own Warehouse – So this is where Snowflake separates itself (by miles) from everyone else. Snowflake data sharing means sharing your data with another Snowflake account (your consumers). Then your consumers create their own warehouses and assume the costs of all the queries on that data. As running complex queries could be expensive, this means that sharing data is free for the providers! And the consumer can do as little or as much as they want with it at their own expense.
Clearly, Snowflake has solved not only a difficult technological challenge with its data sharing feature, but an important economic one too. This encourages more data sharing and opens the door to more opportunities to productize and collectively benefit from data. While Snowflake charges for storage in general, with the data sharing feature the consumer is using the provider’s storage. Consumers pay for their own compute but incur no storage costs for the shared data. As a provider, you can share the data as many times as you want while maintaining a single stored copy.
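To make the consumer side of this concrete, here is a minimal sketch of the statements a consumer might run to mount a share and query it on their own warehouse. It is written as a small Python helper that only builds the SQL strings; it does not connect to Snowflake. Every name in it (the provider account provider_acct, the share acme_share, the database partner_orders, the warehouse share_wh, the role analyst) is a hypothetical placeholder, and the warehouse settings are just one reasonable choice.

```python
def consumer_statements(provider_account, share, local_db, warehouse, role):
    """Build the SQL a consumer might run to use a share (illustrative only)."""
    return [
        # Mount the provider's share as a read-only database in the consumer account.
        f"CREATE DATABASE {local_db} FROM SHARE {provider_account}.{share};",
        # Let a (hypothetical) role in the consumer account query the shared objects.
        f"GRANT IMPORTED PRIVILEGES ON DATABASE {local_db} TO ROLE {role};",
        # The consumer brings their own compute; size and auto-suspend are assumptions.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;",
        f"USE WAREHOUSE {warehouse};",
    ]


statements = consumer_statements(
    provider_account="provider_acct",  # hypothetical account identifier
    share="acme_share",                # hypothetical share name
    local_db="partner_orders",
    warehouse="share_wh",
    role="analyst",
)

for statement in statements:
    print(statement)
```

Queries against partner_orders then run on share_wh, so the compute shows up on the consumer's bill while the data stays in the provider's storage.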
Working with Snowflake Secure Data Shares
The Snowflake docs around data sharing are excellent, and I won’t repeat them. But here is a link for you: https://docs.snowflake.net/manuals/user-guide-data-share.html
We did want to share some of our observations about creating and using shares. The feature seems very well thought out. You can share either tables or something called “secure views”. Secure views are very similar to conventional views (i.e. just some SQL that returns a virtual table), but marking them as secure removes the need to also share the underlying table.
Here is an example of how to use this cool feature. Let’s say you have a single fact table that contains all of your customer orders. Each customer order is associated with a specific partner, and you want to share each partner’s orders with only the appropriate partner. It’s easy to create a secure view in Snowflake that filters based on the partner name (a simple WHERE clause). When you share the secure view with that partner using the Snowflake data sharing feature, only their own rows show up. You didn’t need to share the original fact table and you didn’t need to make a copy either. You are sharing a live subset of data from a single underlying table, securely. Really cool stuff.
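The partner example above could look roughly like the following on the provider side. This is a sketch under stated assumptions, not a definitive recipe: the Python helper only assembles SQL strings, and the database orders_db, schema public, table orders, column partner_name, partner acme, and consumer account xy12345 are all hypothetical names chosen for illustration. Check the statements against the Snowflake docs linked above before running them.

```python
def provider_share_statements(database, schema, table, partner, consumer_account):
    """Build the SQL a provider might run to share one partner's rows (illustrative)."""
    # Hypothetical naming convention: one view and one share per partner.
    view = f"{database}.{schema}.{partner}_orders"
    share = f"{partner}_share"
    return [
        # The secure view filters the fact table down to a single partner.
        # partner_name is an assumed column; never interpolate untrusted input like this.
        f"CREATE SECURE VIEW {view} AS "
        f"SELECT * FROM {database}.{schema}.{table} WHERE partner_name = '{partner}';",
        f"CREATE SHARE {share};",
        # The share needs usage on the containing database and schema...
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        # ...and select on the secure view only, not on the fact table itself.
        f"GRANT SELECT ON VIEW {view} TO SHARE {share};",
        # Finally, make the share visible to the partner's Snowflake account.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};",
    ]


statements = provider_share_statements(
    database="orders_db",
    schema="public",
    table="orders",
    partner="acme",
    consumer_account="xy12345",  # hypothetical consumer account identifier
)

for statement in statements:
    print(statement)
```

Calling the helper once per partner yields one view and one share each, all reading from the same fact table, so no data is copied.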
If you need help getting started with data sharing or the Snowflake Secure Data Sharing feature, reach out to us at hello@hashpath.com.