Why You Need To Know Snowflake As A Data Scientist
Updated: Oct 12, 2019
The only data warehouse built for the cloud
Maybe this is the first time you’ve heard of this company: Snowflake.
Maybe you’ve heard the name somewhere but still aren’t sure what exactly Snowflake does as a data warehouse and what makes it different from other platforms.
Well… Guess what?
I only started learning about Snowflake recently, after Gartner released its 2019 Magic Quadrant (MQ) report for Data Management Solutions for Analytics and named Snowflake one of the Leaders.
That caught my attention, so I set out to learn more about it.
If you work in data science and writing SQL queries to get data from a data warehouse (or database) is part of your day-to-day work, then this article is for you; it’s written from the perspective of a data scientist.
By the end of this article, you’ll know more about Snowflake, some of its key features, and how it’s slowly changing the game as a data warehouse.
Let’s get started!
Snowflake is a full SQL data warehouse built from the ground up for the cloud.
In fact, its architecture is what differentiates it from other platforms.
It delivers flexibility and efficiency that simply aren’t possible with a traditional data warehouse or a big data platform that has been shifted to the cloud.
If you want to know more about Snowflake’s architecture and how it combines the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions, I strongly encourage you to check out the video below.
Key Features of Snowflake
Snowflake’s patented architecture is separated into three layers: storage, compute and services.
This is very different from traditional data warehousing, which suffers from rigid data modelling and inflexibility.
1. Storage Layer
This is where all data is stored centrally.
Snowflake manages all aspects of how this data is stored: organization, file size, structure, compression, metadata, statistics, and other aspects of data storage.
The data objects stored by Snowflake are neither directly visible nor directly accessible to customers. They can only be reached through SQL queries run in Snowflake.
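Since SQL is the only way in, a data scientist would typically reach this data through a database connection from Python. Here is a minimal sketch, assuming a DB-API 2.0 style connection. The Snowflake Python connector (snowflake.connector) follows that interface, but the example uses the standard library’s sqlite3 as a stand-in so it runs without an account. The helper fetch_rows and the sales table are hypothetical names made up for illustration.

```python
import sqlite3
from typing import Any, List, Tuple


def fetch_rows(conn: Any, sql: str) -> List[Tuple]:
    """Run one SQL query over a DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


def demo() -> List[Tuple]:
    # Stand-in connection (assumption for this sketch). With Snowflake you
    # would instead obtain `conn` from
    # snowflake.connector.connect(user=..., password=..., account=...).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("APAC", 120.0), ("EMEA", 80.0), ("APAC", 30.0)],
    )
    rows = fetch_rows(
        conn,
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
    )
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that the query text is the whole interface: the caller never sees files, partitions or compression settings, just rows coming back.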
What’s even more amazing is that the storage layer is engineered to scale completely independently of compute resources.
This means Snowflake can load or unload data without impacting running queries and other workloads.
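One way to picture this in practice is to give loading and querying their own virtual warehouses (the compute engines described in the next section), so that a heavy load job never competes with analysts’ queries. The sketch below only builds the SQL text and does not connect to anything. The warehouse, table and stage names (LOAD_WH, ANALYTICS_WH, SALES, SALES_STAGE) and the helper functions are assumptions for illustration, and the sizes and timeouts are arbitrary example values.

```python
from typing import List


def create_warehouse_sql(name: str, size: str = "XSMALL", auto_suspend_s: int = 60) -> str:
    """Build a CREATE WAREHOUSE statement for an auto-suspending warehouse."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_s} "
        f"AUTO_RESUME = TRUE"
    )


def load_statements(warehouse: str, table: str, stage: str) -> List[str]:
    """Statements that run a bulk load on a dedicated warehouse."""
    return [
        f"USE WAREHOUSE {warehouse}",
        f"COPY INTO {table} FROM @{stage}",
    ]


def query_statements(warehouse: str, sql: str) -> List[str]:
    """Statements that run an analyst query on a separate warehouse."""
    return [f"USE WAREHOUSE {warehouse}", sql]


if __name__ == "__main__":
    # Hypothetical names; both workloads touch the same SALES table in storage
    # but use different compute.
    for stmt in (
        [create_warehouse_sql("LOAD_WH"), create_warehouse_sql("ANALYTICS_WH", size="MEDIUM")]
        + load_statements("LOAD_WH", "SALES", "SALES_STAGE")
        + query_statements("ANALYTICS_WH", "SELECT COUNT(*) FROM SALES")
    ):
        print(stmt + ";")
```

Each session would run its own list of statements, so the load and the query end up on different compute while reading and writing the same stored table.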
2. Compute Layer
The compute layer is designed to process enormous quantities of data with maximum speed and efficiency.
All of the data processing horsepower within Snowflake comes from virtual warehouses. These are compute engines, each made up of one or more clusters of compute resources.
When running a query, a virtual warehouse retrieves only the minimum data required from the storage layer.
As data is retrieved, it’s cached locally alongside the compute resources, and query results are cached as well, to improve the performance of future queries.
Even better, multiple virtual warehouses can simultaneously operate on the same data while fully enforcing global system-wide transactional integrity with full ACID compliance!
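To make the idea concrete, here is a toy model in plain Python. It is not how Snowflake is implemented: SharedStorage and VirtualWarehouse are hypothetical classes invented for this sketch, and it leaves out transactions, cache invalidation and result caching entirely. It only illustrates three points from the text: a warehouse reads just the columns a query needs, it keeps what it read in a local cache, and several warehouses can work against the same stored data.

```python
from typing import Dict, List, Tuple


class SharedStorage:
    """Toy stand-in for the storage layer: tables stored column by column."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, List]] = {}
        self.reads = 0  # number of column reads served by storage

    def load(self, table: str, columns: Dict[str, List]) -> None:
        self.tables[table] = columns

    def read_column(self, table: str, column: str) -> List:
        self.reads += 1
        return self.tables[table][column]


class VirtualWarehouse:
    """Toy compute engine with its own local cache of columns it has read."""

    def __init__(self, name: str, storage: SharedStorage) -> None:
        self.name = name
        self.storage = storage
        self.local_cache: Dict[Tuple[str, str], List] = {}

    def select(self, table: str, columns: List[str]) -> Dict[str, List]:
        result = {}
        for col in columns:
            key = (table, col)
            if key not in self.local_cache:
                # Fetch only the requested column, then keep it locally.
                self.local_cache[key] = self.storage.read_column(table, col)
            result[col] = self.local_cache[key]
        return result


def demo() -> Tuple[int, int, int]:
    storage = SharedStorage()
    storage.load("sales", {
        "region": ["APAC", "EMEA"],
        "amount": [120.0, 80.0],
        "notes": ["first order", "renewal"],
    })
    analytics = VirtualWarehouse("ANALYTICS_WH", storage)
    reporting = VirtualWarehouse("REPORTING_WH", storage)

    analytics.select("sales", ["amount"])            # one column read from storage
    after_first = storage.reads
    analytics.select("sales", ["amount"])            # served from the local cache
    after_repeat = storage.reads
    reporting.select("sales", ["amount", "region"])  # separate warehouse, separate cache
    after_other = storage.reads
    return after_first, after_repeat, after_other


if __name__ == "__main__":
    print(demo())
```

In this toy run the "notes" column is never read at all, and the repeated query on the first warehouse does not touch storage again, while the second warehouse fills its own cache from the same shared data.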
3. Services Layer
If the compute layer we just talked about is the brawn of Snowflake, then the services layer is the brain that controls it.
The services layer for Snowflake authenticates user sessions, provides management, enforces security functions, performs query compilation and optimization, and coordinates all transactions.
The services layer is constructed of stateless compute resources, running across multiple availability zones and utilizing a highly available, distributed metadata store for global state management.
Essentially, you don’t need to worry about your queries being starved of compute resources, because the services layer coordinates transactions across all virtual warehouses.
Thank you for reading.
What I’ve just described is only a brief overview of Snowflake’s architecture, which was built for the cloud.
By now I hope you’ve learned more about Snowflake and some of its key features.
If you’re interested in learning about its other key features (which I strongly recommend!), check out their website here!
If you have any questions or comments, feel free to leave your feedback below. Till then, see you in the next post!
About the Author
Admond Lee is the CEO & Co-Founder of Heralytics. He is known as a highly sought-after data scientist and consultant who helps companies and digital marketing agencies tackle their problems using data, with strong expertise in data science consulting and industry knowledge.
His insights on data science have been featured by various publications, including KDnuggets, Medium, Tech in Asia, AI Time Journal and business magazines.