Nowadays everyone is hearing the term Snowflake. Is it some kind of snow? Not at all. So what is Snowflake? Before we go into the details of Snowflake, let me explain what a cloud data warehouse is.
Nowadays most companies look for cloud solutions rather than setting up their own on-premises infrastructure. For example, if you want to store your data the traditional way, you go for a traditional database. You first need to set up commodity hardware, then install your software (Oracle, MySQL, Java, etc.), and only then start loading the data. Setting up this kind of infrastructure takes time: you need to get approvals, provision a Linux box, install the applications, and so on.
The cloud, by contrast, is ready-made: the provider hosts all the servers and compute instances on your behalf. Once you subscribe to a Snowflake account, you can start loading and unloading data from day one. You do not store any data on your own commodity hardware; Snowflake takes care of that and stores the data in the cloud.
To sum up, Snowflake is a cloud data warehouse, unlike traditional databases and data warehouses, which run on on-premises systems. Snowflake runs on cloud infrastructure, and that infrastructure comes from any of the three cloud providers: AWS (Amazon Web Services), Microsoft Azure, and Google Cloud. Snowflake is in high demand because of features such as time travel, fail-safe, data cloning, and data sharing. An important aspect of Snowflake is that we pay for what we use. Snowflake charges the customer separately for compute and for storage, which means the storage cost is decoupled from the compute cost. Many companies are migrating their workloads to Snowflake because of these features.
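To make the pay-for-what-you-use idea concrete, here is a minimal sketch of how a monthly bill splits into a storage part and a compute part. The function name `monthly_bill` and all rates below are hypothetical and chosen only for easy arithmetic; real prices depend on the edition, region, and cloud provider. The sketch assumes that compute is measured in credits and storage in terabytes per month.

```python
def monthly_bill(storage_tb: float, credits_used: float,
                 storage_rate_per_tb: float, price_per_credit: float) -> dict:
    """Split a monthly bill into its storage and compute parts."""
    # Storage is charged for the data kept, regardless of how many queries run.
    storage_cost = storage_tb * storage_rate_per_tb
    # Compute is charged for the credits the warehouses consumed.
    compute_cost = credits_used * price_per_credit
    return {
        "storage": storage_cost,
        "compute": compute_cost,
        "total": storage_cost + compute_cost,
    }


# Hypothetical rates, for illustration only (not actual Snowflake prices).
bill = monthly_bill(storage_tb=2.0, credits_used=50.0,
                    storage_rate_per_tb=25.0, price_per_credit=3.0)
print(bill)  # {'storage': 50.0, 'compute': 150.0, 'total': 200.0}
```

Because the two parts are independent, keeping a lot of data that is rarely queried mostly costs storage, while running heavy queries over a small dataset mostly costs compute.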
To understand Snowflake in detail, we should have a clear idea of its architecture. Features such as time travel, fail-safe, drop, and zero-copy cloning depend on this architecture. If we understand the architecture, we understand how processing happens in the backend, and if we know that, we will be more careful when writing queries. We should know that queries run on virtual warehouses, and we are billed for the time those warehouses are running. So understanding the Snowflake architecture helps us save a lot of cost and time.
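As a rough illustration of why careful queries save money, the sketch below estimates the credits consumed by one warehouse run. It assumes the commonly documented model for standard warehouses: credits per hour double with each size step, billing is per second, and each start or resume is billed for at least 60 seconds. The names `CREDITS_PER_HOUR` and `credits_for_run` are made up for this example, and the figures should be checked against the current Snowflake pricing documentation.

```python
# Assumed credits per hour for standard warehouse sizes (verify against current docs).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Assumed minimum billed time each time a warehouse starts or resumes.
MIN_BILLED_SECONDS = 60


def credits_for_run(size: str, running_seconds: float) -> float:
    """Estimate credits for one continuous run of a warehouse of the given size."""
    billed_seconds = max(running_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# A 10-second query on a freshly resumed warehouse is billed like a 60-second one.
print(credits_for_run("XSMALL", 10) == credits_for_run("XSMALL", 60))  # True

# Half an hour on a LARGE warehouse.
print(credits_for_run("LARGE", 1800))  # 4.0
```

Under these assumptions, the same 30 minutes of work costs eight times as much on a LARGE warehouse as on an XSMALL one, so choosing a size that fits the workload matters as much as writing efficient queries.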
Snowflake is a data warehouse that runs entirely on cloud infrastructure; it cannot be run on a private cloud or on hosted infrastructure. It is available on AWS (Amazon Web Services), Microsoft Azure, and GCP (Google Cloud Platform). We should understand that Snowflake does not behave like a traditional relational database: primary key and foreign key constraints can be declared, but on standard tables Snowflake does not enforce them (NOT NULL is the exception). It does offer Snowflake SQL, including DDL/DML statements and SQL functions. Snowflake also allows us to create user-defined functions and stored procedures using JavaScript.
Let us discuss the architecture of Snowflake next. It consists of three layers: the Data Storage Layer, the Virtual Warehouse Layer, and the Cloud Services Layer.
Data Storage Layer: This is the bottom layer of the Snowflake architecture. It stores all table data and query results. If our Snowflake account is hosted on AWS (Amazon Web Services), the data is stored in Amazon S3. If it is hosted on Microsoft Azure, the data is stored in Azure Blob Storage. If it is hosted on Google Cloud, the data is stored in Google Cloud Storage.
Virtual Warehouse Layer: This is the middle layer of the Snowflake architecture. It handles query execution within elastic clusters of virtual machines. This layer is also called the muscle of the system.
Cloud Services Layer: This is the top layer of the Snowflake architecture. It is a collection of services that coordinate the whole system, which is why it is also known as the brain of the system. It consists of authentication and access control, the infrastructure manager, the optimizer, the transaction manager, security, and metadata storage.
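The division of labor between the three layers can be sketched with a toy model. This is not how Snowflake is implemented; the classes `StorageLayer`, `VirtualWarehouse`, and `CloudServices` and their methods are invented for illustration. The point is only to show the flow: the top layer checks access and coordinates, the middle layer does the work, and the bottom layer just holds the data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, int]


@dataclass
class StorageLayer:
    """Bottom layer: holds table data (stands in for cloud object storage)."""
    tables: Dict[str, List[Row]] = field(default_factory=dict)


@dataclass
class VirtualWarehouse:
    """Middle layer: the compute ("muscle") that actually scans and filters rows."""
    name: str

    def scan(self, storage: StorageLayer, table: str,
             predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in storage.tables[table] if predicate(row)]


@dataclass
class CloudServices:
    """Top layer: the "brain" that checks access and hands work to a warehouse."""
    storage: StorageLayer
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # user -> readable tables

    def run_query(self, user: str, warehouse: VirtualWarehouse, table: str,
                  predicate: Callable[[Row], bool]) -> List[Row]:
        # Access control happens before any compute is used.
        if table not in self.grants.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        return warehouse.scan(self.storage, table, predicate)


storage = StorageLayer(tables={"orders": [{"id": 1, "amount": 40},
                                          {"id": 2, "amount": 250}]})
services = CloudServices(storage=storage, grants={"alice": {"orders"}})
wh = VirtualWarehouse(name="demo_wh")

big_orders = services.run_query("alice", wh, "orders", lambda r: r["amount"] > 100)
print(big_orders)  # [{'id': 2, 'amount': 250}]
```

Note that the warehouse holds no data of its own: several warehouses could be created against the same `StorageLayer`, which mirrors the idea that compute and storage are separate.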