Of all the pieces of the modern data stack, none has risen to prominence as fast as the cloud data platform Snowflake. In this blog, we'll look at how data observability helps data teams optimize their Snowflake environments.
While part of the wave of multi-tenant cloud data platforms that also includes Databricks, Amazon Redshift, and Google BigQuery, Snowflake has stood out with a formula that wins over developers: low startup costs, minimal operations overhead, and instant, near-infinite scalability. In just a decade, Snowflake has won nearly 6,500 enterprise customers and $1.2 billion in annual revenue.
Many customers remain unabashedly enthusiastic about Snowflake, yet most have also come to realize, after day-to-day experience with the platform, several things:
- “Low ops” does not mean “No ops.” Fact: Snowflake delivers high availability and instant, automatic scale-up and scale-down with very little administration required. Also fact: like wearing sunscreen and flossing your teeth, monitoring your database — no matter how low-ops — is one of those good habits that prevents problems today and tomorrow. In the case of Snowflake, that can include degraded data reliability, performance bottlenecks, and cost overruns. Speaking of…
- Snowflake can be Too Much of a Good Thing. While pay-as-you-go cloud platforms like Snowflake are exceedingly inexpensive to start on, costs can quickly spiral if the proper controls are not set up. Snowflake users are especially vulnerable to bill shock, lulled into complacency by Snowflake’s “it just works” vibes and its instant, automatic scalability. As I wrote about in another blog post, that is how a company ends up accidentally letting a $7, hour-long code test run wildly out of control and getting stuck with a $72,000 charge as a result. Or take this analytics executive with a leading e-commerce company who refers to “horror stories” around “Snowflake cost management.”
Why Data Dashboards and Reports Don’t Suffice
Chastened users have tried a number of ways to gain visibility and control over their Snowflake environments. None are fully satisfactory, however.
Most users started off trying to monitor and manage Snowflake using the Classic Web Interface. This required users to write SQL queries by hand, with results output to non-graphical, Excel-like worksheets. The disadvantages were obvious: no graphics, and lengthy manual effort to create every query.
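For example, answering even a basic question such as “which warehouses burned the most credits last week?” meant hand-writing something like the following against Snowflake’s ACCOUNT_USAGE views (an illustrative query, not a prescribed one):

```sql
-- Credits consumed per warehouse over the last 7 days.
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_7_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_7_days DESC;
```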
At that point, many Snowflake users moved on to other options. Python fans turned to the SnowSQL client, and some tried general-purpose application performance monitoring (APM) tools. Still others tried to monitor Snowflake through third-party, batch-based dashboards and reports from Tableau, Sigma, Looker, Qlik, and the like.
All of these have their own shortcomings. Like Snowflake’s classic interface, SnowSQL is command line-based and has few visualization capabilities. APM tools have an application-centric view of infrastructure that limits their insight into data pipelines and data quality. And many of the better-known third-party dashboard templates have not been updated in several years, meaning they cannot display the full breadth of quality, performance and cost metadata that Snowflake exposes.
Adding to the existing mix of reporting tools, Snowflake announced in 2020 a replacement for its default management interface called Snowsight. Released for general availability in mid-2021, Snowsight brought visual dashboards comparable to Tableau or Looker, along with several other features. However, most data exploration still requires writing SQL queries by hand.
Moreover, all of these tools (Snowflake-native, APM, and dashboard and reporting alike) suffer from a common issue: they are ill-suited for day-to-day operational management and monitoring, being too slow, too coarse-grained, or both.
This level of detail and speed may suffice for the folks in the C-suite, but it falls well short for data engineers who need continuous, real-time visibility and control over their data. Such tools don’t equip them to prevent outages, cost overruns, and data errors, or to be notified about such problems in real time.
And now that data drives so many mission-critical processes, businesses simply cannot afford long outages or uncontained data quality problems.
A Real-World Data Visibility Problem
Here's an example. As part of its low-ops credo, Snowflake handles data partitioning for users automatically and dispenses with traditional indexes altogether. It does this by dividing large tables into micro-partitions and computing statistics about the value ranges contained in each column of data. These statistics let Snowflake prune the micro-partitions a query doesn't need, thereby increasing query speeds.
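You can inspect those micro-partition statistics yourself. For instance, Snowflake's SYSTEM$CLUSTERING_INFORMATION function reports how well a table's micro-partitions are organized around a given set of columns (the table and column below are hypothetical):

```sql
-- Inspect micro-partition clustering for a hypothetical "orders" table,
-- keyed on order_date. The JSON result includes total_partition_count,
-- average_overlaps, and average_depth; poorly clustered tables prune
-- fewer partitions and scan more data per query.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
```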
The issue? Data migrated from a traditional partition-and-index database into Snowflake must be transformed as it is loaded. This can create data and schema errors. Even a minor issue, such as Snowflake SQL's case-sensitive handling of quoted identifiers and string comparisons, can result in broken applications and data pipelines. Sometimes this is immediately obvious, but often it only surfaces down the road.
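A minimal sketch of how the case-sensitivity issue bites, using hypothetical objects:

```sql
-- Unquoted identifiers are folded to upper case; quoted identifiers keep
-- their exact case. A table migrated with quoted mixed-case names breaks
-- queries that reference it unquoted.
CREATE TABLE "orderItems" (id INT, status STRING);
SELECT * FROM orderitems;      -- fails: object 'ORDERITEMS' does not exist
SELECT * FROM "orderItems";    -- works

-- String comparisons are case-sensitive by default, so filters that matched
-- in a case-insensitive source database can silently miss rows.
SELECT COUNT(*) FROM "orderItems" WHERE status = 'shipped';   -- misses 'Shipped'
```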
Traditional data governance and reporting tools tend to only spot-check inconsistencies and test for data quality at single points in time, such as when data is ingested. Without continuous data quality validation and testing, they won’t notice when data errors crop up later.
Snowflake does offer a real-time alert tool called Resource Monitors. These are triggers that Snowflake administrators can manually set around the number of credits consumed in a particular period of time. When the limits are reached, Snowflake can notify administrators and/or suspend operations in a data warehouse. Unfortunately, Resource Monitors can only be triggered by cost overruns, not performance issues or data reliability problems. Their inflexibility as a point solution further limits their usefulness.
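For reference, a Resource Monitor is created with plain SQL, roughly like this (the quota, monitor, and warehouse names are illustrative):

```sql
-- Notify at 75% of the monthly credit quota, stop accepting new queries
-- at 100%, and cancel running queries at 110%.
CREATE RESOURCE MONITOR analytics_monitor
  WITH CREDIT_QUOTA = 500
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75  PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;

-- Attach the monitor to a warehouse.
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_monitor;
```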
A Better Way: Unified, Continuous Data Observability
Cobbling together cost-only Resource Monitors with other Snowflake or third-party dashboard tools does not provide the unified visibility and control needed for effective, centralized, real-time management of data quality, performance, and cost.
An increasing number of Snowflake users are looking for a better solution: a data observability platform that automates the continuous data validation and testing needed to create organization-wide trust in data, as well as hiccup-free throughput and optimized price-performance.
A platform like Acceldata does this by ingesting the metadata that Snowflake exposes, gathering its own performance, cost, and reliability data, and crunching them together to generate operational intelligence about your Snowflake environment.
This enables Acceldata not just to create visually rich dashboards for human administrators, but also to proactively detect anomalies, generate predictive models, and automatically create alerts and data management recommendations.
For instance, Acceldata’s spend intelligence dashboard aggregates usage across all Snowflake services to assign a real dollar value (rather than just credits used). Admins can view high-level cost trends for different types of service, such as Compute, Storage, and Clustering, and different time periods. They can also drill down into particular days or databases or data tables where high costs were incurred.
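To give a feel for the raw material behind such a dashboard (this is not Acceldata's implementation, and the per-credit rate below is an assumed example that depends on your Snowflake edition and contract), the underlying Snowflake metadata looks roughly like this:

```sql
-- Daily credits and approximate spend by service type for the last 30 days.
-- The $3.00/credit rate is an assumption for illustration only.
SELECT usage_date,
       service_type,                          -- e.g. compute, auto-clustering
       SUM(credits_used)        AS credits,
       SUM(credits_used) * 3.00 AS approx_cost_usd
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date, service_type
ORDER BY usage_date, approx_cost_usd DESC;
```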
But Acceldata goes further, much further. Our Compute Observability analyzes your average compute usage so it can automatically flag administrators when workloads, and costs, spike. In other words, rather than forcing you to manually create triggers and alerts as Snowflake's Resource Monitors do, Acceldata creates them for you. And as your usage patterns change, Acceldata updates these triggers for you.
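As a rough, hand-rolled analogue of that kind of spike detection (Acceldata derives and maintains these baselines automatically; the 2x threshold here is just an assumption for illustration):

```sql
-- Flag warehouses whose daily credit burn exceeds twice their trailing
-- 30-day average.
WITH daily AS (
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -31, CURRENT_DATE())
    GROUP BY 1, 2
),
with_baseline AS (
    SELECT warehouse_name, usage_day, credits,
           AVG(credits) OVER (PARTITION BY warehouse_name
                              ORDER BY usage_day
                              ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) AS trailing_avg
    FROM daily
)
SELECT warehouse_name, usage_day, credits, trailing_avg
FROM with_baseline
WHERE credits > 2 * trailing_avg
ORDER BY usage_day DESC;
```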
Moreover, Acceldata’s capacity planning feature helps you predict your cloud resource consumption. This lets you see what is being over-utilized and under-utilized, so that you can get the most out of your cloud contracts and avoid bill shocks.
In the area of performance monitoring, Acceldata alerts admins if any of these usage spikes are causing timeouts in Snowflake, while also identifying the unexpected accounts, warehouses, and workloads that may be to blame. This lets you investigate problems immediately and shorten your mean time to resolution. Such real-time performance monitoring is not available via Snowflake Resource Monitors.
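The raw signals behind that kind of diagnosis live in Snowflake's query history; a hand-rolled approximation of the same question might look like this (illustrative only):

```sql
-- Find yesterday's queries that spent significant time queued behind an
-- overloaded warehouse, and who or what submitted them.
SELECT warehouse_name,
       user_name,
       query_id,
       queued_overload_time / 1000 AS queued_seconds,
       total_elapsed_time   / 1000 AS elapsed_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
  AND queued_overload_time > 60000          -- queued for more than a minute
ORDER BY queued_overload_time DESC;
```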
Data reliability is another area where Acceldata offers more than either Snowflake Resource Monitors or visual-but-passive dashboards. Acceldata automatically discovers all Snowflake datasets and profiles each one: its structure, metadata, and relationships, including dependencies and lineage. Using those profiles, Acceldata then offers ML-powered recommendations to Snowflake administrators that streamline the creation of data quality policies and rules. For instance, Acceldata can recognize that the data in a particular column should be binary (“yes” or “no”) and free of null values. It will then recommend that rule be added to your data quality policy.
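A minimal sketch of what such a recommended rule amounts to as a check (the table and column names are hypothetical):

```sql
-- Count rows that violate the expectation that subscriptions.opted_in
-- is binary ('yes'/'no') and never null.
SELECT COUNT(*) AS violating_rows
FROM subscriptions
WHERE opted_in IS NULL
   OR LOWER(opted_in) NOT IN ('yes', 'no');
```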
Acceldata will also make recommendations around duplicates, uniqueness, pattern matching, range validation, schema checks, and more. What used to take hours of effort can now be finished in minutes with a few clicks.
What's more, Acceldata applies these data quality rules continuously, on the schedule you configure, not just when data is first ingested into Snowflake. This helps maintain data reliability as data is transformed and combined repeatedly over time. Acceldata also automatically cleans and validates the incoming real-time data streams from Apache Kafka and Spark that are so commonly connected to your Snowflake Data Cloud. Incomplete, incorrect, and inaccurate data is flagged in real time without manual intervention. This keeps data flowing and reduces data downtime to a minimum.
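To make the idea of continuous, scheduled validation concrete (this is not Acceldata's mechanism; the task, warehouse, and table names are hypothetical), a native Snowflake task could rerun the earlier binary-column check every hour and log violations:

```sql
-- Hypothetical: log violations of the "binary, not null" rule every hour.
CREATE TABLE IF NOT EXISTS dq_violations (
    checked_at TIMESTAMP_NTZ,
    rule_name  STRING,
    bad_rows   NUMBER
);

CREATE OR REPLACE TASK dq_check_opted_in
  WAREHOUSE = dq_wh
  SCHEDULE  = 'USING CRON 0 * * * * UTC'    -- top of every hour
AS
  INSERT INTO dq_violations
  SELECT CURRENT_TIMESTAMP(), 'subscriptions.opted_in not binary/null-free', COUNT(*)
  FROM subscriptions
  WHERE opted_in IS NULL OR LOWER(opted_in) NOT IN ('yes', 'no');

ALTER TASK dq_check_opted_in RESUME;
```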
Finally, Acceldata allows admins to define segments of data where data quality rules should be applied or applied more often. This allows you to de-emphasize or skip low-priority columns, tables, or entire data warehouses, saving on your Snowflake processing time and budget.
Read our whitepaper, Increase Your Snowflake ROI with Data Quality, Resource Efficiency, and Spend Forecasting, to learn more.