Accelerate Time-Series Analytics with RANGE-Based Window Frames, now Generally Available

Posted on August 8, 2024August 8, 2024 by cloudmatrix.website

Addressing Time-Series Analytics Challenges

At Snowflake, we are committed to helping customers derive meaningful insights from data with simplicity and speed. That’s why we’re thrilled to announce the general availability of our enhanced RANGE-based window frame, which will help accelerate time-series analytics.

For customers across industries, the ability to perform time-series analytics is crucial. This often involves calculating rolling statistics that remain robust despite gaps or nonuniform time-series data. Regardless of the reason for gaps in data, analytics must account for them to produce more accurate results.

Our solution

We recognize the significance and prevalence of these use cases. So we have invested in solving them by extending our support for RANGE BETWEEN sliding window frames to INTER VAL and unsigned numeric constants to define explicit offsets.

Deep dive into Snowflake’s extended RANGE-based frame support

This can now be easily accomplished using Snowflake’s support for INTERVAL in the RANGE-based window frame, as demonstrated in the pseudo-SQL below. With this functionality, customers can avoid the cumbersome preprocessing of their data to eliminate gaps. Instead, they can focus on the crucial task of extracting valuable insights from their data.

How does our solution work?

Without the native support for RANGE-based window frames, customers typically use workarounds, such as performing an aggregate function with a range join or preprocessing their data to be uniformly distributed to align with row-based window frames. These workarounds are often less efficient and can become tedious to maintain.

The new functionality we offer is easy to use and complies with the SQL standard. It applies partition-aware sorting; efficiently computes the variable-sized window and corresponding output for each row; and spills to local/remote storage, as needed, when certain windows are huge.

To illustrate performance gains from using RANGE-based window frames, we used sample data sets with uniformly distributed time series and 22 million rows, which were divided into 5K partitions. The RANGE-based window frame turned out to be 6x faster than the “aggregate + range join” workaround. On scaling the data size by 10 times to 220 million rows, the RANGE-based window is 9x faster than the workaround. In summary, RANGE-based window frames provide a simple and efficient way for computing rolling statistics in real-world time-series use cases that usually contain data gaps.

What are customers using RANGE-based window frames for?

Across the board, the RANGE-based window frames are being adopted rapidly by customers for varied types of use cases. Noted below are a few popular use cases across industries:

Manufacturing customers are using them for use cases, such as resource monitoring, equipment performance analytics and supply chain analytics.

Financial services customers are using them for asset performance tracking, time-based transaction analytics for fraud detection and loyalty program analytics.

Retail companies are using them for use cases such as demand planning, rolling inventory analytics, sales allocation and customer spend analytics.

Telecommunications companies are using them for, among others, call volume analytics and issue monitoring and detection.

Irrespective of the industry, we have seen that the functionality is being used for common operations, such as user analytics, customer spend analytics, and sales and campaign monitoring.

What’s next?

The feature is now available in all Snowflake accounts. You can start using this functionality for your relevant analytics and even migrate your existing workarounds to this performant, easy-to-use solution. To learn more details, such as the list of window functions supported, you can read the Snowflake documentation. You can also refer to the time series user guide to learn more about working with time-series data in Snowflake.