Continued Investments in Price Performance and Faster Top-K Queries

As we describe in this blog post, the top-k feature uses runtime information — namely, the current contents of the top-k elements — to skip micro-partitions where we can guarantee that they won’t contribute to the overall result. We saved quite a bit of input/output operations (I/Os) and improved the performance of queries quite significantly. With our recent optimizations added to top-k, we managed to further reduce the number of I/Os, improving the performance of queries quite significantly.

Architectural difference: While our first version still required one I/O per partition, to retrieve the partition’s metadata and then decide whether it can be skipped, now, top-k utilizes Snowflake’s advanced metadata layer to embed relevant metadata directly into the query itself, and therefore requires no additional I/Os when deciding whether to skip partitions. The benefit of this can be seen especially on large tables: Let’s say we have a table with 1 million partitions clustered by a timestamp column. We run a query with an ORDER BY timestamp LIMIT 10 on the table. Previously, this would require at least one million I/Os, plus the I/Os of the partitions we actually need to scan, to construct the result. Now, we only need a handful of I/Os for the partitions that we actually need to load. 

Algorithmic perspective: From an algorithmic perspective, we implemented a way to process partitions in a smart order, which further reduces the number of I/Os. Before Snowflake starts executing the query, we look at the metadata of the partitions to determine whether the contents of a given partition are likely to end up in the final result. Snowflake starts processing those partitions first. This implies that the initial top-k elements we see after starting the query are already some of the largest/smallest rows, which will also likely end up in the final result. Combining this with the skipping technique described in the June 2023 blog post, we are able to skip even more partitions compared to a random order of scanning partitions.