Coding-Challenge

The importance of avoiding unnecessary work

Reading the Right Data Dominates Query Runtime

Analytical query processing powers modern data-driven decision-making, turning massive datasets into timely, trustworthy insights. To keep up with ever-growing data volumes, many systems turn to cloud object storage, which decouples compute from storage and promises virtually unlimited scalability. Large datasets are subdivided into smaller “blocks”, each holding a subset of the tuples and stored as a separate object.
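To make the block layout concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the layout of any particular system: the `Row` shape, the `split_into_blocks` and `store_blocks` helpers, the `table/block-` key prefix, and the in-memory dict standing in for an object store are all hypothetical.

```python
from typing import Iterator

# Hypothetical tuple layout for illustration: (id, payload).
Row = tuple[int, str]


def split_into_blocks(rows: list[Row], block_size: int) -> Iterator[list[Row]]:
    """Yield consecutive slices of at most block_size tuples each."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def store_blocks(
    rows: list[Row], block_size: int, prefix: str = "table/block-"
) -> dict[str, list[Row]]:
    """Map one object key to each block.

    The returned dict is only a stand-in for an object store; a real
    system would upload each block as a separate object instead.
    """
    return {
        f"{prefix}{i:05d}": block
        for i, block in enumerate(split_into_blocks(rows, block_size))
    }


rows = [(i, f"v{i}") for i in range(10)]
objects = store_blocks(rows, block_size=4)
for key, block in objects.items():
    print(key, len(block))
```

With ten tuples and a block size of four, the sketch produces three objects, the last of which holds only the two remaining tuples. Because every block is a separate object, a query can in principle fetch some blocks without touching the others.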

When analyzing query performance, most would expect joins or aggregations to dominate runtime, especially since analytical queries are often very complex. Surprisingly, that is not the case. Instead, the most time-consuming operator is a seemingly simple one: scanning and filtering data, which accounts for roughly 50% of the total runtime.