Beginner

The importance of avoiding unnecessary work

Reading the Right Data Dominates Query Runtime

Analytical query processing powers modern data-driven decision-making, turning massive datasets into timely, trustworthy insights. To keep up with ever-growing data volumes, many systems turn to cloud object storage, which decouples compute from storage and promises infinite scalability. Large datasets can be subdivided into smaller “blocks”, each containing a subset of the tuples and stored as a separate object.

When analyzing query performance, most would expect joins or aggregations to dominate runtime, especially since analytical queries are often very complex. Surprisingly, that is not the case. The most time-consuming operator is the seemingly simple one that scans and filters data, which accounts for roughly 50% of the total runtime.

Why databases found their old love of disk again

“640K ought to be enough for anybody.” Bill Gates claims to have never actually said that. Either way, 640 KB is definitely not enough to hold most datasets in memory. That’s why old database systems assumed that basically every operation has to access the disk. Since HDDs were essentially the only available storage medium for databases at the time (other than tape), the cost of I/O accesses dominated database performance. Falling prices and growing capacities of RAM, along with the advent of flash storage, introduced new opportunities for database systems. Let’s explore the different storage options.

The Case for a Unified, JIT Compiling Approach to Data Processing

Over the past 12 years, just-in-time (JIT) compilation of SQL query plans (pioneered by Thomas Neumann at TUM) has gained popularity in the development of high-performance analytical database management systems. The main idea sounds simple: the system generates specialized code for each individual query and thereby avoids the interpretation overhead of traditional query engines.
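To make the contrast between interpreting and compiling a query concrete, here is a minimal sketch in Python. It is not how any particular system implements JIT compilation: the tuple-based expression format and the names `interpret`, `to_source`, and `compile_predicate` are hypothetical, and generating Python source stands in for the low-level code a real engine would emit.

```python
import operator

# Hypothetical expression format: ("and", child, ...) or (op, column, constant).
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def interpret(expr, row):
    """Interpreted evaluation: the expression tree is walked again for every row."""
    op = expr[0]
    if op == "and":
        return all(interpret(child, row) for child in expr[1:])
    _, column, constant = expr
    return OPS[op](row[column], constant)


def to_source(expr):
    """Translate the expression tree into Python source text."""
    op = expr[0]
    if op == "and":
        return "(" + " and ".join(to_source(child) for child in expr[1:]) + ")"
    _, column, constant = expr
    return f"(row[{column!r}] {op} {constant!r})"


def compile_predicate(expr):
    """Generate a function specialized to this one query, once, before the scan."""
    source = f"def predicate(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(source, namespace)  # the tree walk happens here, not per row
    return namespace["predicate"]


# Assumed example data and filter (not taken from any real workload).
query = ("and", (">", "price", 100), ("==", "region", "EU"))
rows = [
    {"price": 150, "region": "EU"},
    {"price": 50, "region": "EU"},
    {"price": 200, "region": "US"},
]

interpreted = [row for row in rows if interpret(query, row)]
predicate = compile_predicate(query)
compiled = [row for row in rows if predicate(row)]
```

Both paths select the same rows. The difference is where the work happens: the interpreter dispatches on the tree structure for every row, while the generated `predicate` contains only the comparisons this specific query needs.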

LingoDB, a new research project from Michael Jungmair at TUM, aims to drastically improve the flexibility and extensibility of this approach. It pursues this goal in two ways: