Michael Zinsmeister
In the first and second part of our series about modern database storage we have already learned about the history of of our series about modern database storage we have already learned about the history of database storage technology and about pointer swizzling as a first modern buffer management technique. I have also described some of the drawbacks of pointer swizzling, mainly that it leaks into your entire system because you cannot have a nice clean buffer manager API with loose coupling that just dereferences page ids to pointers.
In the first part of this series about modern database storage we have already seen how SSDs have been becoming the go-to base storage for modern database systems more and more in the late 2010s and especially now. As long as most of the data that most of your queries are touching fits into main memory, SSDs can easily handle the rest at a very small performance cost.
The approaches
There are multiple approaches to achieving in-memory speeds for in-memory cases and utilizing the SSD well in out-of-memory cases relying on fast NVMe SSDs. To understand what characteristics those approaches need, we must first understand the problem with traditional buffer management. After that, we will present some potential solutions to this problem. This will be split up between this post and another one coming soon.
“640K ought to be enough for anybody.” - Bill Gates claims to have never actually said that. 640KB is definitely not enough to hold most data sets in memory. That’s why the assumption made in old database systems is that you have to access the disk for basically every operation. Since HDDs were essentially the only available storage medium for databases at the time (other than tape), the cost of I/O accesses dominated database performance. The cost decrease and capacity increase of RAM and the advent of flash storage introduced new opportunities for database systems. Let’s explore the different storage options.