I've been working on a complex multi-model database for a few weeks now, and recently I took time to simplify and test out an idea I had on a simple key-value database. I started with the basics: a hash table in memory, a simple append-only log for persistence, and the classic fsync() call after every write to the log for durability.

It worked, but wasn't as fast as it could be.

In Kevo, that's the approach I use, but in Klay (not public yet, but will be open sourced when ready), I'm taking a different approach. What would a database look like if you treated the individual sectors on disk as unreliable, and how could you make it as fast as possible?

That's when I started reading about io_uring on Linux here (PDF) and here.

io_uring... what?

You can read Wikipedia as well as the next person, so let's skip ahead.

The promises seem too good to be true: truly async I/O for all types of operations, not just network sockets. No more thread pools to work around blocking disk I/O, no more complex state machines built around epoll... What's the catch?

Well, after doing some reading, the core insight behind io_uring clicked almost immediately. Traditional I/O interfaces force you to think synchronously: you make a system call, the kernel does work, you get a result. But modern storage hardware is inherently parallel. An NVMe SSD can handle thousands of operations simultaneously, spread across multiple hardware queues. The bottleneck isn't the hardware; it's the software abstraction.

io_uring exposes this parallelism through a pair of ring buffers shared between your application and the kernel. You submit operations to the submission queue (SQ) and collect results from the completion queue (CQ). Instead of one system call per operation, you can submit dozens of operations with a single io_uring_submit() call (liburing's wrapper around the io_uring_enter system call).

My first io_uring experiment was simple: replace my synchronous WAL writes with async ones.
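For reference, the synchronous baseline being replaced looks roughly like the sketch below: a hash table in memory, an append-only log, and an fsync after every write. This is a minimal illustration in Python, not code from Kevo or Klay. The class name TinyKV and the record layout (two little-endian 32-bit lengths followed by key and value) are assumptions made up for the example, and it leaves out checksums, compaction, and repair of a torn tail.

```python
import os
import struct
import tempfile


class TinyKV:
    """Toy key-value store: in-memory dict plus an append-only log.

    Hypothetical illustration only; the record layout is an assumption.
    """

    def __init__(self, path):
        self.index = {}
        self.path = path
        self._replay()
        # Append mode so every write lands at the end of the log.
        self.log = open(path, "ab")

    def _replay(self):
        # Rebuild the in-memory table from the log on startup.
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break  # end of log, or a torn header
                klen, vlen = struct.unpack("<II", header)
                body = f.read(klen + vlen)
                if len(body) < klen + vlen:
                    break  # torn record; a real store would truncate here
                self.index[body[:klen]] = body[klen:]

    def put(self, key, value):
        record = struct.pack("<II", len(key), len(value)) + key + value
        self.log.write(record)
        self.log.flush()             # hand Python's buffer to the OS
        os.fsync(self.log.fileno())  # block until the OS has flushed the file
        self.index[key] = value      # make it visible only after the fsync

    def get(self, key):
        return self.index.get(key)

    def close(self):
        self.log.close()
```

Every put() blocks on its own fsync, so throughput is bounded by one flush per write. That per-write wait is the cost the async version tries to remove.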
Instead of writing each log entry and waiting for completion, I would submit...