Hacker News new | ask | show | jobs
by cbsmith 3929 days ago
> scylla for instance does not leave the cache to the OS. It has its own caches for everything

Uh-huh... that's all pretty common for databases. Cassandra would fit that description.

> Never blocks on IO or page faults because all IO bypasses the kernel.

That just seems nonsensical. Sometimes, you are waiting for IO. That's just reality. It is conceivable you bypass the kernel for I/O, but that creates a lot of complexity and limitations. Near as I can tell though, they do talk to the kernel for IO.

2 comments

Cassandra has a row cache that it does not necessarily use. Most of its data sits in the linux page cache. Because the SSTables are mmaped into memory, you can't get rid of that: even if you do use the row cache, you would be at best using twice as much cache memory.

Scylla never touches the page cache. All IO in seastar is direct IO and then scylla caches everything itself. We always know when the disk access is going to happen. The OS paging mechanism does not do a thing.

As for waiting for IO, of course IO does not complete immediately. But you can either block and wait for it, as Cassandra does (it doesn't even have the option not to in the case of the mmaped regions) or you can do something fully async like seastar that guarantees you never block waiting for IO.

http://www.scylladb.com/technology/memory/

By the way, I think you're replying to one of the devs of Scylla.

So, in general, I understand there is lots of stuff going on in Scylla that does distinguish it, at least from Cassandra. There is the user space networking logic for IO. However, a lot of the IO overhead with disk, for example.
>However, a lot of the IO overhead with disk, for example.

That's why they benchmarked this workload on a 4x SSD RAID configuration :). Given that i/o bandwidth and throughput continues to increase, processor frequency isn't, and core counts are going up, it's prudent to design a system that can take advantage of this.

Yeah, and a 4x SSD RAID configuration is kind of overkill in the extreme for most Cassandra set ups.

I'm sure there is a way to set up IO subsystems so that Cassandra becomes a huge bottleneck, but that's a pretty specialized context.

I'm sure I am.