Hacker News
by jjirsa 3032 days ago
Nicely done! Looking forward to the pluggable storage engine.
2 comments

The JIRA tickets don't inspire much hope :/

https://issues.apache.org/jira/browse/CASSANDRA-13474 [2 comments, from Apr 2017]
https://issues.apache.org/jira/browse/CASSANDRA-13475 [~100 comments, but the last one is from Nov 2017, by the Instagram engineer]

And the Rocksandra fork is already ~3500 commits behind master, so upstreaming this will be interesting.

Oh, and the Rocksandra fork already looks kind of abandoned - no commits since Dec 2017 (which probably means this is not actually the code that runs at Instagram).

The Rocksandra branch is https://github.com/Instagram/cassandra/tree/rocks_3.0 and we develop it on top of Cassandra 3.0. It's the code we are running on our production servers.
Thanks for the git push and the reply! A few minutes ago it was still pointing to the older commit.

And upstream Cassandra is already 11 minor releases ahead. Won't that become a problem with something as fundamental/low-level as pluggable storage engines?

I'm a committer, and I'm familiar with the JIRA ticket.
Could you share your thoughts on how likely it is that the RocksDB engine becomes available as part of normal Cassandra, and how soon? Also, how big is the gap between 3.0.x and 3.x? Are there any improvements between 3.0.x and 3.x regarding tail latency/performance? Thanks!
I think it's likely. It's decidedly nontrivial, and the hardest part will be the (very slow) design phase where we actually make sure the interfaces are defined properly, but I think there are enough interested people to make sure it happens.

There are some meaningful changes between 3.0 and 3.11 that do help tail latencies, notably a compressed chunk cache for storing some intermediate data blocks and a significant change to the way the column index is deserialized. There's certainly quite a bit more low-hanging fruit, but the biggest contributor to p99 latencies is garbage collection, and the read path still contributes the most JVM garbage, so this is still probably a meaningful improvement over 3.11.

It would be great. But I don't think it could happen. The pluggable storage engine would greatly increase the cognitive complexity of the code.
Of course it could happen. A pluggable engine has a lot of positives - not only does it enable features like this, it also helps modularize the codebase, making it more testable, and there are other people who will develop storage engines for their own use cases over time (look at the evolution of MySQL storage backends, for example).
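
To make the modularity/testability point concrete, here's a minimal sketch of what a pluggable engine boundary can look like. It's in Python for brevity and is purely illustrative: `StorageEngine`, `InMemoryEngine`, `ENGINES`, `open_engine` and `Table` are made-up names, not Cassandra's (or Rocksandra's) actual interfaces, and a real design would also have to cover things like streaming, compaction and repair.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class StorageEngine(ABC):
    """Hypothetical engine contract; NOT Cassandra's actual interface."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class InMemoryEngine(StorageEngine):
    """Toy engine backed by a dict, handy as a test double."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


# Registry mapping a config name to an engine factory (names are assumptions).
ENGINES: Dict[str, Callable[[], StorageEngine]] = {"memory": InMemoryEngine}


def open_engine(name: str) -> StorageEngine:
    """Instantiate the engine registered under `name`."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown storage engine: {name!r}") from None


class Table:
    """Upper layer that depends only on the StorageEngine contract."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    def write(self, key: str, value: str) -> None:
        self._engine.put(key.encode(), value.encode())

    def read(self, key: str) -> Optional[str]:
        raw = self._engine.get(key.encode())
        return None if raw is None else raw.decode()
```

The point is that `Table` never names a concrete engine, so it can be unit-tested against the in-memory one, and a different backend only has to implement the same three methods and register itself.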