by SMFloris 1939 days ago
A while back we explored the use of Cassandra. We wanted to keep some event-related data there and have it be relatively fast read-wise so we could do all sorts of reporting based on it. So we wrote a lot and wanted to read fast. It seemed like a perfect store for our timestamped events, especially since we didn't even want to use deletes and it has built-in record deduplication via its primary key. Turns out, it is not that perfect.
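
To illustrate the deduplication point: in Cassandra an INSERT is an upsert, so writing a row whose primary key already exists overwrites it instead of creating a second row. Below is a minimal sketch of that behavior in plain Python, not real Cassandra code. The `EventTable` class and the `(device_id, ts)` key layout are hypothetical names chosen for illustration, and the model ignores write timestamps, replication, and everything else a real cluster does.

```python
from datetime import datetime, timezone


class EventTable:
    """Toy in-memory model of a table keyed by (device_id, ts).

    Hypothetical illustration only: it mimics upsert-on-primary-key,
    nothing else about Cassandra.
    """

    def __init__(self):
        self._rows = {}

    def insert(self, device_id, ts, payload):
        # Same primary key written twice: the later write replaces the
        # earlier one, so duplicate deliveries collapse into one row.
        self._rows[(device_id, ts)] = payload

    def count(self):
        return len(self._rows)

    def by_partition(self, device_id):
        # Reads are restricted to one partition key, ordered by timestamp.
        rows = [(ts, p) for (d, ts), p in self._rows.items() if d == device_id]
        return sorted(rows, key=lambda r: r[0])


table = EventTable()
ts = datetime(2020, 1, 1, tzinfo=timezone.utc)
table.insert("sensor-1", ts, {"temp": 20})
table.insert("sensor-1", ts, {"temp": 20})  # duplicate delivery of the same event
assert table.count() == 1
```

The flip side is visible in `by_partition`: lookups are cheap only when you already know the key, which is the limitation point 3 below complains about.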

Other than what the article described, I can also add:

1. It has a steep learning curve, but you do get to see the advantages while you learn it. But then, everything comes crumbling down.

2. The setup is a pain locally. Then it is a pain to set it up in prod and manage it. The tooling itself feels very unfinished and basic.

3. No querying outside the primary index on AWS Keyspaces if you want it managed. Also, any managed variants are EXPENSIVE. I mean, every database is fast if you only query by the primary index, so why pay extra?

It is just not worth it. For example, we wound up using MongoDB and it turned out to be fast and scalable, with mature tooling. We can keep tons of event-related metadata in it, it is easy to manage, and it doesn't cost a fortune.


I think Cassandra is a better fit for interactive use cases, not for reporting. Also, it's basically super heavy duty and should start to shine when you're really serving the entire Internet (on the scale of Reddit, Expedia etc.) and your Cassandra cluster is distributed across DCs around the world.

I haven't really worked in this space for a couple of years so I don't know if the cloud offerings have already completely matched Cassandra's features and robustness.

Dynamo allegedly has global replication now.

Of course, I have been totally unable to determine how they merge rows/partitions without cell timestamps. It's a black box.

I was just on a "Keyspaces" meeting where the sales dude basically described dynamo billing, dynamo provisioning, feature shortfalls obviously due to dynamo, but would not admit it was dynamo.

It was bizarre.

We initially looked at Cassandra. We liked its use case, we liked its scalability. However we also ran into maintenance and setup pains.

We ended up going with ScyllaDB, which is a drop-in replacement for Cassandra. It’s written in C++. Much easier in resource demands and we didn’t have to deal with Zookeeper directly.

Cassandra doesn't require Zookeeper.
Ah, good to know. Our admins set up Zookeeper with Cassandra, so I had always assumed it was part of the deal.
And while ScyllaDB may have a better thread-per-core model and no GC pauses, it basically has the exact same management challenges as Cassandra.

The parent comment almost seems generated by AI.

Dumb statements all around, but clearly you've never been in a Cassandra environment vs Scylla. Scylla is far, far more reliable, easier to get up and running, and requires a bit less supervision than Cassandra.