
by samhw 1465 days ago

His initials were LT, if that helps (if not, I can clarify his whole name over email or however people privately communicate on HN?). I may well be exaggerating his 'major'ness - I'm not that familiar with the Cassandra contributorship, but that was my understanding from my team!

As for Cassandra itself, we were certainly quite sophisticated users. There's a decent chance you'd know the company in question. It's a fair point about the language: I'm not a fan of Java and it probably colours my opinion a little; I was speaking more from a philosophical standpoint about reducing the theoretical complexity of software to make it more deterministic & understandable (out of the tar pit and all that), more than to any specific deficiencies in Java that caused any actual issues for us, of which there were no direct examples I can recall. (Pathological GC did cause some occasional degradation, I suppose, though at best that's semi-specific to Java.)

I think most of the actual issues, from my on-call years, were as a result of stuff like: (as I mentioned) anticompaction and suchlike causing pathological performance; our own misconfiguration of things like asynchronous replication / bootstrapping (which once caused a very severe incident, as in 'endemic data corruption and loss' severity); application-layer issues from product engineers misconfiguring consistency, choosing poor keys for partitioning, constructing poor data models that require table scans, all the ordinary stuff for which Cassandra is at most very obliquely to blame.

Also, I agree it was stupid of us to use Cassandra in (what was) a very serious environment, in probably one of the most safety-critical sectors outside of medicine and rockets. We knew that. We did the same with several technologies. Literally, we had a diktat saying no engineers could mention it in blog posts. On reflection it's quite unfair to blame Cassandra for our decision - to a large extent, yeah, we were holding it very wrong. I would not have made that choice myself, at all.


_benedict 1465 days ago

I probably know who you are referring to, but I don't feel it would be appropriate to discuss a specific identifiable individual in a public forum, no matter how innocuous that discussion might be! My fault for bringing it up.

Cassandra was super easy to shoot yourself in the foot with, and it remains quite easy. You mention a few foot guns that have gotten better, a few that remain but that will get better, and a few that are sort of inherent to distributed databases.

Anti-compaction for instance should now not be a huge issue if you're running regular incremental repairs, and I hope even the few remaining caveats will be alleviated soon. Bootstrap is something you mention that is also going to get much easier for users this coming year, so that unsophisticated users can manage their cluster membership safely.

Application-misconfigured consistency levels is a really obvious one that isn't strictly Cassandra's fault but for which much better help could be given to the user, and I expect some major improvements here in the next year or two, so that users can configure tables with consistency properties that the database guarantees (to some extent, the user will always be able to screw it up by providing the wrong consistency identifier, but at least the scope is reduced to accidental misconfiguration rather than misunderstanding). This is something we're considering as part of the introduction of general purpose transactions later this year.
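To illustrate why per-query consistency is easy to get wrong: a read is only guaranteed to overlap with the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. A minimal sketch of that arithmetic, where the function names are hypothetical and not part of any Cassandra driver:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3
# Quorum reads with quorum writes overlap on at least one replica.
quorum_pair_ok = read_sees_latest_write(RF, quorum(RF), quorum(RF))
# Reading one replica after writing one replica can miss the write entirely.
one_one_ok = read_sees_latest_write(RF, 1, 1)
```

Mixing levels across call sites (say, one service writing at ONE while another reads at ONE) is the kind of accidental combination this check would flag.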

Poor data models and partition keys are things the database can offer less help with, though I anticipate much better support for ordered partitioning in future, that would help poorly-selected partition keys, as clustering keys can be used for partitioning there too.
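A rough sketch of what a poorly chosen partition key does to data placement, assuming a toy cluster of four nodes. The MD5-and-modulo placement here is a stand-in chosen for brevity, not Cassandra's actual partitioner, and all names are hypothetical:

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, nodes: int) -> int:
    """Map a partition key to a node with a stable hash (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def max_share(keys, nodes: int = 4) -> float:
    """Fraction of all rows that land on the busiest node."""
    counts = Counter(node_for(k, nodes) for k in keys)
    return max(counts.values()) / len(keys)


ROWS = 1000
# Every row shares one partition key, so one node holds everything.
by_constant = ["GB"] * ROWS
# A high-cardinality key spreads rows roughly evenly across nodes.
by_user = [f"user-{i}" for i in range(ROWS)]

hot_share = max_share(by_constant)
even_share = max_share(by_user)
```

With the single shared key the busiest node holds every row; with per-user keys the busiest node holds something close to a quarter of them.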

Regarding the choice of language, Java has upsides and downsides. GC spirals are something we have control over at the end of the day, and something we continue to get better at handling (as does the JVM), but guaranteeing no segfaults (and not worrying about the ABA problem) is a big benefit we get in return. I wish we had more control over things like memory placement and execution, and these things may be coming to the JVM to some extent (I expect Loom to benefit Cassandra hugely, and value types later), but equally distributed systems problems often give you enough things to worry about.

The visibility you have into a Java process is fantastic, however, and we are starting to make use of the ease with which you can modify the code Java runs for system validation, using byte weaving to permit us to simulate clusters as they are run, with adversarial event orderings, to ensure those notoriously hard distributed systems problems are correctly solved.
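The general idea of checking behaviour under adversarial event orderings can be shown at toy scale. This is not the byte-weaving simulator described above, just a sketch that exhaustively replays a handful of writes in every delivery order and checks whether the final state depends on the order. The two merge functions and the sample writes are hypothetical:

```python
from itertools import permutations


def apply_arrival_order(events):
    """Naive register: whichever write arrives last wins."""
    state = None
    for _timestamp, value in events:
        state = value
    return state


def apply_last_write_wins(events):
    """Timestamp-based register: the highest timestamp wins, whatever the arrival order."""
    best = None
    for timestamp, value in events:
        if best is None or timestamp > best[0]:
            best = (timestamp, value)
    return None if best is None else best[1]


def converges(apply, events) -> bool:
    """Replay every possible delivery order and check all of them agree."""
    outcomes = {apply(order) for order in permutations(events)}
    return len(outcomes) == 1


WRITES = [(1, "a"), (2, "b"), (3, "c")]  # (timestamp, value) pairs

naive_converges = converges(apply_arrival_order, WRITES)
lww_converges = converges(apply_last_write_wins, WRITES)
```

Exhaustive enumeration only works for tiny cases like this one; a real simulator has to sample or steer orderings instead, but the property being checked is the same.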

If you do want to speak privately, about anything Cassandra related, the lowercase part of my username (i.e. without the _ prefix) at apache.org reaches me.


samhw 1465 days ago

Yeah, I 100% agree that many of those problems are inherent to distributed databases. There are an interesting few which kinda straddle the line in that respect – stuff like counters and the aforementioned individually tunable consistency, where it makes it too easy for (in practice) individual engineers to trigger classic dist-sys failure modes – but largely its problems are the problems of distributed systems. Lots of its other problems are the problems of using an over-complex eventually-consistent write-optimal (etc) distributed system for a problem that doesn't require it (where e.g. Redis Cluster would be far better). I'd submit to throw maybe a few on top: it feels like a general theme of many incidents we encountered was around "we did something complex/accidentally-pathological and Cassandra froze up entirely due to [consistency / compaction / repair / GC] stuff". It did feel from many of those issues like it was a victim of its own complexity, more than anything else. (That theme also applies to lots of the 'operator errors'.)

Also, sorry, I think I was a bit unfair to Java. I'm not an anti-GC militant. I'd be the first person to point out the haziness of the distinction between tracing and (say) reference counting in the first place, or indeed with the indexing/defragmentation/etc space + work required of a malloc implementation. I'd consider a GCed language like Go - though I personally hate it and feel it utterly joyless - to be an improvement. It's more about the inherent complexity of adding a p-code machine like the JVM on top of the already-colossal complexity of a modern database. For what it's worth, for clarity, I've barely written any Java and I'm intimately unfamiliar with Java development, and despite giving my opinion I'm well aware it's not a very informed one. I do agree with your point about its isolating the 'unit' of your software from the particularities of any given hardware and making it more easily jepsenable - I hadn't considered that. And some of the stuff happening in the Java space, like Graal and (as you say) Loom, is very impressive.

I'll amend my original comment to make it a bit clearer that most of this is not really Cassandra's fault, and its faults aren't really more numerous and more severe than those of any other database. It's evidently a huge success and its value to people is undeniable - I don't mean to depreciate your work. I forget that there's a non-negligible chance of relevant people reading my comments on here (like, less congenially, the time I accidentally summoned Br*ndan E*ch: https://news.ycombinator.com/item?id=28792436). I don't work with Cassandra any more, so I'm probably unlikely to have many practical questions, but thanks for the offer and I'll certainly reach out if I find myself in that space again! Really appreciate your being so magnanimous about my not-very-magnanimous (pusillanimous?) comment.


_benedict 1465 days ago

Please, no need to apologise! Your criticisms were all entirely well founded and the pain points you mentioned very real. I thought you expressed them considerately (I have seen plenty of vents that did not). I may reflexively defend Cassandra, but healthy and honest discussions around these things are great IMO.

> It did feel from many of those issues like it was a victim of its own complexity

I think there’s some truth in this, but I think the bigger problem was failing to give this complexity its proper respect (which would have been very costly and slowed feature development - perhaps consigning Cassandra to an also-ran position like others in the space, who knows?)
