|
I probably know who you are referring to, but I don't feel it would be appropriate to discuss a specific identifiable individual in a public forum, no matter how innocuous that discussion might be! My fault for bringing it up. Cassandra was super easy to shoot yourself in the foot with, and it remains quite easy. You mention a few foot guns that have gotten better, a few that remain but that will get better, and a few that are sort of inherent to distributed databases. Anti-compaction, for instance, should no longer be a huge issue if you're running regular incremental repairs, and I hope even the few remaining caveats will be alleviated soon. Bootstrap is something you mention that is also going to get much easier for users this coming year, so that unsophisticated users can manage their cluster membership safely. Application-misconfigured consistency levels are a really obvious one that isn't strictly Cassandra's fault but for which much better help could be given to the user, and I expect some major improvements here in the next year or two, so that users can configure tables with consistency properties that the database guarantees (to some extent, the user will always be able to screw it up by providing the wrong consistency identifier, but at least the scope is reduced to accidental misconfiguration rather than misunderstanding). This is something we're considering as part of the introduction of general-purpose transactions later this year. Poor data models and partition keys are things the database can offer less help with, though I anticipate much better support for ordered partitioning in future, which would help with poorly selected partition keys, as clustering keys can be used for partitioning there too. Regarding the choice of language, Java has upsides and downsides. 
GC spirals are something we have control over at the end of the day, and something we continue to do better at (as does the JVM), but guaranteeing no segfaults (and not worrying about the ABA problem) is a big benefit we get in return. I wish we had more control over things like memory placement and execution, though these things may be coming to the JVM to some extent (I expect Loom to benefit Cassandra hugely, and value types later); equally, distributed systems problems often give you enough things to worry about. The visibility you have into a Java process is fantastic, however, and we are starting to make use of the ease with which you can modify the code Java runs for system validation, using byte weaving to permit us to simulate clusters as they are run, with adversarial event orderings, to ensure those notoriously hard distributed systems problems are correctly solved. If you do want to speak privately, about anything Cassandra-related, the lowercase part of my username (i.e. without the _ prefix) at apache.org reaches me. |
Also, sorry, I think I was a bit unfair to Java. I'm not an anti-GC militant. I'd be the first person to point out the haziness of the distinction between tracing and (say) reference counting in the first place, or indeed with the indexing/defragmentation/etc space + work required of a malloc implementation. I'd consider a GCed language like Go - though I personally hate it and feel it utterly joyless - to be an improvement. It's more about the inherent complexity of adding a p-code machine like the JVM on top of the already-colossal complexity of a modern database. For what it's worth, for clarity, I've barely written any Java and I'm intimately unfamiliar with Java development, and despite giving my opinion I'm well aware it's not a very informed one. I do agree with your point about its isolating the 'unit' of your software from the particularities of any given hardware and making it more easily jepsenable - I hadn't considered that. And some of the stuff happening in the Java space, like Graal and (as you say) Loom, is very impressive.
I'll amend my original comment to make it a bit clearer that most of this is not really Cassandra's fault, and its faults aren't really more numerous or more severe than those of any other database. It's evidently a huge success and its value to people is undeniable - I don't mean to depreciate your work. I forget that there's a non-negligible chance of relevant people reading my comments on here (like, less congenially, the time I accidentally summoned Br*ndan E*ch: https://news.ycombinator.com/item?id=28792436). I don't work with Cassandra any more, so I'm unlikely to have many practical questions, but thanks for the offer and I'll certainly reach out if I find myself in that space again! Really appreciate your being so magnanimous about my not-very-magnanimous (pusillanimous?) comment.