Hacker News
by nvarsj 3031 days ago
So why do people keep building latency-sensitive things on the JVM? And then they manage to get hugely popular?

Cassandra is a constant struggle with the GC. I’d guess the cost of running it is at least an order of magnitude greater than if it had been implemented in C++ or something more sensible.

5 comments

To be fair, a lot of big companies know how to tune the JVM. A TON of HUGE companies write a LOT of Java. What you consider a constant struggle, a lot of very large companies consider trivial.
I'm not sure it's trivial. Tuning the JVM is an entire cottage industry. JVM performance experts can make 1000+/day tuning the JVM and are in high demand. Companies spend huge amounts of engineering effort to keep the JVM running smoothly. I used to be involved in this side of things pretty heavily at an HFT firm, which almost exclusively used Java.

In my opinion it's a colossal waste of resources. Classic example of using the wrong tool for the job.

And it's still cheaper to hire a guy with a skill like that for a few weeks, or even keep him permanently, and keep a larger development team of cheaper C# or Java devs, than it is to replace them all with higher-paid C/C++ devs, who would probably take longer to get the same functionality up and running.
Good Q. You might like to check ScyllaDB written in C++, which is supposed to have considerably better performance than Cassandra (also low tail-latency) and a level of compatibility with it: https://www.scylladb.com/
Many of these open-source databases started as internal projects inside big companies, where Java/JVM allowed for more productivity and cross-platform deployment, and let the team reuse skills it already had. Then they grew from there and now it's too late to rewrite the whole thing.

If you were starting a database-focused company from the beginning, then choosing C++ would be a better decision, which is exactly what ScyllaDB has done with their Cassandra clone. Along with general algorithm and design improvements, it's claimed to provide 10-100x the performance at lower latency on the same server.

We're also starting to see more projects written in Go now, which is still a managed runtime but usually better at handling these kinds of low-level systems.

Go is a much better choice for systems work, largely because the GC has a low pause target (sub-millisecond). I'd still be hesitant to use it for very latency-sensitive things, or memory-intensive applications. Prometheus, for example, has struggled with Go's memory management (bad memory fragmentation, wasteful memory usage). But I think it's a great compromise if you don't want to deal with memory management.
Sure, I would also add .NET/C# to the list now.

.NET Core on Linux is very fast and there are some great developments around low-level (yet managed) memory manipulation that can lead to some very fast software.

OpenJDK/Hotspot is not the only game in town. Those who really need low latencies can opt to use other JVMs (some commercial) with GCs that provide very low pause times, usually at the expense of some percentage points of throughput. In large corporate environments that might not be a problem.
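If you want to compare collectors or JVMs on your own workload, one rough way is to read the GC counters the runtime already exposes. The following is a minimal sketch using the standard java.lang.management API; the class name GcOverhead and the churn workload are made up for illustration, and the reported figure is accumulated collection time, which for concurrent collectors is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any project mentioned in this thread.
public class GcOverhead {

    /** Sums the accumulated collection time (ms) reported by every collector. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Allocates short-lived garbage so the collector has work to do. */
    static long churn(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            List<byte[]> batch = new ArrayList<>();
            for (int j = 0; j < 100; j++) {
                batch.add(new byte[10_240]);
            }
            checksum += batch.size(); // keeps the allocations observable
        }
        return checksum;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        churn(1_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("elapsed=%d ms, time attributed to GC=%d ms%n", elapsedMs, gcMs);
    }
}
```

Running the same class under different collectors or JVMs gives a first impression of the trade-off; for real tail-latency numbers you would want GC logs and a realistic workload rather than a toy loop like this.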
Apparently these people enjoy GC/JVM languages more than C++.
GC languages like Java are much easier to write, and can be made performant when required.
Yes, this is the reason why Java is chosen. But I feel pretty strongly that databases are systems engineering problems, and should be written using a proper systems language.

Something like Java makes implementation easier, but operation more difficult and costly.