Hacker News
by trashtester 620 days ago
JVM based development has a place for some teams. Others need to go all the way to C++/C/Rust etc for the performance they need.

But plenty of tasks can be done beautifully in Python. That's especially true in a data processing or ML setting where most of the heavy lifting is done in libraries such as numpy, spark, pytorch etc. (Also Python is the industry standard for such teams).

Still, even for such teams there are times where you want to do SOME heavier compute tasks within the core language, and offloading this to some other dev team simply doesn't scale.

The usual solution is to use multiprocessing instead of multithreading. But this workaround is quite inflexible.
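To make the workaround concrete, here is a minimal sketch using only the standard library. The prime-counting task and the names count_primes, split_range and parallel_count are made up for illustration; they are not from any particular codebase.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """CPU-bound toy task: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to sqrt(n); deliberately pure Python.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks."""
    step = max(1, -(-limit // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count(limit, workers=4):
    # Each chunk is pickled and sent to a worker process,
    # and each result is pickled and sent back.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(parallel_count(50_000))
```

Part of the inflexibility shows up even in a toy like this: the worker function has to be importable at module level, arguments and results have to be picklable, and the workers share no in-memory state with the parent.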

Some dev teams may have developers who can deliver this in Scala (especially if Spark is involved). Others may have the ability to build C++ (or CUDA) libraries to add to the Python environment.

But being able to run somewhat heavier processing than a single process can handle, directly in Python, is often a much better option.

Cost-wise it also makes a lot of sense. Such steps may often find themselves on some large compute cluster where you have tens or hundreds of processors (or more) available. If a single step in a processing pipeline on such a cluster can be cut from 2 hours to 1 minute, it can be a large saving. Taking it from 1 minute to 10 seconds means a lot less.

Btw, and with all due respect, the part about teams that require perf using JVM doesn't really match my experience. Where I come from, the Java devs tend to produce the slowest code of all, mostly because every step of the processing is serialized/deserialized as microservices talk to each other for each data element.

Even Python-based code is often faster (sometimes by orders of magnitude). Partly because of cultural differences between teams (the Python code, even when exposed as microservices, tends to work with larger blocks of code). And partly because the data really is processed in C++-based libraries within Python, which still have a significant edge over JVM-based code.
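A rough way to see the per-element overhead is to compare one serialization round trip per item with one round trip per block. This is only a sketch: JSON stands in for whatever wire format the services use, the function names are hypothetical, and real microservices would add network latency on top.

```python
import json
import time


def per_element_round_trip(values):
    """Serialize and deserialize every element separately, as a chatty service chain would."""
    return [json.loads(json.dumps({"value": v}))["value"] * 2 for v in values]


def batched_round_trip(values):
    """Serialize the whole block once, then process it in one go."""
    payload = json.loads(json.dumps({"values": values}))
    return [v * 2 for v in payload["values"]]


if __name__ == "__main__":
    data = list(range(200_000))
    for fn in (per_element_round_trip, batched_round_trip):
        start = time.perf_counter()
        fn(data)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```

Both functions return the same result; only the number of encode/decode calls differs, which is the part of the cost this sketch isolates.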

Don't get me wrong: Java has a lot of advantages for many types of business applications, where the business logic complexity can be abstracted in well organized and standardized ways. But it's not typically the go-to language when seeking maximum performance in heavy compute or massive data volume scenarios.