by taftster 1360 days ago
I'm not trying to flame bait here, but this whole article refutes the "Java is Dead" sentiment that seems to float around regularly among developers.

This is a very complicated and sophisticated architecture that leverages the JVM to the hilt. The "big data" architecture that Java and the JVM ecosystem present is really something to be admired, and it can definitely move big data.

I know that competition to this architecture must exist in other frameworks or platforms. But what exactly would replace the HDFS, Spark, Yarn configuration described by the article? Are there equivalents of this stack in non-JVM deployments, or equivalents of other big data projects like Storm, Hive, Flink, and Cassandra?

And granted, Hadoop is somewhat "old" at this point. But I think it (and Google's original map-reduce paper) significantly moved the needle in terms of architecture. Hadoop's Map-Reduce might be dated, but HDFS is still being used very successfully in big data centers. Has the cloud and/or Kubernetes completely replaced the described style of architecture at this point?

Honest questions above, interested in other thoughts.

4 comments

I didn't read the article that way. FTA, the sense is that Java is not dead in the same way COBOL is not dead: it is "legacy" technology that you now have to work around because it is too costly to operate and maintain. Ironically, the two main technical fixes the article describes for the issues with their whole JVM setup are CLP (the main subject of the article) and moving to ClickHouse for non-Spark logs, both of which are written in C++.

With cloud operating costs dominating expenses at companies, one can see more migration away from JVM setups to simpler (Golang) and closer-to-the-metal (Rust, C++) architectures.

"Java is the new COBOL" has always been either a glaring sign of idiocy/ignorance or a bad joke signifying... idiocy/ignorance.

COBOL has an exotic syntax and runs on fringe/exotic hardware (mainframes, minicomputers, IBM iron).

Java has a C-like syntax and runs everywhere people are shoehorning in Go and Node.js. Syntax arguments are bikeshedding, but it was a "step forward" from C for non-systems coding, and it has fundamental design, architectural, breadth-of-library, interop, modernization, and familiarity advantages over COBOL.

Go is a step back in syntactic power, with possibly some GC advantages, and JavaScript, even with TypeScript, is still a messed-up ecosystem with worse GC and performance issues.

One thing that was interesting was watching the Ruby on Rails stack explode in complexity, growing an acronym soup nearly as bad as Java's as the years went by and it matured. Java's ecosystem isn't as complex as it is because of any failings of the language. It simply has to be, as a mature ecosystem.

Syntax complaints I'll listen to; after all, I do all my JVM programming in Groovy. But if you complain about Java syntax, why would you think Go is "better"?

I think a meta-language will emerge that has Rust's power and checking underlying it but is a lot simpler, kind of like Elixir and Erlang, or TypeScript and JavaScript, or, well, Groovy and Java.

Just to probe. COBOL doesn't have many (if any) updates to it, though. And there are no big data architectures being built around it. Equating "Java is Dead" to the same meaning as "COBOL is Dead" doesn't seem like a legitimate comparison.

But I do get your points and don't necessarily disagree with them. I just don't see this as "legacy" technology, but maybe more like "mature"?

Yes, "mature" would have been more accurate for Java; that was some exaggeration on my end. I was trying to convey the level of excitement about Java among new projects and developers, but it is not fair to compare Java to COBOL, primarily because Java is actively developed, has a lot more developers, etc. Nevertheless, cloud is so big nowadays that people are looking for alternatives to the JVM world. Ten years ago it would have been close to the default option.

Java supports AOT compilation via Graal, so you can already have non-JVM setups.

Of note, Java !== JVM. Spark (and parts of Flink), for instance, are written in Scala, which is alive and well :).

My best effort in finding replacements of those tools that don't leverage the JVM:

HDFS: Any cloud object store like S3/AzBlob, really. In some workloads, the data locality provided by HDFS may be important. Alluxio can help here (but I cheat, it's a JVM product).

Spark: A different approach, but you could use Dask, Ray, or dbt plus any analytical SQL DB like ClickHouse. If you're in the cloud and are not processing tens of TB at a time, spinning up an ephemeral HUGE VM and using something in-memory like DuckDB, Polars, or DataFrames.jl is much faster.

Yarn: Kubernetes Jobs. Period. At this point I don't see any advantage to Yarn, even for running Spark workloads.

Hive: Maybe Clickhouse for some SQL-like experience. Faster but likely not at the same scale.

Storm/Flink/Cassandra: no clue.

My preferred "modern" FOSS stack (for many reasons) is Python-based, with the occasional Julia/Rust thrown in. For medium scale (i.e., a few TB of daily ingestion), I would go with:

Kubernetes + Airflow + ad-hoc Python jobs + Polars + Huge ephemeral VMs.

There's ScyllaDB as a replacement for Cassandra. https://www.scylladb.com/

Not answering your primary question, I know. But I wonder where you are getting the "Java is Dead" sentiment - I am not getting it at all in my (web/enterprisey) circle, if anything there is a lot of excitement due to new LTS versions and other JVM languages like Kotlin. And I am also finding a lot of gratitude for the language not changing in drastic ways (can you imagine a Python 2->3 like transition?) despite the siren call of fancy new PL features.

Maybe it's just a little cliche and maybe the phrase "XXX is Dying" is too easily thrown around for click-bait and hyperbole. It can probably be applied to any language that isn't garnering recent fandom. You could probably just as easily say, "Is C# dead?" or "Is Ruby on Rails dead?" or "Is Python dead?" or "Is Rust dead?" (kidding on those last ones).

And yes, I'm with you. I'm super excited about the changes to the Java language, and the JVM continues to be superior for many workloads. HotSpot is arguably one of the best virtual machines that exists today.

But there are plenty of "Java is dead" blog posts and comments here on HN to substantiate my original viewpoint. Maybe because I make a living with Java, I have a bias towards those articles but filter out others, so I don't have a clean picture of this sentiment and it's more in my head.

Java has never been more alive, really. All the other JVM languages just strengthen and entrench the JVM; it all deploys the same. And Java itself has really come a long, long way since the JDK 7 days.

The next step is to build it in a lower-level language, with modern hardware in mind, to be 2x+ faster than the Java alternatives. See ScyllaDB, Redpanda, Quickwit, YugabyteDB, etc.