by manjalyc 962 days ago
A lot of comments here focus on the readability, conciseness, and expressiveness of Java streams compared to other languages. IMO they are missing the point and are just reiterating the same complaints everyone has about the Java language in x different ways.

Java streams bring real benefits compared to other mature languages. My favorite is that sequential streams can be efficiently parallelized with a single operation, .parallel(), and sequentialized back, .sequential(), on any stream without having to configure a single knob manually (although you certainly can), an equivalent I am unaware of in any other matured language. These make operations such as .collect() and its mutable reductions leverage multiple threads for effectively 0 additional programming time.
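
A minimal sketch of the mutable-reduction point above, assuming a small word list; the class and method names are made up for illustration. The same `collect()` call is used for both execution modes, and only the way the stream is obtained changes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the thread.
class ParallelCollectSketch {

    // Counts words by length. The collector is identical in both modes;
    // in parallel mode the library merges per-thread partial maps.
    static Map<Integer, Long> countByLength(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.collect(
                Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "filter", "reduce", "collect", "sum");
        System.out.println(countByLength(words, false));
        System.out.println(countByLength(words, true));
    }
}
```

Whether the parallel variant is actually faster depends on data size and per-element cost; for a list this small it almost certainly is not.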

edit: A lot of people are focusing on my favorite feature of parallelizing or serializing a stream with a single command, which apparently you can also do in C#, that was just an example guys. Other cool things you can do with Java streams natively now is leverage virtual threads (take that C#), use them asynchronously with completable futures, define how elements are accessed/gathered in streams via spliterators, etc. Streams in Java are very composable (not just as in composition, but as in utility), and enable leveraging nearly every other part of the language natively. In other mature languages, streams are very rigid and feel non-composable. I'm not saying that everything is impossible in other languages, but that in Java streams feel like a first-class citizen.

11 comments

> My favorite is that sequential streams can be efficiently parallelized with a single operation, .parallel(), and sequentialized back, .sequential()

That's not actually true. .parallel and .sequential set a state flag for the entire stream. A stream that is opened, parallelized, then sequentialized will actually just execute sequentially [1].

[1]: https://docs.oracle.com/en/java/javase/14/docs/api/java.base...
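
A small sketch of the flag behaviour described here, using `BaseStream.isParallel()` to inspect the mode a pipeline ends up with. The class and method names are invented for the example.

```java
import java.util.stream.IntStream;

// Hypothetical demo class, not taken from the thread.
class StreamModeFlagSketch {

    // parallel() first, sequential() last: the last call decides.
    static boolean endsUpParallel() {
        return IntStream.range(0, 100)
                .parallel()       // sets the pipeline-wide mode to parallel
                .map(i -> i * 2)
                .sequential()     // overwrites the same pipeline-wide mode
                .isParallel();
    }

    // Same pipeline with the two calls swapped.
    static boolean endsUpParallelReversed() {
        return IntStream.range(0, 100)
                .sequential()
                .map(i -> i * 2)
                .parallel()
                .isParallel();
    }

    public static void main(String[] args) {
        System.out.println(endsUpParallel());         // false
        System.out.println(endsUpParallelReversed()); // true
    }
}
```

In other words, the `map` step in the first pipeline does not run in parallel even though `.parallel()` appears before it.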

In practice, such frivolous parallelism is frowned upon and rarely useful.
+1 almost all uses of parallel streams I’ve seen in Java caused issues in reliability or performance.

Not for any reason other than that the people using it chose to use the feature before learning about it.

One of the most important comments in this thread.
That's a common feature nowadays, not unique to Java.

C# has had `enumerable.AsParallel()` since .NET 4.0 (2010). Rust has rayon `input.par_iter()`.

> Rust has rayon `input.par_iter()`

rayon is a 3rd party library though, not part of the language itself, compared to the Java streams discussed here.

With C# I'm not sure if .NET can be called a library or not? All C# tooling ships with .NET by default or not?

That's just the way Rust does things, like with the `rand` crate. Lean stdlib, easy dependency management. For the purposes of "does Rust do X", popular crates should be included in that consideration.
Ok, so then what is this discussion about? Java Streams have existed basically forever, as a 3rd party library, but that doesn't matter here in our conversation about Java language features, where 3rd party libraries somehow change what the language is...
I guess it's nice for Java that they did this, but the OP's assertion that no other language handles this as nicely doesn't really stand. For example, this quote:

> In other mature languages, streams are very rigid and feel non-composable

I'd like to see some motivating examples that make me say to myself, "Yeah, Java's got a neat trick there".

Whether it ships with it or not is, however, irrelevant. As long as the library is sufficiently popular and accepted by community, it can be seen as the advantage of a particular platform/language.

In this regard, while it is nice that PLINQ and various `Parallel`-related APIs come out of box in C#, it is a marginal difference with Rust where building parallel loops is `cargo add rayon` away.

If it doesn't ship with the default distribution of the language, then there's the risk that multiple mutually incompatible libraries emerge or that the adoption by the ecosystem is spotty. Best example: the async runtime mess in the Rust ecosystem.
There is no async runtime mess. Tokio is the preferred one, and given Rust's goals it is difficult to do it better (pluggable async executors and the degree of flexibility such abstraction offers - running both on big servers and bare cooperative multi-tasking on microcontrollers).

In addition, multiple versions of transitive dependencies can coexist in Rust without conflicting with each other, there is no such risk.

Quite so. But I am under the impression that not every library supports Tokio as a runtime. And while it might be possible to run multiple runtimes in the same process, or use compatibility wrappers, it sounds like trouble.
It is definitely a mess, given the incompatible semantics that make it a Herculean effort to write runtime-agnostic async libraries.
> Whether it ships with it or not is, however, irrelevant

When talking about projects in the wild, sure. But if we're specifically talking about language features, then "something being a part of the language" is wildly different than "installable 3rd party library".

This mentality is the exact reason both the .NET and JVM worlds are worse at enjoying the OSS-first benefits of their ecosystems than Rust or, God forbid, Go, where you are expected to import widely known, good packages to solve a particular task.

Sometimes it is scar tissue from dealing with NIH syndrome too. At least you can use the OOB tools to combat it, but NIH itself is the actual source of people being resistant to adopting proven and good solutions developed by the community.

On the contrary, it means I can be sure wherever there is an implementation the features I depend on are available out of the box, and don't depend on someone during late nights to add support for the given platform.

It also means that I don't need to download the whole Internet for basic features.

I'm not sure if I understand your point, but I haven't seen any NIH at the places where I have worked. We have been encouraged to use popular stable libraries when possible.

Java has several 3rd party dependencies that are expected to just be there in projects where I have worked. Examples are Lombok, Apache Commons and SLF4J. These have become so widely used, that I have stopped thinking about them as external dependencies.

Guava used to be more popular too, but now that Java has Optional and Streams, I don't see it as often.

It certainly can. If you want to be technical, it is the Base Class Library, BCL for short.

.NET is the only language ecosystem that is comparable to Java, as they are a kind of yin/yang between themselves.

Does C# allow you to convert a sequential stream into a parallel stream, or do streams have to be parallelized when initiated? Genuinely asking, I do not know C#
That's literally what AsParallel() does.
As much as I like Java, I don't think this is a good point. Even C lets you do this[1]

[1] https://en.wikipedia.org/wiki/OpenMP

My point was that it is built into Java streams and requires 0 extra configuration or work on the programmer's end. Not that you can't parallelize in other languages.
Well yeah, this is true for openmp as well.

In my experience, parallel streams rarely deliver a substantial performance boost. There are cases where they do, but it's more the exception than the rule. The overhead from the streams API, along with the synchronization penalties means it mostly only makes sense for coarse grained I/O laden operations.

In the entire time I worked at a web shop with a large Java backend I never found the parallelization to be necessary, and in many cases was outright dangerous. We just had it disabled on high-traffic services (thread limit set at 1) to prevent foot injuries.

Don't mean to detract from the point :-). I love how neat and elegant the feature is.

> We just had it disabled on high-traffic services (thread limit set at 1) to prevent foot injuries.

Can you please elaborate?

The default thread pool will use all available cores on the machine. With multiple developers arbitrarily using parallel streams in business logic because it "sounds faster" there is more risk of disrupting concurrent requests.
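
The comment does not say how the limit was applied. As a hedged sketch, the documented knob for the common pool is the system property `java.util.concurrent.ForkJoinPool.common.parallelism` (for example `-Djava.util.concurrent.ForkJoinPool.common.parallelism=1` on the command line), and the value in effect can be read back at runtime. The class name below is made up.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical demo class, not taken from the thread.
class CommonPoolSketch {

    // Documented system property controlling the common pool's parallelism.
    // It has to be set before the common pool is first used.
    static final String PARALLELISM_PROPERTY =
            "java.util.concurrent.ForkJoinPool.common.parallelism";

    // Target parallelism of the pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        System.out.println("explicit override:       "
                + System.getProperty(PARALLELISM_PROPERTY));
    }
}
```

Because the pool is shared by the whole JVM, every parallel stream in every request competes for the same workers, which is the risk described above.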
OpenMP is not a language? You can take any language and add libraries to achieve a specific goal, my point is that Java streams are first-class citizens compared to other matured programming languages that also enable leveraging other aspects of the Java ecosystem without running into the rough edges you will with OpenMP.

Also, that is definitely not the rule? If your problem is parallelizable by nature, parallelism provides massive speedups. Overhead from parallelizing a stream is minimal (effectively nonexistent) now if you leverage Java virtual threads (JDK 21). Additionally, most problems are not parallelizable, yes, but most solutions consist of individual steps and there is a high likelihood that at least one of those steps will benefit from parallelism. Java streams can switch between .parallel() and .sequential() execution on the go. You almost certainly don't want to leverage stream parallelism for most I/O operations, unless you leverage Java's Completable Futures and Managed Blocking (but any gains here are probably minimal anyways), but the point is that you can leverage them, because streams in Java are a first-class citizen.

OpenMP is a compiler flag away.

> Also, that is definitely not the rule? If your problem is parallelizable by nature, parallelism provides massive speedups. Overhead from parallelizing a stream is minimal (effectively nonexistent) now if you leverage Java virtual threads (JDK 21)

This is just not true at all. Not only do virtual threads cope very poorly with I/O other than network I/O, you still get memory barriers. Virtual threads makes the memory overhead lower since you don't need full separate stacks and reduces the cost of spawning new threads, neither of which was ever an issue with streams since they use the common thread pool. Virtual threads don't significantly increase the per-thread performance.

> Not only do virtual threads cope very poorly with I/O other than network I/O, you still get memory barriers.

Which is why I said you probably don't want to use the paradigm I outlined for I/O, but if you did you would want to leverage Java's Managed Blockers and async Completable Futures for those exact reasons.

> Virtual threads makes the memory overhead lower since you don't need full separate stacks and reduces the cost of spawning new threads, neither of which was ever an issue with streams since they use the common thread pool. Virtual threads don't significantly increase the per-thread performance.

Of course virtual threads don't significantly increase per-thread performance? They make the overhead of spawning multiple threads minimal-to-zero compared to native OS/platform threads, minimizing the cost of jumping from sequential streams to parallel streams. Also, parallel streams don't have to use the global fork-join pool, you can use your own fork-join pool? Which is possible in Java, because once again, streams are treated as a first-class citizen and can leverage nearly all other parts of the language efficiently and natively (although I will say Java's verbosity/boilerplateness can suck if you want to leverage your own fork-join pool, but that's a widespread complaint about Java, not specific to its streams).
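
A sketch of the "own fork-join pool" idea, using the commonly cited idiom of starting the terminal operation from inside a task submitted to a dedicated `ForkJoinPool`. One caveat is worth stating: keeping the work inside that pool is an implementation behaviour of the JDK, not something the Stream API documentation promises. The class name, method name and pool size are invented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical demo class, not taken from the thread.
class CustomPoolSketch {

    // Sums i*i for i in 1..n, running the parallel stream from inside
    // a dedicated pool rather than the shared common pool.
    static long sumOfSquares(int n, int poolSize) {
        ForkJoinPool pool = new ForkJoinPool(poolSize);
        try {
            return pool.submit(() ->
                    IntStream.rangeClosed(1, n)
                            .parallel()
                            .mapToLong(i -> (long) i * i)
                            .sum()
            ).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // the pool is owned here, so release it here
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 2)); // 385
    }
}
```

This is also a fair illustration of the boilerplate complaint: most of the method is pool and exception handling rather than the pipeline itself.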

>OpenMP is not a language?

Why does it matter that OpenMP is only a standard, rather than a language?

>my point is that Java streams are first-class citizens

They aren't. They were bolted on later and are quite cumbersome.

>without running into the rough edges you will with OpenMP.

I'm pretty sure it is easier to write fast parallel code with OpenMP than with Java streams. In fact, I am always surprised how well OpenMP works, when I get to use it.

Which is a completely separate language.
manjalyc said it is not a language.
#pragma omp target parallel for map(to:v1,v2) map(from:v3)

is sure as hell not part of the C specification.

It's part of the OpenMP specification. C sans OpenMP will just ignore it.
C# has had PLINQ (aka arr.AsParallel()) since time immemorial. It also works quite well at picking the right parallelization strategy and is very similar in its use to Rust Rayon's .par_iter().
Scala has all of the benefits you are listing, plus more, and natively, with a more concise and composable syntax.
Yep, and I love writing Scala as much as I hate writing Java, but I wouldn't say its maturity is in the same league as Java, C/C++, etc.
A language doesn't just keep getting better and better with age. "Maturity" is about achieving a level of stability and quality in language features, tooling, runtime, library ecosystem, etc. Scala and Java share the same runtime and library ecosystem, and Scala is arguably more mature in its language features, since Java has been forced to add features (and incur extra complexity) playing catch-up to newer JVM languages (including Scala).

Both languages actually suffer from maturity in their tooling, because their standard build and dependency management tools (maven and sbt) are outdated and crufty, while newer languages such as Go and Rust have tooling that was built more recently, with the benefit of more recent experience.

One of the strengths of Scala is also a weakness. They aren't afraid to break things and make changes that are not backwards compatible. The Scala devs have been much more careful about breakage than in the past, so I think they have reached a good compromise between stability and evolution. The masterful execution of the Scala 3 upgrade is an example.

Java by contrast will never clean up the bad, inconsistent or obsolete cruft in the language. If it is really important for you to run ancient JARs on Java 21, then the Java approach is superior.

> My favorite is that sequential streams can be efficiently parallelized with a single operation, .parallel()

I would never use it because I can't reason about what is going on under the hood, or whether performance will improve or dramatically decrease, because all parallel machinery has significant overhead compared to vectorized single-threaded logic.

I actually really like Java streams. Sure, they could be better, but they are extremely useful. parallel(), though, is very bad, as it's too easy to ruin your entire app. It's backed by the common ForkJoinPool, which many have no control over, and if Java was unable to detect the CPU count, it could be set to unbounded.
How often is it used? I write a lot of Java and I have never used this feature. For me, it made the streams implementation unbearably complex, to the point that I can't read its sources, all for a feature that I probably will never use.

I, personally, wish they had never implemented this parallel feature. For me, the JDK would be better without it.

Starting to program with Java streams is weird because you are utilizing functional constructs in a language that historically had little notion of them. When I first started using them it felt worthless when I could achieve the same thing in classic OOP faster (and often run it faster too). But after a while you get a feel for the fluid style programming streams enable (and imo cleaner code). These days with ChatGPT, it's probably a lot easier to get started.

With that said, you should almost always write a stream thinking only sequentially first, then identify steps which can benefit from .parallel() and only parallelize those steps. It's leveraging .parallel() efficiently that provides an advantage at run-time and why I tend to use it.
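
One way to read "only parallelize those steps" in code, given that the parallel/sequential mode applies to a whole pipeline rather than to individual operations, is to split the work into separate pipelines and materialize the result in between. This is a sketch under that assumption; `expensiveScore` is a made-up stand-in for a CPU-heavy, side-effect-free step.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the thread.
class PhasedPipelineSketch {

    // Made-up placeholder for an expensive pure function.
    static int expensiveScore(String s) {
        int h = 0;
        for (int round = 0; round < 1_000; round++) {
            for (int i = 0; i < s.length(); i++) {
                h = 31 * h + s.charAt(i);
            }
        }
        return Math.floorMod(h, 100);
    }

    static List<String> report(List<String> inputs) {
        // Phase 1: its own pipeline, run in parallel.
        List<Integer> scores = inputs.parallelStream()
                .map(PhasedPipelineSketch::expensiveScore)
                .collect(Collectors.toList());

        // Phase 2: a separate pipeline over the collected result, sequential.
        return scores.stream()
                .sorted()
                .map(score -> "score=" + score)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(report(List.of("alpha", "beta", "gamma")));
    }
}
```

The intermediate list costs memory, so this only pays off when the parallel phase is heavy enough to justify it.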

I guess I was unclear, but I wrote specifically about parallel streams. Of course I use ordinary streams on an everyday basis. But using a parallel stream in a server application which processes dozens of other requests simultaneously and runs on a server with a dozen other applications (a very typical use-case for Java) just makes very little sense, because the CPUs are already loaded and it'll just result in more context switches. I could imagine a use-case for that (a very urgent request which must be completed at the expense of other requests and includes heavy collection processing), but I've yet to encounter it.
Yea, I don't imagine native stream parallelism will help when CPUs are already loaded. Presumably you're using Spring or Rx, in which case you probably can leverage reactive streams and/or Managed Blockers, but that's really just taking advantage of async patterns and not necessarily parallelism. The only case I could envision having a concrete benefit is if you used your own fork-join pool leveraging virtual threads instead of the global fork-join pool to prevent platform threads from hogging CPU, and then used reactive streams that leveraged the virtual threads. Although this would (theoretically) raise responsiveness, it would almost certainly come at the cost of throughput. All that is to say, parallelism generally only provides as much value as you have idle CPU cores.
Indeed, and some things become really hard to write because the operations are required to cope with parallel execution that I never want. Anyone doing serious parallelism is going to reach for another library anyway (Rx etc). Making streams parallel was a massive mistake and has just led to enormous accidental complexity. God how I wish they’d just added map/filter/reduce to the collection interfaces.
The Stream API is insanely useful with just serial execution alone. A nested for loop with random breaks (that over time will do some random side-effect here and there, making it completely unreadable mess) is much worse than the “pipeline-y” behavior of streams.
It is useful, but it also has weaknesses. For example, I’ve lost count of the number of times I’ve seen someone forget to close() an IO-based stream (e.g. Files.lines). But probably for 99% of cases you could get away with slurping the entire file into memory and returning a list. The streams API is optimised for the 1% of cases that need the additional complexity, at the expense of worse ergonomics for the common case.
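
A short sketch of the two options mentioned here, assuming a small text file: the lazy `Files.lines` stream, which holds an open file and is closed with try-with-resources, and the eager `Files.readAllLines`, which returns a plain list with nothing to close. The class name, method names and temp-file helper are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the thread.
class LinesSketch {

    // Lazy: the stream keeps the file open until it is closed.
    static long countNonBlankLazy(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.isBlank()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Eager: the whole file is read into memory; no resource to manage.
    static long countNonBlankEager(Path file) {
        try {
            List<String> all = Files.readAllLines(file);
            return all.stream().filter(line -> !line.isBlank()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a three-line temp file (one line blank) and runs both variants.
    static long[] demo() {
        try {
            Path tmp = Files.createTempFile("lines-sketch", ".txt");
            try {
                Files.write(tmp, List.of("alpha", "", "beta"));
                return new long[] { countNonBlankLazy(tmp), countNonBlankEager(tmp) };
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = demo();
        System.out.println(counts[0] + " " + counts[1]); // 2 2
    }
}
```

The eager variant is the "slurp it into a list" approach; the lazy one only matters when the file is too large to hold in memory.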

I get why it ended up that way. I think it would have been difficult at that time to get traction for adding FP features just as a convenience. So they needed to do all the parallel stuff to justify it. But, as I said, anyone I see doing serious parallelism in Java is not using the streams API.

I use parallelization all the time. It's easily added later thanks to this feature, unlike say rewriting a code base from a single threaded language because you thought async constructs would be enough. Never making that mistake again.
Do you have an example or a link to examples that makes you say "In other mature languages, streams are very rigid and feel non-composable"? I'd be interested in seeing what advantages it has over something like Rayon.
Did I get here before AbstractFactoryBean meme? The famous class no one actually used?