Hacker News new | ask | show | jobs
by delecti 3647 days ago
I recently got a code review that in several places suggested I switch to the new Java 8 stream API [1]. I just flatly responded that it was far less readable, even if I could condense a half-dozen lines of code down to one. Where I can quickly scan over a foreach loop to get the jist of what it's doing, I have to closely examine each call in the new approach to have any idea what it's doing.

[1] http://www.oracle.com/technetwork/articles/java/ma14-java-se...

11 comments

Stream based programming is a paradigm that with some training and proper code indentation is much, much faster to read than a nested for loop. Once you get used to it,you can literally fast-scan code written in this style with the confidence that you are not missing anything.

Also, assuming you do not use mutable state, it also has the advantage of being easily parallelizable without any code changes. (As well, as easily combinable, throttlable, samplable etc)

But it _definitely_ has a learning curve (like Calculus) and one needs to invest time in it to get used to this paradigm. Time must be spent in learning all the various operators and how they can be effectively leveraged. Stream-based code is certainly not something you can read off the cuff.

The good news is that this is valuable time investment since all stream/observable libraries across languages have a large set of identical operators and shared API surface - the cross platform applicability is fantastic.

Once you know stream based code in language A, you can automatically read stream-based code in language B, without even knowing the syntax of language B.

I love observable stream libraries like RxJS. It makes async programming so easy! I could never write async JS code effectively with callbacks or even promises. Too much state to keep in my poor head.

We now even have standards like ReactiveX coming out that are attempting to define a cross-language/platform spec for observable flows and standardized patterns on how to create new stream operators, etc.

At risk of a terrible analogy, I consider Stream based code to be something of a mini-language like Regex. (but easier to read). No one who hasn't put in time understanding regex elements will ever follow regex patterns. Ditto for stream based stuff - you need AOT learning, but the rewards are worth it.

"Stream-based code is certainly not something you can read off the cuff."

    List<Integer> transactionsIds = 
        transactions.stream()
                    .filter(t -> t.getType() == Transaction.GROCERY)
                    .sorted(comparing(Transaction::getValue).reversed())
                    .map(Transaction::getId)
                    .collect(toList());
I honestly think most programmers fluent in Java 7 programming, can guess this is finding "grocery" type transactions, sorting by "value" transaction property in descending order, extracting the transaction ids, and returning the result as a list. ("map" could be confusing, as this usage is borrowed from functional programming.)

May take a while to learn the performance and memory usage characteristics, and how to correctly write code in this style, but I'm not buying that it is difficult to "read off the cuff."

If you're a .NET programmer, we've been doing that kind of stuff with LINQ for years; it's all over the place.

The only patterns that I still like to keep as old-style loops are constructs that ReSharper transforms into hairy-looking Aggregate() expressions

Err..yes, Java 8's filter, sort, map and collect are the most basic stream operators and you are correct that even folks un-used to this style can determine an understanding. But in code that uses observable stream-based programming you nearly always use more complicated operators like flatMap, merge, skipWhile, takeUntil, combineLatest etc. Without spending time learning these operators and how streams work, one is always going to scratch one's head. (Unless you possess an FP background)

Even in Java 8 plain streams, the Collector is a powerful paradigm for grouping operations with a large amount of variations and it takes (at-least it did for me) some involved time learning how to effectively leverage this in day to day code.

> ("map" could be confusing, as this usage is borrowed from functional programming.)

On the other hand, both Python and Perl have 'map' as a built-in function. There has been spillage out of functional programming for a while.

Isn't the point of stream API a) to be able to iterate over infinite set of values b) to fiddle with the rate of "async" object generation?

I'm not sure I understand the difference between `compose' and stream APIs.

Haven't touched Java in a decade and that is beautiful. I like Ruby's functional idioms and LINQ though.
I agree in general case, but this example

     transactions.stream()
                 .filter(t -> t.getType() == Transaction.GROCERY)
                 .sorted(comparing(Transaction::getValue).reversed())
                 .map(Transaction::getId)
                 .collect(toList());
seems to be net improvement to me. It reads like SQL, and eliminates many causes of error (wrong indexing variables, off-by-one, copy-paste error in boilerplate).

Yes it requires learning several new concepts, but in the end it pays off.

BTW I wouldn't switch old code to this style, because there never seems to be time for that. But new code written like that is perfectly OK IMHO.

EDIT: I really should've checked the code more carefully - the initial version had no indexing variables. Still, it reads better and has less boilerplate.

While some people may like to read code that looks like SQL, I've found Java 8 features like this are poorly supported by the debugger, so debugging stuff like this tends to require "horse whispering" or rewriting the logic into something that can be stepped through.
GNU Gremlin is a stream API to traverse graphs.
That reads more like Haskell than like SQL. (Doesn't have any bearing on the rest of your point one way or another.)
I think it reads better with intermediary variables. Granted sometime you go to name an intermediary variable that does two things (just like male_siblings = (2 * (X + Y)) if I have a brillant idea for the variable name.
As a functional programmer (Haskell, Erlang) I can honestly say that after years of FP I think differently about how to describe the intent of my code. It's more difficult for me to understand a foreach loop with mutable state than a foldLeft.
I think it's a familiarity problem. I felt the same at one time, but once I became more familiar with the operations, I found the stream API much easier to scan.

I was asked at work to explain why I prefer the stream library - other developers have lagged a bit on usage.

The first reason is, the method calls are designed to only do so much, and are named after what they're designed to do. Filter is for filtering. Map is for mapping. I can read the first word of each line to get an idea of what it's doing (filter sort map).

Another bigger reason for me is: you know exactly what code is generating those transaction ids. If you are interested in how the transaction ids are generated, it is damn obvious which code you need to look at. And if you aren't interested in how the transaction ids are generated, it is damn obvious which code you can safely ignore.

In comparison, what does a for loop do? It does all sorts of things, oftentimes several different things at once. As such, I leave for loops for more involved processing, and use the stream API for straightforward transformations.

As someone who could very easily be on the other side of that code review (and I'm preeeettty sure I'm not in this case?) I feel obliged to at least try to provide a counterpoint :).

So I agree that enormous blobs of unreadable crap are indeed unreadable, and that regardless of how neat and functional your code is, it can still be complete gibberish to most people. That being said, long chains of streams can be broken out quite nicely by using intermediate names:

e.g.

  Comparator<Transaction> descendingTransactionsByValue = comparing(Transaction::getValue).reversed();
  Stream<Transaction> groceries = transactions.filter(t -> t.getType() == Transaction.GROCERY);
  Stream<Transaction> sortedGroceries = groceries.sorted(descendingTransactionsByValue);
  Stream<Transaction> transactionids = sortedGroceries.map(Transaction::getId).collect(toList());
vs. the first code block under "Figure 1" in the linked article.

For me at least, it helps keep code from getting too unruly: you only have one 'thing' you can do in a filter, map or sorted call, unlike in a foreach loop where anything can go. So my thesis is this: using streams I can quickly scan over the function/stream names to get the gist of what it's doing, but using foreach loops I need to closely examine each line to have any idea of what's happening :3 (e.g. people love abusing labeled breaks [1] in our codebase, as well as excessively modifying input parameters w/o documentation, so I might be a bit biased against for loops)

[1]: https://stackoverflow.com/questions/14960419/is-using-a-labe...

I find the code in the linked article much more readable than that. You've introduced a huge amount of noise that makes it hard to see what the actual operations being performed are.
I agree to an extent ... this example is pretty contrived, but when you start getting around ten filter/map/groupby operations in, it gets a little difficult to follow what's supposed to be happening. So typically, my first step towards breaking it out into a method is separating out the individual streams like above. As is mentioned in a cousin comment, it also looks a lot nicer with type inferencing, but alas we are stuck with the verbosity of standard Java 8 for now.
This is a place where local variable type inference really comes in handy for cutting down the noise of the type declarations.

  var descendingTransactionsByValue = comparing(Transaction::getValue).reversed();
  var groceries = transactions.filter(t -> t.getType() == Transaction.GROCERY);
  var sortedGroceries = groceries.sorted(descendingTransactionsByValue);
  var transactionids = sortedGroceries.map(Transaction::getId).collect(toList());
is much easier to understand
Oh definitely, although I'll have to wait for it to be added to the JDK [1] in order to use that in anger :/ (outside of lombok)

[1]: http://openjdk.java.net/jeps/286

Look, you're basically just saying you don't want to learn anything new.

"I just flatly responded that it was far less readable, even if I could condense a half-dozen lines of code down to one."

Express the exact same thing, in 1/6 the code, and somehow the shorter code takes longer to read and understand?

You don't even attempt to express why you find the shorter code harder to understand. Or show code written both ways, to argue why one is better than the other. Or anything at all to support your claim.

Based solely on what you're telling us here, I'm siding with your code reviewer on this one.

> Express the exact same thing, in 1/6 the code, and somehow the shorter code takes longer to read and understand?

I think people need to differentiate between code review and code scanning. I look at code completely different when someone is pointing to it than when I'm scanning through a file. A ternary operator isn't tough to read when you're already looking at it. However, scanning through a file I'm going to have a lot more difficulty spotting a `?` than I am the indentation of an if/else block. I'm more than content to take 5 lines instead of 1 to spot it 2 months later when I don't know what I'm looking at.

As long as the one line is less than six times harder to read than any one of the previous lines, changing to the new approach seems like a win?
I suppose to some extent this is a matter of taste, but I've usually found the streams api to be clearer and easier to understand, although perhaps there's a bit of a learning curve. I think the win is especially large in cases where you replace an anonymous inner class with a lambda, imho. At my previous job, we had all gotten quite familiar with Apache's CollectionUtils, but java8 almost obviates the need for that entirely.
hopefully they flatly responded that you're wrong.

more generally, if you can condense 6 lines to 1, that change will almost always improve readability. ofc, you can create golfed examples where this is not the case, but those are outliers. i'd say that if you are arguing in favor of code 6x as long as an equivalent implementation, the burden of proof is on you. and being used to a more verbose syntax because you've used it for years is not a valid defense.

I cannot even fathom why you think that's less readable than someone's mangled foreach loop.
Yay, "fluent APIs" (the Builder design pattern w/ method chaining). Been there, done that.

Actually, I'm a repeat offender. Created a jooq like wrapper for SQL. Ditto for HL7. Ditto for UIs. In fact, I was Builder crazy for a while.

I got over it.