Hacker News new | ask | show | jobs
by marhee 426 days ago
I don’t find your “seasoned developer” version ugly at all. It just looks more mature and relaxed. It also has the benefits that you can actually do error handling and have space to add comments. Maybe people don’t like it because of the repetition of “data =“ but in fact you could use descriptive new variable names making the code even more readable (auto documenting). I’ve always felt method chaining to look “cramped”, if that’s the right word. Like a person drawing on paper but only using the upper left corner. However, this surely is also a matter of preference or what your used to.
3 comments

I have a lot of code like this. The reason I prefer pipelines now is the mental overhead of understanding the intermediate step variables.

Something like

  lines = File.readlines("haystack.txt")
  stripped_lines = lines.map(&:strip)
  needle_lines = stripped_lines.grep(/needle/)
  transformed_lines = needle_lines.map { |line| line.gsub('foo', 'bar') }
  line_counts = transformed_lines.map { |file_path| File.readlines(file_path).count }
is a hell to read and understand later imo. You have to read a lot of intermediate variables that do not matter in anything else in the code after you set it up, but you do not know in advance necessarily which matter and which don't unless you read and understand all of it. Also, it pollutes your workspace with too much stuff, so while this makes it easier to debug, it makes it also harder to read some time after. Moreover becomes even more crumpy if you need to repeat code. You probably need to define a function block then, which moves the crumpiness there.

What I do now is starting defining the transformation in each step as a pure function, and chain them after once everything works, plus enclosing it into an error handler so that I depend on breakpoint debugging less.

There is certainly a trade off, but as a codebase grows larger and deals with more cases where the same code needs to be applied, the benefits of a concise yet expressive notation shows.

Code in this "named-pipeline" style is already self-documenting: using the same variable name makes it clear that we are dealing with a pipeline/chain. Using more descriptive names for the intermediate steps hides this, making each line more readable (and even then you're likely to end up with `dataStripped = data.map(&:strip)`) at the cost of making the block as a whole less readable.
> Maybe people don’t like it because of the repetition of “data =“

Eh, at first glance it looks "amateurish" due to all the repeated stuff. Chaining explicitly eliminates redundant operations - a more minimal representation of data flow - so it looks more "professional". But I also know better than to act on that impulse. ;)

That said, it really depends on the language at play. Some will compile all the repetition of `data =` away such that the variable's memory isn't re-written until after the last operation in that list; it'll hang out in a register or on the stack somewhere. Others will run the code exactly as written, bouncing data between the heap, stack, and registers - inefficiencies and all.

IMO, a comment like "We wind up debugging this a lot, please keep this syntax" would go a long way to help the next engineer. Assuming that the actual processing dwarfs the overhead present in this section, it would be even better to add discrete exception handling and post-conditions to make it more robust.