Hacker News
by pedrovhb 1992 days ago
I think this is neat but I'm not sure it's the best way to go about things.

> all(map(func3, filter(func2, map(func1, zip(a, b)))))

> a.zip(b).map(func1).filter(func2).forall(func3)

The original is indeed terrible and the second version is a bit better. A lot better than either one, though, is splitting your logic into multiple lines and assigning a descriptive identifier to each step. Maybe even throw in some inline comments if you're particularly respectful of others' time.

As tempting as it is to do something super clever and cram a ton of functionality into a small number of lines or characters (it does feel good), it's just better to be a bit more verbose and write simple, obvious code. I feel like code should be read like a book, not a puzzle.
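To make the contrast concrete, here is a sketch of the nested one-liner quoted above next to a version with one named step per line. The helpers `pair_sum`, `is_even` and `is_positive` are hypothetical stand-ins for `func1`, `func2` and `func3`, and the data is made up. Both versions compute the same result.

```python
a = [1, 2, 3, 4]
b = [1, 4, 2, 2]

def pair_sum(pair):
    # Hypothetical func1: combine one (a[i], b[i]) pair.
    x, y = pair
    return x + y

def is_even(n):
    # Hypothetical func2: keep only even sums.
    return n % 2 == 0

def is_positive(n):
    # Hypothetical func3: the property every kept sum must satisfy.
    return n > 0

# The nested original, read from the inside out.
nested = all(map(is_positive, filter(is_even, map(pair_sum, zip(a, b)))))

# The same pipeline, one named step per line.
pairs = zip(a, b)
sums = map(pair_sum, pairs)                    # 2, 6, 5, 6
even_sums = filter(is_even, sums)              # 2, 6, 6
all_even_sums_positive = all(map(is_positive, even_sums))

print(nested, all_even_sums_positive)          # True True
```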


What I like about "cramming a ton of functionality into [a single expression]" is that it doesn't leak any intermediates to the rest of the block, and it doesn't allow for mutation. There's a single output exposed; you can't accidentally use the wrong value downstream. You could wrap it all in an inner function, I guess, but that seems like overkill unless you plan to reuse it.

Though to be fair, having explicit intermediate variables is idiomatic in Python, from what I've seen. It's one of my biggest pet peeves about the language, but it's not without precedent.
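A minimal sketch of the inner-function option mentioned above, assuming a made-up check (are all even pair sums positive?). The intermediates exist only inside `check`, so the enclosing function sees a single result. The names `report` and `check` are hypothetical.

```python
def report(a, b):
    # Inner function: its intermediates live only inside it.
    def check():
        sums = [x + y for x, y in zip(a, b)]
        even_sums = [s for s in sums if s % 2 == 0]
        return all(s > 0 for s in even_sums)

    ok = check()
    # Only the single result is visible here; `sums` and `even_sums` are not.
    assert "sums" not in locals() and "even_sums" not in locals()
    return ok

print(report([1, 2], [1, 4]))   # True  (sums 2, 6)
print(report([-3, 2], [1, 4]))  # False (sums -2, 6)
```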

This is exactly the main situation where I'll happily "get clever" with my code.

It's not being reused, and one of the following is true: I don't want to leave intermediate objects behind, for whatever reason is relevant, or I feel it's worth compressing the logic so I can use a language feature that requires an expression, like lambdas or list/dict comprehensions.
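As an illustration of an expression-only slot, here is a sketch with made-up data: the value of a dict comprehension and the body of a lambda must each be a single expression, so a filter-then-sum step has to be written inline there. The `scores` data is hypothetical.

```python
scores = {"alice": [3, 9, 12], "bob": [1, 2, 4]}

# The value slot of a dict comprehension must be one expression,
# so the filter-then-sum logic is written inline.
big_totals = {name: sum(s for s in vals if s > 5) for name, vals in scores.items()}
print(big_totals)  # {'alice': 21, 'bob': 0}

# A lambda body is also a single expression: order names by that same total.
ranked = sorted(scores, key=lambda name: sum(s for s in scores[name] if s > 5))
print(ranked)  # ['bob', 'alice']
```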

> a.zip(b).map(func1).filter(func2).forall(func3)

Let's make this somewhat more concrete.

---

    heights = [1, 2, 3]
    widths = [4, 5, 6]

    # printing areas greater than 10
    # (assumes heights/widths are Array-style objects with .zip/.map/.filter)

    # functional
    heights.zip(widths).map(to_area).filter(lambda area: area > 10).forall(lambda a: print(f"Area {a}"))

    # verbose way
    hw_zipped = heights.zip(widths)
    areas = hw_zipped.map(to_area)
    big_areas = areas.filter(lambda a: a > 10)
    for a in big_areas:
        print(f"Area {a}")

---

Which do you prefer? I would argue that the functional way is the right level of abstraction in this example, and in my experience that's often the case, especially in Python, where you don't often use a namespace to hold these intermediate variables and you can't rely on typing.
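Plain Python lists have no `.zip`/`.map`/`.filter` methods, so the example above only runs with an Array-style wrapper. A sketch of both styles using just the builtins, assuming a hypothetical `to_area` that takes one `(height, width)` tuple (the comment never defines it):

```python
heights = [1, 2, 3]
widths = [4, 5, 6]

def to_area(pair):
    # Hypothetical helper: one (height, width) tuple in, one area out,
    # matching how map(to_area) over zipped pairs would call it.
    h, w = pair
    return h * w

# Functional style with the builtin map/filter.
big_areas_functional = list(filter(lambda area: area > 10, map(to_area, zip(heights, widths))))

# Verbose style with named intermediates.
hw_zipped = zip(heights, widths)
areas = map(to_area, hw_zipped)              # 4, 10, 18
big_areas = [area for area in areas if area > 10]

for area in big_areas:
    print(f"Area {area}")                    # prints "Area 18"
```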

As another point of comparison, as of Python 3.8 you can do this in one list comprehension, without nesting or computing the areas twice, using the walrus operator:

    result = [area for x,y in zip(heights,widths) if (area := to_area(x,y)) > 10]
I don't think that's very easy to read; I'd opt for two list comps like

    areas = [to_area(x,y) for x,y in zip(heights,widths)]
    result = [area for area in areas if area > 10]
But I agree with OP that map+filter is easier to read.
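A quick check that the walrus one-liner and the two-comprehension version agree, assuming `to_area(x, y)` simply multiplies its two arguments (the thread never defines it). The walrus form needs Python 3.8 or later.

```python
heights = [1, 2, 3]
widths = [4, 5, 6]

def to_area(x, y):
    # Hypothetical helper: height and width as two separate arguments.
    return x * y

# One comprehension; the walrus avoids calling to_area twice per pair.
result_walrus = [area for x, y in zip(heights, widths) if (area := to_area(x, y)) > 10]

# Two comprehensions with a named intermediate list.
areas = [to_area(x, y) for x, y in zip(heights, widths)]
result_two_step = [area for area in areas if area > 10]

print(result_walrus, result_two_step)  # [18] [18]
```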
I agree. My main problem is that I don't want intermediate variables floating around, especially something like "areas". If Python scoped variables to blocks, I wouldn't mind.

In Scala:

---

    val widths = Seq(1, 2, 3)
    val heights = Seq(4, 5, 6)

    widths.zip(heights).foreach { case (w, h) => {
      val area = w * h
      if (area > 10) {
        println(s"Area: ${area}")
      }
    }}

    println(area) // error: not found: value area
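For comparison with the Scala block, a sketch of how Python scoping behaves in the same situation, using made-up values wrapped in a hypothetical `scoping_demo` function. A comprehension's own loop variable stays local, but an assignment expression inside a comprehension binds in the enclosing scope (PEP 572), and a plain `for` loop opens no new scope at all.

```python
def scoping_demo():
    heights = [1, 2, 3]
    widths = [4, 5, 6]

    # A comprehension's own loop variable is local to the comprehension.
    squares = [n * n for n in heights]
    n_leaked = "n" in locals()              # False

    # An assignment expression inside a comprehension binds in the
    # enclosing scope, so `area_w` outlives the comprehension.
    big = [area_w for h, w in zip(heights, widths) if (area_w := h * w) > 10]

    # A plain for loop does not open a new scope either.
    for h, w in zip(heights, widths):
        area = h * w

    return squares, big, n_leaked, area_w, area

print(scoping_demo())  # ([1, 4, 9], [18], False, 18, 18)
```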

You can do this without the walrus in a one-liner as well, I believe:

    [area for area in (to_area(x, y) for x, y in zip(h, w)) if area > 10]

Or, more generally, you can take a multi-line statement like yours and replace each named value with its expression. Add some indentation and it's not too bad:

    [area for area in 
     (to_area(x, y) for x, y in zip(h, w))
     if area > 10]

  for x, y in zip(a,b):
      area = to_area(x, y)
      if area > 10:
          print(f"Area {area}")

> in python where you don't often use a namespace to store these intermediary variables

Hm? Most Python code is within a function, in my experience.

You can abstract it out into a function, but I think it's overkill, even if you generalize it to something like print_area_filter(heights, widths, value, cmp) or whatever.

If it's not in a function, your example may create a stray variable called area in the enclosing scope (or may not: if either a or b is empty, the loop body never runs).
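A sketch of the generalized helper mentioned above. Only the name and parameter list `(heights, widths, value, cmp)` come from the comment; the body, the default `cmp=operator.gt`, and the returned list are assumptions. Because `area` is local to the function, nothing leaks into the caller's scope.

```python
import operator

def print_area_filter(heights, widths, value, cmp=operator.gt):
    # Hypothetical body: print every area for which cmp(area, value) holds,
    # and return those areas so the caller can also use them.
    printed = []
    for h, w in zip(heights, widths):
        area = h * w                  # local to this function only
        if cmp(area, value):
            print(f"Area {area}")
            printed.append(area)
    return printed

print_area_filter([1, 2, 3], [4, 5, 6], 10)                # prints "Area 18"
print_area_filter([1, 2, 3], [4, 5, 6], 10, operator.ge)   # prints "Area 10", "Area 18"
```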

I agree, and yes, the line may be a bit excessive. The idea of Arrays is not just to cram a heap of functions into a single line. To me, at least, readability improves even with, e.g., a single map:

  arr.map(func)
vs.

  list(map(func, arr))
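To show what `arr.map(func)` needs under the hood, here is a hypothetical minimal `Chain` class that subclasses `list` and adds fluent methods. It is not funct.Array's implementation or API. Each method builds a new list eagerly, whereas the builtin `map` and `filter` return lazy iterators.

```python
class Chain(list):
    """Hypothetical minimal fluent wrapper around list (not funct.Array)."""

    def map(self, func):
        return Chain(map(func, self))

    def filter(self, pred):
        return Chain(filter(pred, self))

    def zip(self, other):
        return Chain(zip(self, other))

    def forall(self, pred):
        return all(pred(x) for x in self)

arr = Chain([1, 2, 3])
print(arr.map(lambda x: x * 10))           # [10, 20, 30]
print(list(map(lambda x: x * 10, arr)))    # [10, 20, 30], the builtin equivalent

areas = Chain([1, 2, 3]).zip([4, 5, 6]).map(lambda hw: hw[0] * hw[1])
print(areas.filter(lambda a: a > 10))      # [18]
```

Because `Chain` subclasses `list`, its results print like plain lists and compare equal to them.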
> assigning a descriptive identifier to each step

In practice, working with data scientists, these identifiers usually end up as "arr1", "arr2", etc. I'd rather have method chaining; often the intermediates aren't meaningful.

I agree with you in general, people (especially data scientists) are bad at naming things.

It's probably the core skill of good programmers though, so it should be taught more. I don't think anyone sets out to use misleading names, but it's easy for name and code to diverge, and it's crippling to readability.

However, when refactoring or updating such data-science code (or even just trying to understand it), I often need to break apart the long method chains, and that is much, much more annoying than dealing with crummy names.

At least with names I can print the intermediate values, which isn't easy to do in a really long method chain.
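One middle ground for the debugging problem is a small pass-through helper that prints a stage without breaking the pipeline apart. `tap` here is a hypothetical function, not a standard-library one, and it materializes the stage into a list, so it gives up laziness at that point.

```python
def tap(values, label):
    # Hypothetical debugging helper: materialize the stage, print it,
    # and pass the values through unchanged.
    values = list(values)
    print(f"{label}: {values}")
    return values

heights = [1, 2, 3]
widths = [4, 5, 6]

big_areas = list(
    filter(
        lambda area: area > 10,
        tap(map(lambda hw: hw[0] * hw[1], zip(heights, widths)), "areas"),  # prints "areas: [4, 10, 18]"
    )
)
print(big_areas)  # [18]
```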

Code is read more often than it’s written; optimize for reading.
> As tempting as it is to do something super clever and cram a ton of functionality into a small number of lines or characters (it does feel good), it's just better to be a bit more verbose and write simple, obvious code.

I find fluent style often clearer as well as more terse than with superfluous intermediate variables. Verbosity isn't the same thing as clarity.

(But in Python, comprehensions/genexps are often clearer than either.)

Are these really the same?

The idiomatic Python 3 version composes the computation lazily, since map and filter return iterators, which avoids unnecessary memory allocations. Does funct.Array also do this?

- https://docs.python.org/3/library/functions.html#map
- https://docs.python.org/3/library/functions.html#filter
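A small experiment showing the laziness in question: the builtin `map` and `filter` return iterators, so no work happens until a value is requested, and then only as much as is needed. `traced_double` is a made-up helper that records its calls.

```python
calls = []

def traced_double(x):
    calls.append(x)        # record every call so we can see when work happens
    return x * 2

lazy = filter(lambda v: v > 2, map(traced_double, [1, 2, 3, 4]))
print(calls)               # []: nothing has been computed yet

first = next(lazy)
print(first, calls)        # 4 [1, 2]: only the elements needed so far were processed
```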

You can split a.b.c.d across lines and comment each step, which is sometimes a decent middle ground (a\n.b\n.c\n.d). A problem, still, is exceptions and debugging. You get paged and see that something went wrong in an expression that does many different things, and it’s much more frustrating to track down the bug. It makes step debugging trickier too. I’d love better error messages and debugger support for that kind of programming.
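In Python, a method chain split across lines needs enclosing parentheses (or backslashes); inside parentheses, each step can carry its own comment. A sketch using the real `str` methods, with a made-up input string:

```python
raw = "  Hello, Fluent World  "

slug = (
    raw
    .strip()              # drop surrounding whitespace
    .lower()              # normalize case
    .replace(",", "")     # remove the comma
    .replace(" ", "-")    # join words with hyphens
)
print(slug)  # hello-fluent-world
```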
I disagree with this. Splitting this simple pipeline into more variables makes it a lot less readable. Intermediate variables would signal to me that the intermediate results are used elsewhere, which isn't the case here.
This feels like a strawman example. I think a list comprehension gives a much more readable version here, at least.

> all(func3(a) for h,w in zip(a,b) for a in func1(h,w) if func2(a))

Fair enough. Readability is subjective, but I understand the sentiment. Constructing comprehensions out of such long chained expressions can be rather tedious and error-prone, though (as your example shows).
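The quoted comprehension calls `func1(h,w)` with two arguments and then iterates over its return value (`for a in func1(h,w)`), whereas the pipeline passes each zipped pair to `func1` and uses the result directly; it also reuses the name `a` for both the input list and the loop variable. A sketch of a generator-expression version that matches the `map`/`filter` pipeline, with hypothetical `func1`/`func2`/`func3` and made-up data:

```python
def func1(pair):            # hypothetical: sum one zipped pair
    return pair[0] + pair[1]

def func2(n):               # hypothetical: keep even values
    return n % 2 == 0

def func3(n):               # hypothetical: test applied to every kept value
    return n < 10

a = [1, 2, 3]
b = [3, 3, 9]

pipeline = all(map(func3, filter(func2, map(func1, zip(a, b)))))

# Generator-expression equivalent: compute func1 once per pair,
# keep results that pass func2, and feed func3 of each to all().
genexp = all(func3(r) for r in (func1(pair) for pair in zip(a, b)) if func2(r))

print(pipeline, genexp)     # False False (sums 4, 5, 12; 12 fails func3)
```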