by skitter 1060 days ago
I'm annoyed at the reason that any/all have to be on this list. If they (and map, filter, …) were methods, you could just write `foo.` and your IDE could show you what methods are available. Postfix would make things easier to read too:

    bar.baz()\
       .filter(some_filter)\
       .map(some_op)\
       .min()\
       .foo()
Data/control flows from top to bottom. One operation per line. But with freestanding functions:

    min(map(some_op, filter(some_filter, bar.baz()))).foo()
To follow the flow of data/control, you start in the middle, go right, then skip left to filter, read rightwards to see which filter, skip left to map, read rightwards to see what map, go left to min, then skip all the way to the right. Just splitting it into multiple lines doesn't help, you need to introduce intermediate variables (and make sure they don't clobber any existing ones) and repeat yourself whether they clarify things or not. The same issue exists for list/dict/set comprehensions.
17 comments

Here you go:

    class WrappedList:
        _fns = [map, filter, min, max, all, any, len, list]

        def __init__(self, it):
            self.it = it

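        def __getattr__(self, name):
            for fn in self._fns:
                if name == fn.__name__:
                    def m(*args, **kwargs):
                        # The wrapped iterable is always passed as the last positional argument.
                        result = fn(*args, self.it, **kwargs)
                        if hasattr(result, '__iter__'):
                            return self.__class__(result)
                        else:
                            return result
                    return m
            # Unknown names should fail like any missing attribute instead of returning None.
            raise AttributeError(name)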

        def unwrap(self):
            return self.it

This allows you to do stuff like

    WrappedList([1, 2, 3, 4]).filter(lambda x: x % 2 == 0).map(lambda x: x * 3).list().unwrap() # [6, 12]
    WrappedList([1, 2, 3, 4]).map(lambda x: x >= 5).any() # False
Deciding whether or not this is something you should do, rather than just something you can do, is left as an exercise for the reader.
Python debugging is most sane when the code just tries to keep it simple. After thousands of pdb sessions I can say most people should not be allowed to do this kind of thing in real code!
With a small reminder that the pythonic way to do filter and map is even more readable - but it's limited in scope:

   [x * 3 for x in [1, 2, 3, 4] if x % 2 == 0]

with that said, I still love the general concept of chaining, and I use that style a lot where it is already convenient and popular - in pandas code.
And herein we see a weakness of Python: there is no way to get rid of the lambda lambda lambda without actually naming things using def. Even though we are defining a pipeline of steps, we still have to put up with syntactic clutter. Compare with threading/pipeline in other languages.
You could improve the speed a little bit (maybe) by doing something like:

    self._fns = {fn.__name__: fn for fn in [...]}
This may not be faster since the list is so short, but worth checking into
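A minimal sketch of what that might look like, reusing the function list from the class above. The dict is built once at class level, and raising AttributeError for unknown names is an assumption on my part rather than something the original class does:

```python
class WrappedList:
    # Same functions as in the original list, keyed by name for direct lookup.
    _fns = {fn.__name__: fn for fn in [map, filter, min, max, all, any, len, list]}

    def __init__(self, it):
        self.it = it

    def __getattr__(self, name):
        try:
            fn = self._fns[name]
        except KeyError:
            # Assumption: unknown names should fail loudly rather than return None.
            raise AttributeError(name) from None

        def m(*args, **kwargs):
            result = fn(*args, self.it, **kwargs)
            # Re-wrap anything iterable so the chain can continue.
            if hasattr(result, "__iter__"):
                return self.__class__(result)
            return result

        return m

    def unwrap(self):
        return self.it


evens_tripled = (
    WrappedList([1, 2, 3, 4])
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * 3)
    .list()
    .unwrap()
)
print(evens_tripled)  # [6, 12]
```

Whether the lookup is measurably faster with only eight entries would still need timing.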
This reads so much better
If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.
One puzzling thing is that it uses backslash continuation in its examples. The most favoured style, IMO, is to use ()'s for line continuation; maybe the author just doesn't know about those.
Both your suggestions are awful.

The reason they are bad is that intermediate results are never named (and thus are never explained). In simple situations, it's possible to infer from the context what the author's intention was, but in more complicated cases, if you want to understand someone's code, especially if it's written in the way you did, you'd have to "disassemble" it into simpler operations, name the variables (after investigating or guessing the purpose of each operation) and then try to come up with the full picture of what's going on.

Also, as a style suggestion: avoid using backslashes. In your situation, you could just put the dots at the end of each line inside a parenthesized expression, and that would be enough to not need the backslashes. Backslashes add noise to your code, i.e. characters that add no meaning, just a sort of "scaffolding" to hold your code together.

In my own python toolbox (specifically for the list-of-dictionaries use-case) I inject .log(). calls into the pipeline as needed to show what the actual intermediate values would be.

Naming intermediates is fine (and encouraged) if there are actually meaningful names to be given. But sometimes the expression itself is the shortest meaningful name for the expression.

Re backslashes, you can also just wrap the expression in parentheses.
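For reference, a small sketch of the parenthesized form, using plain str methods as a stand-in for the chain (the input string is made up). Inside parentheses Python joins the lines implicitly, so no backslashes are needed, and the dots can lead each line:

```python
raw = "  Hello, World  "

# Wrapping the chain in parentheses lets each call sit on its own line
# without any backslash continuations.
cleaned = (
    raw
    .strip()
    .lower()
    .replace(",", "")
    .split()
)
print(cleaned)  # ['hello', 'world']
```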

A perhaps more appropriate name for your 'log' would be 'peek'. Log reminds of logging, which usually does not return a value, but writes to stdout or a file or similar.
but that's exactly what log does, it is a pipeline filter that logs the content to standard output but otherwise it is passthrough.
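A rough sketch of such a pass-through step, written as a free-standing generator since the toolbox itself isn't shown here. The function name `log`, its `label` parameter and the sample records are all hypothetical:

```python
def log(items, label="log"):
    """Yield every item unchanged, printing each one as it passes through."""
    for item in items:
        print(f"{label}: {item!r}")
        yield item


records = [
    {"name": "a", "size": 3},
    {"name": "b", "size": 10},
    {"name": "c", "size": 7},
]

# Drop the log() call in between two steps to see the intermediate values.
big = (r for r in records if r["size"] > 5)
names = [r["name"] for r in log(big, label="after filter")]
print(names)  # ['b', 'c']
```

Because it yields what it receives, it can be removed again without changing the result of the pipeline.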
You wouldn’t really write it as you have in the second example though. The Pythonic way of writing something like this is to use list comprehensions or generator expressions, for example:

    min(some_op(item) for item in bar.baz() if some_filter(item)).foo()
Or decomposed a little for clarity:

    processed_items = (some_op(item) for item in bar.baz() if some_filter(item))
    min(processed_items).foo()
This is pretty readable – a natural language description of the first line is “do some_op for each item in bar.baz that matches some_filter”, which corresponds 1:1 with the code.
This feels somehow even worse. I don’t think it’s more readable than Op’s example, but now it’s also weird to write.

The chaining example in OP’s first example is way better

both examples seem pretty contrived and I think this comes down to just what language you are used to. Their other code seems very JS-y.

I work in both python & js. The python reads like natural language:

    Processed items is a set of some transformation of each item in bar.baz() where something is true for that item.
    then Foo the smallest in that list.
It reads like English.

JS-y stuff doesn't read like natural language, but I do think it's more concise and fits the IDE function discovery workflow better.

Both models can be made into horrid messes or elegant solutions. Both are highly readable.

Now I like the python one because I find it natural to attach contextual "whys" or "because" comments to them.

    Processed items is a set of some transformation of each item in bar.baz() where something is true for that item.
    then Foo the smallest item.
    # because foo is a slow function and we don't want to foo every bar and baz
And if you have nested list comprehensions this totally stops making sense somehow.
any, all, map, filter, min, max, for loops, zip, list, tuple, reduce, list comprehensions, cycle, repeat, islice, and so on in python work on iterables, and iterable is a protocol, not a class. it would certainly be interesting to program in a language where conforming to a protocol (perhaps one that nobody had thought up yet when you wrote your class) would give your class new methods, or where all iterables had to derive from a common base class, but it would be a very different language from python
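To make the protocol point concrete, here is a small sketch: a class that knows nothing about min, map, any or zip and only defines `__iter__`, yet works with all of them. The `Countdown` class is made up for illustration:

```python
class Countdown:
    """Iterable by protocol only: no special base class, just __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(4)
smallest = min(c)                         # 1
doubled = list(map(lambda x: x * 2, c))   # [8, 6, 4, 2]
has_even = any(x % 2 == 0 for x in c)     # True
paired = list(zip(c, "abcd"))             # [(4, 'a'), (3, 'b'), (2, 'c'), (1, 'd')]
```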

incidentally in your example, though data does flow from top to bottom, control does not, assuming the filter and map methods are lazy as they are in python; it ping-pongs back and forth up and down the sequence in a somewhat irregular manner, sometimes reaching as far as .min() before going back up, and other times turning around at .filter(...)
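The ping-pong can be made visible with the built-in lazy map and filter by recording each call. This is only a sketch; the `some_filter` and `some_op` bodies and the input list are placeholders:

```python
trace = []


def some_filter(x):
    trace.append(f"filter({x})")
    return x % 2 == 0


def some_op(x):
    trace.append(f"map({x})")
    return x * 3


# filter and map are lazy: nothing runs until min() starts pulling items.
pipeline = map(some_op, filter(some_filter, [1, 2, 3, 4]))
calls_before_min = list(trace)  # still empty at this point

result = min(pipeline)
print(result)  # 6
print(trace)
# ['filter(1)', 'filter(2)', 'map(2)', 'filter(3)', 'filter(4)', 'map(4)']
```

Items 1 and 3 turn around at the filter, while 2 and 4 travel all the way down to min before control goes back up for the next item.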

i wonder if you could implement the ide functionality you want with a 'wrap' menu of popular functions that are applicable to the thing to the left of your cursor, so when you had

    filter(some_filter, bar.baz())|
(with | representing your cursor) you could select `map` or `min` or whatever from the wrap dropdown and get

    min(filter(some_filter, bar.baz()))|
for any given cursor position in python there are potentially multiple expressions ending there, in cases like

    "y: %s" % y|
but maybe that's not such a hard problem to solve
> it would certainly be interesting to program in a language ... where all iterables had to derive from a common base class, but it would be a very different language from python

You mean Ruby? :P

(All Ruby iterables mix in Enumerable, which is baaaaaaaasically inheritance.)

Or Rust! Everything that implements the Iterator trait gets access to all of Iterator’s goodies, like map, filter, reduce, etc. Implementing Iterator just requires declaring the Item type and adding a single next(&mut self) -> Option<Self::Item> method on your type.

Lifetimes and async are a massive pain in rust. But the trait system is a work of art.

I like Rust's struct + traits approach, because they avoid inheritance and encourage composition. I am sure people have built bad workarounds though to do inheritance anyway.
ruby is closer to what i meant because you can't add methods to rust's iterator, can you? but people add stuff to enumerable all the time
You can!

    trait MyIterHelpers: Iterator {
        fn dance(&self) {
            println!("wheee");
        }
    }
    
    // And tell rust that all Iterators are also MyIterHelpers.
    impl<I: Iterator> MyIterHelpers for I {}
The one caveat is that using it in a different context will need a use crate::MyIterHelpers; line, so the namespace isn't polluted.
neat, i didn't know that was possible
Or its inspiration, Smalltalk.
> i wonder if you could implement the ide functionality you want with a 'wrap' menu of popular functions that are applicable to the thing to the left of your cursor

This is already implemented in IntelliJ for Java - they call it "Postfix Completion". For example you can type ".cast" after an expression to wrap what's before the cursor in a cast expression, so type "a + b.cast", then pick cast to "float", and pick how large a preceding expression you want to cast, and you can end up with "(float)(a + b)" and go from there. They have postfix completion that can extract expressions into variables, create if-statements and switch-statements from expressions, and so many more things that I wish I had when doing non-trivial Python coding in my IDE of choice (which is not by Jetbrains)...

> it would certainly be interesting to program in a language where conforming to a protocol (perhaps one that nobody had thought up yet when you wrote your class)

Not automatic, but you could use a decorator + the protocol as type annotation, I think
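Something along these lines, perhaps. This sketch uses a class decorator plus a check against `collections.abc.Iterable` instead of a type annotation, and every name in it (`chainable`, `Chain`, `Inventory`) is invented for the example:

```python
from collections.abc import Iterable


class Chain:
    """Minimal wrapper so that lazy results can keep chaining."""

    def __init__(self, it):
        self._it = it

    def __iter__(self):
        return iter(self._it)


def chainable(cls):
    """Hypothetical class decorator: attach map/filter/min to an iterable class."""
    # collections.abc.Iterable recognises any class that defines __iter__.
    if not issubclass(cls, Iterable):
        raise TypeError(f"{cls.__name__} is not iterable")

    def map_(self, fn):
        return Chain(map(fn, self))

    def filter_(self, pred):
        return Chain(filter(pred, self))

    def min_(self):
        return min(self)

    cls.map, cls.filter, cls.min = map_, filter_, min_
    return cls


chainable(Chain)  # the wrapper itself gets the same methods


@chainable
class Inventory:
    def __init__(self, *prices):
        self.prices = prices

    def __iter__(self):
        return iter(self.prices)


cheapest_even = (
    Inventory(7, 4, 10, 3)
    .filter(lambda p: p % 2 == 0)
    .map(lambda p: p * 3)
    .min()
)
print(cheapest_even)  # 12
```

It still has to be applied class by class, so it is opt-in rather than something every iterable gets for free.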

Inspired by others here, I tried hacking something together myself

  from functools import partial
  
  class Pipeable:
      def __init__(self, fn):
          self.fn = fn
  
      def __ror__(self, lhs):
          return self.fn(lhs)
  
  def pipeable(fn):
      return lambda *args: Pipeable(partial(fn, *args))
  
  filter = pipeable(filter)
  map = pipeable(map)
  list = pipeable(list)
  sum = pipeable(sum)
  min = pipeable(min)
  max = pipeable(max)
  any = pipeable(any)
  
  # Usage:
  
  range(1, 100) | filter(lambda x: x < 50) | max()
  # 49
  
  [1, 2, 3, 4] | filter(lambda x: x % 2 == 0) | map(lambda x: x * 3) | list()
  # [6, 12]
  
  [1, 2, 3, 4] | map(lambda x: x >= 5) | any()
  # False
In my mind this is a holdover from when Python was much more procedural/C-like and as a Python developer it's one of my pet peeves. (I can't count how many times I've started writing the name of a list, had to backtrack to stick a `len` in front, and then tap tap tap arrow keys to get back to the front.)

I suppose we really ought to blame Euler for introducing the f(x) notation 300 years ago... Very practical when the function is the entity you want to focus on, often less useful in (procedural) programming, where we typically start with the data and think in terms of a series of steps.

Some languages like D and Nim have "UFCS", uniform function call syntax, where all functions can be called as methods on any variable. Basically, it decouples the implicit association between method dispatch and namespacing/scoping semantics. Rust also has something they call UFCS, but it only goes one way (you can desugar methods as normal functions, but you can't ... resugar? arbitrary functions as methods). Python couldn't implement this without breaking a lot of stuff due to its semantics, but it is definitely a feature I'd like to see more of.

> In my mind this is a holdover from when Python was much more procedural/C-like

That never existed. Or if it did, it was long before any trace exists, and the trace goes quite a way back: the first commit in which I can find the len() builtin (https://github.com/python/cpython/commit/c636014c430620325f8...) also has calls to file.read and list.append, and the first python-level methods are created just a few commits later (https://github.com/python/cpython/commit/336f2816cd3599b0347...). Though there may be missing commits, this is 30 commits in, back when Python was an internal CWI thing (although nearly a year in, according to the official timelines of the early days).

This was years before magic methods were even added (https://github.com/python/cpython/commit/04691fc1c1bb737c0db...).

So no, I don't think it's a "holdover from" anything. Rather seems like it's GvR's sensibilities.

Thanks for the thorough correction. I think I was making that assumption due to the semantics of the language, which suggests classes and methods being somewhat "bolted onto" a dict-based core. Unfortunately for me, it makes me all the more dissatisfied with the choice.
Thanks. I may have already read that post (or I just correctly backtracked the reasoning), as I was pretty much convinced namespacing conflict (the second bit of rationale) was a factor for the dunder-ing of methods, but I had no source so ultimately decided not to put it in.
> and then tap tap tap arrow keys to get back to the front.)

Learn a better editor, and this will stop being a problem.

Or just use any text editor ever and use Ctrl+arrow to jump word-wise. The most common efficiency issue in editing is editor literacy, not editor featureset.
Good programming editors are designed with the idea that as you master the program, you become more precise in telling it what to do. When editing programs, the author usually applies several navigational schemes to interpret the text of the program: by structure, by syntactical elements, by geography of the screen.

To expand on this: examples of navigating by structure include moving by token / expression / definition. Examples of moving by syntax would be the search or "jedi" navigation (i.e. navigation where you enter a special mode requiring you to type characters that iteratively refine your search results). Finally, simply moving up / down / left / right by a certain number of characters is the "screen geography" way.

There's no way to tell which method is better, because they apply better in different situations, however the "screen geography" method usually ends up being the worst, because it's the most labor-intensive and requires the author to dedicate a lot of attention to achieve precision (i.e. moving exactly N spaces to the left and then exactly M spaces down is very easy to get wrong, and with larger N and M it becomes really tedious).

Navigation by word is only slightly better than navigation by character, and often falls into the "screen geography" kind of navigation. It's easy to learn, it's quite universal and doesn't require understanding of the structure of the program or mastering better techniques (eg. "jedi jump"). That's not to say that it should be excluded from the arsenal -- quite the opposite, but a master programmer (in the sense of someone who writes programs masterfully) would be the one who's less reliant on this kind of navigation.

> If your pavement has potholes, just learn to jump over them.
No. That's a wrong analogy. There's no way around having to navigate the text of the program back and forth, by character, by word, by statement, by definition and so on. This is bread and butter of people who write code.

If you complain about doing this, this is because you don't know how to perform the basic functions necessary to write code. Heuristically, this is because you are either using a bad editor or didn't learn how to use a decent one.

I.e. your complaint is more comparable to Amazon reviews coming from people who don't know how to use the product and then write something asinine, like that one about a loo brush that feels too rough when used in the capacity of toilet paper (though I believe that one was actually a joke inspired by similarly stupid but less funny reviews).

How I eventually resolve this kind of problem.

    minimum = float("inf")  # starting value; any real result will be smaller
    for b in bar.baz():
        if not some_filter(b):
            continue
        b = some_op(b)
        minimum = min(minimum, b)
    foo(minimum)
Yes, plain old procedural Python. Data flows from top to bottom. It allows `print` debugging, which is very useful when some_filter or some_op is broken.
With python I'd decompose that one-liner into several variables for readability. That probably ends up using more memory than it would otherwise but I generally don't work on systems where that matters much.

Scala was really nice for this syntax when I used it for Spark.

Map and filter don't actually consume anything until they're used later, they produce iterables. So if you pulled them into their own lines they wouldn't consume (much) extra memory. Taking the original:

  min(map(some_op, filter(some_filter, bar.baz()))).foo()
An alternative is also to use a generator expression that's identical to the inner part (in effect):

  min(some_op(item) for item in bar.baz() if some_filter(item)).foo()
Which could still be pulled out to a pair of lines for clarity:

  items = (some_op(item) for item in bar.baz() if some_filter(item)) # or some better name given a context
  min(items).foo()
Makes me wish python had a pipe operator like Julia's |> and R's %>%
There is a niche use-case for the reverse order `(foo min map filter baz bar)`, which is, solving typed holes (you could refine the hole as like `_.foo()` although that wouldn't be interoperable with things like next token prediction).

But that's more of a math thing than an everyday coding thing, where dot chaining usually reads nicer.

Mixed is definitely the worst, like you said.

Would that really work? You can chain those functions because they return the same type. For example, filtering a list returns a subset of the list.

Any/all return a Boolean, so the chain would stop there.

I also personally think

    any(x % 5 for x in range(y))
Is more clear than

    range(y).any(lambda x: x % 5)
Your point about ordering and readability really rang true for me. My way around this in Python is to separate the map and the reduce: do the map in one part with a list comprehension and the reduce in a second part on a new line.

I’ll wrap the whole thing in a named function as a way of describing what I’m doing and make it a closure if it’s used only once:

  def f(bar):
    def smallest_baz():
      bazs = (
        some_op(b)
        for b in bar.baz()
        if some_filter(b)
      )
      return min(bazs)

    return smallest_baz().foo()
it's interesting I completely agree with you and it's a big reason I find Python irritating to write (compared to Groovy, Kotlin, Ruby, etc). However there do seem to be a lot of people that dislike this method chaining style and will assert that functional style is better in every way. But I just can't fundamentally agree that writing these as functions is as readable.

Even if you go far out of your way to format it similarly, it still forces you to do a lot of mental work to find the innermost starting point and then deduce the sequence of operations backwards, eg:

   foo(
       min(
           map(lambda x: ...,
               filter(lambda y: ...,
                      baz(bar)
               )
           )
       )
   )
(and of course, the python linters are typically configured to hate this so you can't realistically write it this way even if you want to)
Or, you know, you could write a good ol' for loop and use multiple statements, instead of having a gigantic expression
for my smooth brain `map(this, to_that)` makes better sense than `to_this.map(that)`

same with give me `min(of_this)` instead of `of_this_want.min()`

Agree to a big extent. Rust has lots of methods, because their traits work best or most habitually with methods. So I see a comparison of Rust x.min(y) vs Python min(x, y).

The Rust x.min(y) to me is so asymmetric. min(x, y) conveys the symmetry of the operation much better, x and y are both just elements. (And the latter is how it can be used in Python. In Rust, you can call Ord::min(x, y) to get the symmetry back, but it is less favoured right now for some reason.)

This is the same mistake that golang made.
those \ are super ugly though