Hacker News
by tommikaikkonen 3445 days ago
While it's far from a functional language, I find that it's possible to build functional style abstractions without too much headache. The iterator protocol is so well supported by the language and the standard libraries that using map, filter, itertools, generator expressions & comprehensions gets you a long way. You get the rest of the way by writing any specialized combinators you need and then imperative code to tie things together. You can build up a lazy pipeline of data transformations that works pretty well.
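A minimal sketch of the kind of lazy pipeline described above, using only the standard library. The names `squares`, `evens`, and `pipeline` are made up for illustration; nothing is computed until `list()` pulls values through at the end.

```python
from itertools import count, islice, takewhile


def squares(numbers):
    # A small specialized combinator: lazily yields the square of each input.
    for n in numbers:
        yield n * n


# count(1) is infinite; filter() keeps it lazy, so this line does no work yet.
evens = filter(lambda n: n % 2 == 0, count(1))

# Chain the stages; still nothing has been computed.
pipeline = takewhile(lambda n: n < 200, squares(evens))

# The imperative "tie things together" step: pull five values through.
first_five = list(islice(pipeline, 5))
print(first_five)
```

Because every stage is an iterator, the infinite `count(1)` source is harmless: only as many items as the final consumer asks for are ever produced.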

There are some gotchas where you need to explicitly copy (itertools.tee) iterators when you have multiple consumers, and be careful with mutable values, but it's manageable.
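To illustrate the multiple-consumer gotcha, here is a small sketch with made-up data (`readings` is a hypothetical name). Note that `tee` takes over the original iterator, so the original should not be touched afterwards, and that `tee` does not copy the items themselves, so mutable values are still shared between the branches.

```python
from itertools import tee

readings = (x for x in [3, 1, 4, 1, 5])

# Split the single-pass generator into two independent iterators.
for_total, for_peak = tee(readings, 2)

total = sum(for_total)  # consumes the first branch
peak = max(for_peak)    # the second branch replays the buffered items

# The original generator was drained by tee on behalf of both branches.
leftover = list(readings)

print(total, peak, leftover)
```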

It would be nice to use curried functions as functools.partial gets verbose, but at that point you're far from Pythonic code.
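For comparison, a rough sketch of both styles. `scale` and `curry2` are hypothetical helpers written for this example, not anything from the standard library; only `functools.partial` is.

```python
from functools import partial


def scale(factor, value):
    return factor * value


# The standard-library way: explicit, but wordy when repeated a lot.
double = partial(scale, 2)


def curry2(fn):
    # Hypothetical helper: turns a two-argument function into a curried one.
    return lambda a: lambda b: fn(a, b)


# Curried style: supplying the first argument returns a new function.
triple = curry2(scale)(3)

doubled = list(map(double, [1, 2, 3]))
tripled = list(map(triple, [1, 2, 3]))
print(doubled, tripled)
```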

2 comments

The state of a generator is the problem that trips me up the most when I try to do FP-style Python. In general a function shouldn't care whether its input was a list, set or generator. But while iterating over it (or yielding from it), we always mutate some internal state of the iterator.

One solution would be, I think, to always pass tee-d copies of iterators and never(!) any iterator as it is (because you don't know whether next() has side effects or not). The other solution would be to not ever do anything lazy. One could also always consume all iterators fully. That way any generator re-use would be apparent immediately.

A function that moves a generator forward by just one or two items (something with the semantics of findFirst, for example) can subtly introduce bugs.
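A small sketch of how that can bite. `find_first` here is a hypothetical helper written for this example. Called on a list it leaves the input untouched; called on a generator it silently eats everything up to and including the match.

```python
def find_first(predicate, iterable):
    # Hypothetical helper: return the first matching item, or None.
    return next((x for x in iterable if predicate(x)), None)


numbers_list = [1, 2, 3, 4, 5]
numbers_gen = (n for n in [1, 2, 3, 4, 5])

hit_from_list = find_first(lambda n: n > 2, numbers_list)
hit_from_gen = find_first(lambda n: n > 2, numbers_gen)

# Same answer from both calls, but the inputs are now in different states.
rest_of_list = list(numbers_list)  # a list can be iterated again from the start
rest_of_gen = list(numbers_gen)    # the generator resumes after the match

print(hit_from_list, hit_from_gen)
print(rest_of_list, rest_of_gen)
```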

Personally, I run by a rule that I must consume a generator on the same line it is created, otherwise I use something that isn't lazy.

If you get some huge collection to scan, or want to sync your code by sharing an iterator's internal state, you may want to break that rule. But those cases are just too troublesome for my taste.

Make your functions consumers. Don't pass tee'd iterators in, but tee after receiving if necessary.
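One way to read that advice, as a sketch: the function accepts any iterable and does its own tee internally, so callers never have to think about it. `deltas` is a hypothetical example function; the two branches advance in lockstep, so tee only ever buffers a single item.

```python
from itertools import tee


def deltas(values):
    # Hypothetical consumer: differences between consecutive items.
    # tee happens here, after receiving, not at the call site.
    current, following = tee(values)
    next(following, None)  # shift the second branch ahead by one item
    return [b - a for a, b in zip(current, following)]


from_list = deltas([1, 4, 9, 16])
from_gen = deltas(n * n for n in range(1, 5))
print(from_list, from_gen)
```

The caller can hand in a list or a generator and gets the same result either way; the only contract is that a passed-in iterator is consumed.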

I don't see how find_first would cause any more problems than filter or sum.

That is a very bad idea in general and will lead to enormous memory leaks when large tee'd iterators get out of sync with one another.

See here:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

https://docs.python.org/2/library/itertools.html#itertools.t...
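A sketch of the alternative the docs suggest for the case where one consumer finishes before the next one starts. With tee the whole stream would end up in its internal buffer anyway, so materializing it once with `list()` is simpler. The data and names are made up for illustration.

```python
data = (n * 10 for n in range(5))

# One pass over the generator, stored explicitly instead of inside tee.
snapshot = list(data)

# Any number of consumers can now run one after another.
total = sum(snapshot)
largest = max(snapshot)
print(total, largest)
```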

Eh? I discouraged tee'ing iterators, or at least I thought I did.
Are there any public code bases out there that use this? Would be cool to see what sort of pipelines like this people are building and how they get used.