by maweki 3445 days ago
Generator state is the problem that trips me up the most when I try to do FP-Python. In general, a function shouldn't care whether its input is a list, a set, or a generator. But while iterating over it (or yielding from it), we always mutate the internal state of the iterator.
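A minimal sketch of the kind of surprise meant here. The function name total_and_max is made up for illustration; it makes two passes over its input, which works for a list but not for a generator:

```python
def total_and_max(xs):
    # Hypothetical helper: two passes over the same input.
    # sum() exhausts a generator, so max() then sees nothing.
    return sum(xs), max(xs, default=None)

as_list = total_and_max([1, 2, 3])              # (6, 3)
as_gen = total_and_max(x for x in [1, 2, 3])    # (6, None)
```

No exception is raised in the generator case, which is what makes this easy to miss.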

One solution would be, I think, to always pass tee-d copies of iterators and never(!) an iterator as it is (because you don't know whether next() has side effects or not). Another would be to never do anything lazy. One could also always consume every iterator fully, so that any generator re-use becomes apparent immediately.
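A rough sketch of the first option, using itertools.tee at the call site. first_two is a hypothetical callee that advances whatever it is given; the caller hands it one tee-d copy and keeps the other:

```python
from itertools import tee

def first_two(it):
    # Hypothetical callee: advances the iterator it receives.
    return [next(it), next(it)]

source = iter(range(5))
# After tee(), the original iterator must not be used directly any more,
# so rebind the name to one of the copies.
for_callee, source = tee(source)

head = first_two(for_callee)   # [0, 1]
rest = list(source)            # [0, 1, 2, 3, 4], unaffected by the callee
```

The cost is that tee has to buffer every item one copy has seen and the other has not.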

A function that moves a generator forward by just one or two items (something with the semantics of findFirst, for example) can subtly introduce bugs.
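To illustrate, here is a sketch with a hypothetical find_first (not a standard library function). It stops at the first match, so an iterator passed in is left partway through:

```python
def find_first(pred, it):
    # Hypothetical helper: returns the first item satisfying pred, else None.
    # It stops pulling from `it` as soon as a match is found.
    return next((x for x in it if pred(x)), None)

nums = iter([1, 3, 4, 5, 6])
first_even = find_first(lambda n: n % 2 == 0, nums)   # 4
remaining = list(nums)                                # [5, 6]; 1, 3 and 4 are gone
```

Had nums been a plain list, a later pass over it would still see all five items.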

2 comments

Personally, I go by a rule that I must consume a generator on the same line it is created; otherwise I use something that isn't lazy.

If you have some huge collection to scan, or want to synchronize parts of your code by sharing an iterator's internal state, you may want to break that rule. But such cases are just too troublesome for my taste.

Make your functions consumers. Don't pass tee'd iterators in, but tee after receiving if necessary.
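A small sketch of what "tee after receiving" could look like. pairwise_diffs is an illustrative name; the function tees its own input internally, so callers can pass a list or a generator without preparing anything:

```python
from itertools import tee

def pairwise_diffs(xs):
    # Illustrative consumer: tee the input here rather than asking the caller to.
    a, b = tee(xs)
    next(b, None)                       # shift the second copy ahead by one item
    return [y - x for x, y in zip(a, b)]

from_list = pairwise_diffs([1, 4, 9])                     # [3, 5]
from_gen = pairwise_diffs(n * n for n in range(1, 4))     # [3, 5]
```

Because the two copies advance in lockstep, tee only ever buffers about one item here.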

I don't see how find_first would cause any more problems than filter or sum.

That is a very bad idea in general and can lead to enormous memory use when large tee'd iterators get out of sync with one another.

See here:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

https://docs.python.org/2/library/itertools.html#itertools.t...
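A sketch of the case the docs describe, with made-up variable names. One tee'd copy is run to the end before the other starts, so tee ends up holding every item anyway and a plain list does the same job more simply:

```python
from itertools import tee

data = (n * n for n in range(1000))
a, b = tee(data)
total = sum(a)       # a runs to the end; tee must buffer all 1000 items for b
largest = max(b)     # b only starts now, reading from that buffer

# Same result with an explicit list, as the documentation suggests.
items = [n * n for n in range(1000)]
total_from_list, largest_from_list = sum(items), max(items)
```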

Eh? I discouraged tee'ing iterators, or at least I thought I did.