Hacker News new | ask | show | jobs
by nerdponx 1991 days ago
I am with you on this. Personally, I would rather continue using Toolz (https://github.com/pytoolz/toolz), and contribute additional helper/utility methods to that library.

The whole point of some things being functions versus methods is that they are generic rather than specialized. The generic iterator protocol is probably the best feature about the Python language, and it's both a damn shame and bad design to not use it.

If you really wanted to make an improvement over built in lists, the thing to do would be to implement some kind of fully lazy "query planning" engine, like what Apache Spark has. Every method call registers a new method to be applied with the query planner, but does not execute it. Execution only occurs when you explicitly request it. That way you can effectively compile in efficient but readable code that takes multiple passes over the data into efficient operations internally that only make one pass, or at least fewer passes. This also naturally lends itself to parallelization/concurrency.

1 comments

Dask does the lazy evaluation and query planning thing on numpy arrays and pandas dataframes, and can execute in parallel. It mimics most of their native interfaces which makes it a pretty easy drop-in.

https://docs.dask.org/en/latest/