Hacker News
by Immortal333 1991 days ago
Chaining has its own benefits, but I don't think this fits the definition of "Pythonic". Then again, "Pythonic" is highly debatable. You can always break a big chain of operations into smaller chains, using well-named variables in between.

Many operations on lists in Python, such as filter and groupby, are implemented as iterators. Looking at your code, it looks like you're not doing lazy computation (correct me if I'm wrong). This could have a huge performance impact, depending on how the list is used.
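
To make the lazy-versus-eager point concrete, here is a minimal sketch using only the standard library. The `noisy_is_even` helper and the sample data are made up for illustration; nothing here is taken from the package under discussion.

```python
from itertools import groupby, islice

calls = []  # records every value the predicate is asked about


def noisy_is_even(n):
    """Illustrative predicate that logs each call."""
    calls.append(n)
    return n % 2 == 0


data = range(1_000_000)

# Lazy: filter() returns an iterator, so no work happens yet.
evens = filter(noisy_is_even, data)
calls_before = len(calls)

# Pulling three results only examines as many inputs as needed (0 through 4).
first_three = list(islice(evens, 3))
calls_after = len(calls)

# groupby() is lazy as well; it groups consecutive items that share a key.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = {key: list(group) for key, group in groupby(words, key=lambda w: w[0])}
```

An eager version would run the predicate over all one million inputs before returning anything, which is the cost being pointed at here.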

3 comments

I understand the unpythonic nature of Arrays may startle some hardcore Pythonistas, but the ability to chain functions was one of the main reasons I wrote the package, as I find nested function calls ugly and sometimes rather hard to decipher.

Regarding performance, Arrays aren't meant to be super high-performing but rather a simple way to manipulate sequences. For the best performance you should go with plain Python, toolz, or similar.
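
The nested-versus-chained contrast above can be sketched roughly as follows. The `Chain` class is a hypothetical stand-in written for this example, not the package's actual API, and it is deliberately eager and simple.

```python
from functools import reduce


class Chain:
    """Hypothetical eager wrapper around a list (illustration only)."""

    def __init__(self, items):
        self.items = list(items)

    def filter(self, pred):
        return Chain(x for x in self.items if pred(x))

    def map(self, fn):
        return Chain(fn(x) for x in self.items)

    def reduce(self, fn, initial):
        return reduce(fn, self.items, initial)


numbers = [1, 2, 3, 4, 5, 6]

# Nested calls read inside-out: filter first, then map, then reduce.
nested = reduce(
    lambda a, b: a + b,
    map(lambda x: x * x, filter(lambda x: x % 2 == 0, numbers)),
    0,
)

# The chained form reads top to bottom, in execution order.
chained = (
    Chain(numbers)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .reduce(lambda a, b: a + b, 0)
)
```

Both forms compute the sum of the squares of the even numbers; the difference is only in reading order.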

I am with you on this. Personally, I would rather continue using Toolz (https://github.com/pytoolz/toolz), and contribute additional helper/utility methods to that library.

The whole point of some things being functions versus methods is that they are generic rather than specialized. The generic iterator protocol is probably the best feature of the Python language, and it's both a damn shame and bad design not to use it.
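
As a small illustration of that genericity, one plain function can serve any object that implements the iterator protocol. The names `running_total` and `Countdown` are invented for this sketch.

```python
def running_total(iterable):
    """Yield cumulative sums of any iterable of numbers."""
    total = 0
    for value in iterable:
        total += value
        yield total


class Countdown:
    """Illustrative custom container; it only needs to define __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


# The same function accepts a list, a generator expression, and a custom class.
from_list = list(running_total([1, 2, 3]))
from_generator = list(running_total(x * 2 for x in range(3)))
from_custom = list(running_total(Countdown(3)))
```

A method on a specific list subclass would only be available on that subclass, whereas the function needs no knowledge of the container type.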

If you really wanted to make an improvement over built-in lists, the thing to do would be to implement some kind of fully lazy "query planning" engine, like what Apache Spark has. Every method call registers a new operation with the query planner, but does not execute it. Execution only occurs when you explicitly request it. That way you can effectively compile inefficient but readable code that takes multiple passes over the data into efficient internal operations that make only one pass, or at least fewer passes. This also naturally lends itself to parallelization/concurrency.
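
A toy version of the "register now, execute later" idea might look like the sketch below. `LazyQuery` and its methods are hypothetical names; it fuses the recorded steps into a single pass but does no reordering, optimization, or parallel execution, so it is far from what Spark actually does.

```python
class LazyQuery:
    """Hypothetical lazy pipeline: methods record steps, collect() runs them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Record the step and return a new query; nothing is executed here.
        return LazyQuery(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyQuery(self._source, self._steps + (("filter", pred),))

    def __iter__(self):
        # Single pass: each item flows through every recorded step in order.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        """Explicitly trigger execution and materialize the results."""
        return list(self)


seen = []  # records which inputs were actually processed


def trace(x):
    seen.append(x)
    return x


query = (
    LazyQuery(range(10))
    .map(trace)
    .filter(lambda x: x % 3 == 0)
    .map(lambda x: x * 10)
)
planned_only = list(seen)  # snapshot taken before execution is requested
result = query.collect()
```

Building the query touches no data; the source is traversed once, and only when `collect()` is called.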

Dask does the lazy evaluation and query planning thing on NumPy arrays and pandas DataFrames, and can execute in parallel. It mimics most of their native interfaces, which makes it a pretty easy drop-in.

https://docs.dask.org/en/latest/

> You can always break a big chain of operations into smaller chains, using well-named variables in between.

I don't think so. Very frequently the intermediate values represent nothing in particular and naming them simply results in visual noise.

I think this is comparable to SQL or LINQ statements. Consider what those would look like if you had to name every intermediate value instead of being able to filter and group on the fly.

Of course you can make a mess out of those too, by building huge unreadable expressions, but that's also an extreme, similar to naming every intermediate step.
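
For a rough Python analogue of the two extremes, compare a query where every step gets a name with the same query written as one expression. The `orders` data and all variable names are invented for this sketch.

```python
orders = [
    {"customer": "ann", "total": 120},
    {"customer": "bob", "total": 40},
    {"customer": "ann", "total": 75},
    {"customer": "cid", "total": 200},
]

# Every step named: the names mostly restate the operation that produced them.
large_orders = [o for o in orders if o["total"] >= 75]
large_order_customers = [o["customer"] for o in large_orders]
unique_large_order_customers = set(large_order_customers)
named_result = sorted(unique_large_order_customers)

# The same query in one expression, closer to how SQL or LINQ would read.
inline_result = sorted({o["customer"] for o in orders if o["total"] >= 75})
```

Both give the same list of customers; which one reads better depends on whether the intermediate values mean anything on their own.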