|
|
|
|
|
by raymondh
684 days ago
|
|
This is an impressive post showing some nice investigative work that isolates a pain point and produces a performant work-around. However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution. Sure, if your data can be loaded, manipulated, and summarized outside of Python land, then lazy object creation is a good way to go. But then you're giving up all of the Python tooling that likely drove you to Python in the first place. Most of the Python ecosystem from sets and dicts to the standard library is focused on manipulating native Python objects. While the syntax supports method calls to data encapsulated elsewhere, it can be costly to constantly "box and unbox" data to move back and forth between the two worlds. |
|
I completely take your point that there are many places where this approach won't fit. It was a surprise for me to trace the performance issue to allocations and GC, specifically because it is rare.
WRT boxing and unboxing, I'd imagine it depends on access patterns primarily - given I was extracting a small portion of data from the AST only once each, it was a good fit. But I can imagine that the boxing and unboxing could be a net loss for more read-heavy use cases.