by raymondh 684 days ago
This is an impressive post showing some nice investigative work that isolates a pain point and produces a performant work-around.

However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.

Sure, if your data can be loaded, manipulated, and summarized outside of Python land, then lazy object creation is a good way to go. But then you're giving up all of the Python tooling that likely drove you to Python in the first place.

Most of the Python ecosystem from sets and dicts to the standard library is focused on manipulating native Python objects. While the syntax supports method calls to data encapsulated elsewhere, it can be costly to constantly "box and unbox" data to move back and forth between the two worlds.
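
To make the "box and unbox" cost concrete, here is a minimal sketch (not from the original comment) that assumes CPython and uses the standard-library `array` module as a stand-in for "data encapsulated elsewhere". The variable names are illustrative only.

```python
import sys
from array import array

# An array stores raw C doubles side by side, not Python float objects.
raw = array("d", [1.5, 2.5, 3.5])

# Reading an element "boxes" it: a Python float object is created on the
# spot to represent the raw value held inside the array.
value = raw[0]

per_item_raw = raw.itemsize            # bytes per element inside the array
per_item_boxed = sys.getsizeof(value)  # bytes for one boxed float object

print(per_item_raw, per_item_boxed)
```

Every element that crosses into Python code pays for one such object, which is the overhead the comment refers to when data moves back and forth between the two worlds.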


First off, thank you for all your contributions to Python!

I completely take your point that there are many places where this approach won't fit. It was a surprise for me to trace the performance issue to allocations and GC, specifically because it is rare.

WRT boxing and unboxing, I'd imagine it depends on access patterns primarily - given I was extracting a small portion of data from the AST only once each, it was a good fit. But I can imagine that the boxing and unboxing could be a net loss for more read-heavy use cases.

You could create a custom C type that wrapped an arbitrary AST node and dynamically created values for attributes when you accessed them. The values would also be wrappers around the next AST node, and they could generate new AST nodes on writes. Python objects would be created on traversal, but each one would be smaller. It wouldn’t use Python lists to handle repeated fields. It seems like a non-trivial implementation, but not fundamentally hard.
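
A rough pure-Python sketch of the wrapper idea, for illustration only. The comment describes a C type over the underlying AST; here an ordinary `ast` tree stands in for that underlying data, `LazyNode` and `_wrap` are hypothetical names, writes are left out, and repeated fields are returned as a tuple built on access, which is a simplification rather than the design described.

```python
import ast


class LazyNode:
    """Hypothetical wrapper: child wrappers are created only on attribute access."""

    __slots__ = ("_node",)

    def __init__(self, node):
        self._node = node

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the wrapped node's fields.
        return _wrap(getattr(self._node, name))

    @property
    def kind(self):
        return type(self._node).__name__


def _wrap(value):
    if isinstance(value, ast.AST):
        return LazyNode(value)
    if isinstance(value, list):
        # Repeated fields: wrappers are built when the field is read, not before.
        return tuple(_wrap(item) for item in value)
    return value  # plain str / int / None pass through unchanged


tree = LazyNode(ast.parse("import os\nx = os.getcwd()"))
first = tree.body[0]
print(first.kind, first.names[0].name)  # Import os
```

Only the nodes actually visited get a wrapper object, which is the property the comment is after.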

The analogy with numpy doesn’t seem quite right, as Raymond observes, because numpy depends on lots of builtin operations that operate on the underlying data representation. We don’t have any such code for the AST. You’ll still want to write Python code to traverse, inspect, and modify the AST.

Very fair points. For general purpose ASTs from Python your design should be more efficient while essentially keeping the existing interface.

When I referenced numpy, I was thinking of a query layer which could push traversal into the extension as well. Something that could have given me “.select(ast.Import).all()”, which in my head is kind of like doing a filtered sum in numpy.
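
As a sketch of what such a query layer might look like from the Python side: the `Query` class and the `select` / `all` names below are hypothetical, and `ast.walk` stands in for the traversal that the imagined extension would do natively.

```python
import ast


class Query:
    """Hypothetical query object; `select` and `all` are illustrative names only."""

    def __init__(self, tree, node_type):
        self._tree = tree
        self._node_type = node_type

    def all(self):
        # In the imagined design this loop would run inside the extension,
        # and only the matching nodes would become Python objects.
        return [n for n in ast.walk(self._tree) if isinstance(n, self._node_type)]


def select(tree, node_type):
    return Query(tree, node_type)


source = "import os\nimport sys\n\ndef f():\n    import json\n"
imports = select(ast.parse(source), ast.Import).all()
names = sorted(alias.name for node in imports for alias in node.names)
print(names)  # ['json', 'os', 'sys']
```

In this pure-Python version every node is still materialized during the walk, so it shows the interface only, not the performance benefit.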

Very cool to get your thoughts on this, thanks for making an account :)

>However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.

Everyone would benefit from developers being more performance-minded and not doing unnecessary work, though! Especially in Python, which has long suffered from performance issues.

Love your work btw!

No. A day only has 24 hours. If you focus on perfs, you leave something else.

Python is python because people cared about other things for many years.

>If you focus on perfs, you leave something else

That's the second benefit of focusing on perfs.

We could do with fewer, better-tested, and faster features in most apps I know of.

You can focus on multiple things, you know. There is some low-hanging fruit in Python for performance in certain circumstances (mostly hot loops, at least, in my experiments). For example, if you need to extract a string from a datetime object, doing so manually with f-strings is about 20% faster than strftime. If you use the string mini-format language instead, it’s 40% faster.
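
The comment does not show the code that was measured, so the following is only a guess at the kind of comparison meant: `strftime` against building the same string by hand with an f-string and format specs. The function names are illustrative, and the timings will vary by machine and Python version, so no speedup is claimed here.

```python
import timeit
from datetime import datetime

dt = datetime(2024, 1, 2, 3, 4, 5)


def via_strftime(d):
    return d.strftime("%Y-%m-%d %H:%M:%S")


def via_fstring(d):
    # Build the same string by hand; ":02d" is the format-spec mini-language.
    return (
        f"{d.year:04d}-{d.month:02d}-{d.day:02d} "
        f"{d.hour:02d}:{d.minute:02d}:{d.second:02d}"
    )


# Both approaches must agree before the timings mean anything.
assert via_strftime(dt) == via_fstring(dt)

for fn in (via_strftime, via_fstring):
    seconds = timeit.timeit(lambda: fn(dt), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f}s per 100k calls")
```
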

That's literally the opposite of focusing.

What you’re describing is myopia. Focusing purely on performance at the expense of anything else would probably result in highly unreadable code, yes. Being aware of and caring about performance, and choosing to prioritize it when reasonable is not the same thing.

“You can focus on multiple things”

You can, but each added focus degrades the quality of the others.

The key principle is thinking with a mindset of cost.

Even if it’s low hanging fruit, there’s a world of difference between assuming we can work it in, and saying, “this is what it will cost, and this is what we won’t work on as a result”.

And similarly, saying “that’s impossible” is not in the same universe as “the cost is extremely high and it’s not important enough to us to pay that cost”.

IME from the perspective of an SRE / DBRE, performance is nearly always given up (if it’s ever even considered) in favor of making things easier, and this tends to have large consequences later on.

People seem to have taken the quote, “premature optimization is the root of all evil” to mean “don’t focus on performance until you absolutely have to,” but when you push them to prove that they don’t have to, they often haven’t even profiled their code! Cloud compute – and especially K8s – has made it such that it’s expected that you simply over-provision and scale as necessary to get the throughput you need. I don’t personally see running through a profiler (ideally as part of CI) as being particularly difficult or onerous.
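
For a sense of how little code a basic profiling run takes, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. `slow_sum` is a made-up workload and `profile` is a hypothetical helper; how the report would be wired into CI is not covered.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive workload standing in for real application code.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile(slow_sum, 100_000)
print(report)
```

A CI job could save the report as an artifact or compare timings against a stored baseline, though that part depends on the project's setup.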

>You can, but each added focus degrades the quality of the others.

Only if the same people do both.

Or if devs who would rather focus on perf find it equally motivating and fun to work on something else instead.