by fovc 615 days ago
It’s Python. Does L1 matter at all? I assume anything you’re accessing is behind a few pointers and __dict__ accesses anyway.

For me it’s mostly about .attribute access being more in line with the rest of the language. Kwargs aside, I find overusing dicts too clunky in Python.


Nope, __slots__ exist explicitly as an alternative to __dict__:

https://wiki.python.org/moin/UsingSlots

Whether or not the performance matters... well, that's somewhat subjective. Python has a fairly high performance floor, which most of the time turns performance concerns into a "Why are you doing this in Python?" question rather than a "How do I do this faster in Python?" one. That said, it _is_ more memory efficient and faster on attribute lookup.

https://medium.com/@stephenjayakar/a-quick-dive-into-pythons...
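A minimal sketch of the difference, assuming CPython 3 (the class names PointDict and PointSlots are hypothetical, just for illustration). Both classes read the same from the caller's side; only the ordinary one carries a per-instance __dict__:

```python
class PointDict:
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Slotted class: attributes live in fixed slots, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1, 2)
s = PointSlots(1, 2)

# Attribute access looks identical from the caller's side.
print(p.x + p.y, s.x + s.y)  # 3 3

# Only the ordinary instance has a __dict__ at all.
print(hasattr(p, "__dict__"), hasattr(s, "__dict__"))  # True False
```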

Anecdotally, I have used slotted objects before to buy performance headroom and postpone a component rewrite.

Yes, I know the slotted attribute is not in a __dict__, which definitely helps memory usage. But my point is that if the parent structure is itself in a dict, that access will swamp the L1 cache miss in terms of latency. Even the interpreter overhead (and likely cache thrashing) will wipe out any L1 cache speedup.

And yes, __slots__ improve performance, but that’s more about avoiding the __dict__ access, which goes through very generic hashing code and then memory probing, than it is about the L1 cache.
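If you want to check the lookup cost yourself, here is a rough micro-benchmark sketch using the standard-library timeit module (WithDict and WithSlots are hypothetical names). Treat the numbers as machine- and version-dependent: on CPython 3.11+ the adaptive interpreter specializes both kinds of attribute load, so the gap may be small.

```python
import timeit


class WithDict:
    def __init__(self):
        self.value = 1  # stored in the instance __dict__


class WithSlots:
    __slots__ = ("value",)

    def __init__(self):
        self.value = 1  # stored in a fixed slot


def time_reads(obj, number=1_000_000):
    """Return the seconds spent evaluating obj.value `number` times."""
    return timeit.timeit("obj.value", globals={"obj": obj}, number=number)


dict_time = time_reads(WithDict())
slot_time = time_reads(WithSlots())
print(f"__dict__: {dict_time:.3f}s  __slots__: {slot_time:.3f}s")
```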

Where __slots__ are most useful (and IIRC what they were designed for) is when you have a lot of tiny objects, so memory usage can shrink significantly. That could be the difference between having to spill to disk and keeping the workload in memory. E.g., Openpyxl does this with its spreadsheet model, where there can be tons of cell references floating around.
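A quick way to see the "lots of tiny objects" effect is to count allocations with the standard-library tracemalloc module. This is a sketch, not openpyxl's actual code: CellDict and CellSlots are hypothetical stand-ins for a cell record, and the exact byte counts will differ between CPython versions.

```python
import tracemalloc


class CellDict:
    """Hypothetical cell record stored the ordinary way (per-instance __dict__)."""

    def __init__(self, row, col):
        self.row = row
        self.col = col


class CellSlots:
    """The same hypothetical record with fixed slots and no __dict__."""

    __slots__ = ("row", "col")

    def __init__(self, row, col):
        self.row = row
        self.col = col


def bytes_for(cls, n=100_000):
    """Return the bytes tracemalloc sees while building n instances of cls."""
    tracemalloc.start()
    objs = [cls(i, i) for i in range(n)]
    used, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return used


dict_bytes = bytes_for(CellDict)
slot_bytes = bytes_for(CellSlots)
print(f"__dict__: {dict_bytes:,} bytes  __slots__: {slot_bytes:,} bytes")
```

Both measurements include the same list and the same int objects, so the difference comes from the per-instance layout.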

Let me try again, from the first link I shared:

> The __slots__ declaration allows us to explicitly declare data members, causes Python to reserve space for them in memory, and prevents the creation of __dict__ and __weakref__ attributes. It also prevents the creation of any variables that aren't declared in __slots__.

Emphasis:

> prevents the creation of __dict__ and __weakref__ attributes. It also prevents the creation of any variables that aren't declared in __slots__.

In short, if you create a slotted object with __slots__, it sends you down a fairly orthogonal object lifecycle path that does not create or use __dict__ in any way. This obviously has drawbacks, such as not being able to add new members to the object the way you can with a normal Python object.
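The restrictions from the quote are easy to observe. A minimal sketch (Plain, Slotted, SlottedWeak and the two helper functions are hypothetical names): undeclared attributes are rejected, weak references are refused, and listing "__weakref__" in __slots__ opts back in to weak references.

```python
import weakref


class Plain:
    """Ordinary class, for comparison."""


class Slotted:
    __slots__ = ("a",)


class SlottedWeak:
    # Listing "__weakref__" as a slot opts back in to weak references.
    __slots__ = ("a", "__weakref__")


def can_add_attribute(instance, name):
    """Return True if an undeclared attribute can be set on instance."""
    try:
        setattr(instance, name, 0)
    except AttributeError:
        return False
    return True


def supports_weakref(instance):
    """Return True if weakref.ref() accepts instance."""
    try:
        weakref.ref(instance)
    except TypeError:
        return False
    return True


print(can_add_attribute(Plain(), "b"))    # True
print(can_add_attribute(Slotted(), "b"))  # False: no __dict__ to fall back on
print(supports_weakref(Slotted()))        # False: no __weakref__ slot
print(supports_weakref(SlottedWeak()))    # True
```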

From the second article:

> However, if you have __slots__, the descriptor is cached (which contains an offset to directly access the PyObject without doing dictionary lookup). In PyMember_GetOne, it uses the descriptor offset to jump directly where the pointer to the object is stored in memory. This will improve cache coherency slightly, as the pointers to objects are stored in 8 byte chunks right next to each other (I’m using a 64-bit version of Python 3.7.1). However, it’s still a PyObject pointer, which means that it could be stored anywhere in memory. Files: ceval.c, object.c, descrobject.c

Which I think addresses your concern about parent dict access...but I could also be misunderstanding your point.
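You can poke at the descriptor the article describes from Python itself, assuming CPython (class names here are hypothetical). Each slot name becomes a member_descriptor on the class, and reading the attribute goes through that descriptor rather than an instance dict:

```python
import types


class Slotted:
    __slots__ = ("x", "y")


class Plain:
    pass


# Each slot name becomes a member descriptor stored on the class.
descriptor = Slotted.__dict__["x"]
print(type(descriptor).__name__)  # member_descriptor
print(isinstance(descriptor, types.MemberDescriptorType))  # True

obj = Slotted()
obj.x = 10
# Attribute access goes through the descriptor's __get__, which in CPython
# reads the value from a fixed offset inside the instance.
print(descriptor.__get__(obj, Slotted))  # 10

plain = Plain()
plain.x = 10
# The ordinary instance keeps the same value as a key in its __dict__.
print(plain.__dict__)  # {'x': 10}
```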

> It’s Python. Does L1 matter at all?

In many ways it matters more because it’s Python.

I’ve met a lot of teams throughout my career who struggle daily with a badly performing Python codebase. You can write a no-frills web service in C#, Go, Rust, or JavaScript, and so long as you don’t do anything stupid, it’s usually plenty fast enough from day 1 to handle your users. In my experience, the same isn’t true of Python. I’m sure Python web services can be made to run OK, but because Python is slow by default, I bet a lot more time is spent optimising Python programs around the world than optimising JavaScript ones.

Good point. It’s more about choosing a good algorithm though.

A brute-force O(N) solution in C++ may be fast enough in a situation where you would need an O(log N) one to get equivalent speed in Python. Squeezing a few extra percent out of an O(N) loop in Python by using slots will not be enough.
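To make the algorithm point concrete, here is a sketch of swapping an O(N) scan for an O(log N) binary search with the standard-library bisect module (the function names are hypothetical). Nothing here touches __slots__; on large inputs the algorithmic change dominates any per-attribute tweak.

```python
import bisect


def contains_linear(sorted_items, target):
    """O(N): compare against every element until a match is found."""
    for item in sorted_items:
        if item == target:
            return True
    return False


def contains_binary(sorted_items, target):
    """O(log N): binary search, valid only because the list is sorted."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


items = list(range(0, 2_000_000, 2))  # one million sorted even numbers
print(contains_linear(items, 1_999_998), contains_binary(items, 1_999_998))  # True True
print(contains_linear(items, 7), contains_binary(items, 7))  # False False
```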

Of course, that doesn’t mean you should leave performance on the table if the optimizations have noticeable effects.

Right. Another way I’ve heard it put is that a Python program running on a modern computer is equivalent to the same Go program running on a computer from 20 years ago.

> It’s Python. Does L1 matter at all?

Depends on the type in question. If you are fetching and operating on a large number of records, then it can matter. Otherwise, more often than not, it doesn’t really matter.