Hacker News
by rtpg 30 days ago
We've been chasing down similar aiohttp client creation issues (linked to ...aiobotocore usage) for months now.

It's annoying that somehow talking to S3 etc. requires so much churn. We have been trying to cache session objects and the like, but we're clearly still missing something.

Chasing this down has also made me realize how rarely Python libs use `weakref`, and how many circular references they build up as a result. The other day I figured out that Django's request session infrastructure creates a circular reference, meaning that requests have to be GC'd before they get cleaned up in CPython.
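One way to confirm that a code path leaves cycles behind is to ask CPython's collector to save what it finds instead of freeing it. This is a swapped-in debugging technique using the standard `gc` module's `DEBUG_SAVEALL` flag, not necessarily how the Django cycle above was found. A minimal sketch; `Node`, `make_pair` and `find_cyclic_garbage` are hypothetical names:

```python
import gc
from collections import Counter

class Node:
    """Hypothetical object type; two Nodes pointing at each other form a cycle."""
    def __init__(self):
        self.other = None

def make_pair():
    a, b = Node(), Node()
    a.other, b.other = b, a  # strong references both ways: a reference cycle
    # a and b go out of scope here, but refcounting alone cannot free them

def find_cyclic_garbage(workload):
    """Run workload() and count, by type name, objects only the cyclic GC could reclaim."""
    previous_flags = gc.get_debug()
    gc.collect()                      # clear out pre-existing garbage first
    already_saved = len(gc.garbage)
    # With DEBUG_SAVEALL, unreachable objects are appended to gc.garbage instead of freed.
    gc.set_debug(previous_flags | gc.DEBUG_SAVEALL)
    try:
        workload()
        gc.collect()
        saved = gc.garbage[already_saved:]
        return Counter(type(obj).__name__ for obj in saved)
    finally:
        gc.set_debug(previous_flags)  # restore whatever debug flags were set before
        del gc.garbage[already_saved:]  # release only the objects we saved
```

Running `find_cyclic_garbage(make_pair)` reports two `Node` objects, possibly alongside other helper objects depending on the CPython version. Pointing it at a real request handler shows which types end up relying on the collector.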

I have a suspicion that the 3.14 problems are heavily linked to "real" workloads being almost entirely filled with cyclical objects.

It's really fascinating to read this, since I've encountered similar memory issues in other languages (Ruby, Go, etc.). Debugging these issues is a pain.

Is there a way to make all this much easier to debug and to prevent memory issues in the first place? Is the abstraction level not quite right?

So with CPython's reference counting, if you're good at not building strong cycles, you really can avoid garbage pressure. It's not even that complicated; it's mostly a question of making a weak reference _somewhere_ along the chain. Often the ergonomics are not great, but Python `@property`s are nice here.

So for example

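    class Request:
        session: "Session"  # request.session: the session is "part" of the request

    class Session:
        request: Request    # session.request: the back-reference that closes the cycle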

request.session exists, and the session is "part" of the request. But session.request often exists as a convenience too. That's a reference cycle, which prevents the request (and anything it points at!) from being deallocated at the end of the request.
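The effect is easy to observe with a weak reference used purely as a probe. A minimal sketch, assuming toy `Request`/`Session` classes that stand in for the real ones (hypothetical, not Django's code), with the automatic collector paused so only refcounting runs until the explicit `gc.collect()`; `check_cycle_lifetime` is likewise a hypothetical helper:

```python
import gc
import weakref

class Request:
    def __init__(self):
        self.session = Session(self)  # request -> session

class Session:
    def __init__(self, request):
        self.request = request        # session -> request: closes the cycle

def check_cycle_lifetime():
    """Return (alive_after_del, freed_after_collect) for a dropped Request."""
    was_enabled = gc.isenabled()
    gc.disable()                      # pause automatic collection; only refcounting runs
    try:
        request = Request()
        probe = weakref.ref(request)  # observes the request without keeping it alive
        del request                   # "end of the request": drop the last outside reference
        alive_after_del = probe() is not None
        gc.collect()                  # the cycle detector finds and frees the pair
        freed_after_collect = probe() is None
        return alive_after_del, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

`check_cycle_lifetime()` returns `(True, True)`: the request outlives its last outside reference and is only reclaimed once the cyclic collector runs.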

But in this case, you could easily do something like:

session._request = weakref.ref(request) # on session creation

and then have session.request call session._request() (and maybe assert session._request() is not None if you want to be certain). If you're confident that the session is a "child" of the request, and that you would _never_ hold on to the session after the request is done, this is a cheap trick that makes session.request cost a little more, but not much.
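Put together with a `@property`, the pattern might look like the sketch below. The `Request`/`Session` classes are hypothetical stand-ins; only the `_request = weakref.ref(request)` idea comes from the comment above:

```python
import weakref

class Request:
    def __init__(self):
        # The request owns its session through an ordinary strong reference.
        self.session = Session(self)

class Session:
    def __init__(self, request):
        # Back-reference held weakly ("on session creation"), so no strong cycle forms.
        self._request = weakref.ref(request)

    @property
    def request(self):
        # Calling the weakref returns the Request, or None once it has been freed.
        request = self._request()
        assert request is not None, "session outlived its request"
        return request
```

Dropping the last reference to a `Request` now frees it immediately through refcounting, and its session goes with it unless something else holds the session. A stray `session.request` access afterwards trips the assert. Note that `assert` is stripped under `python -O`, so raise an explicit exception instead if the check must always run.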

I think most Python libraries just don't do memory perf analyses here, and also "believe" in the garbage collector. When GC runs, both request and session will get deallocated, after all! But the long-term effect of everyone relying on the GC is that GC runs are more expensive than they need to be, and when you're looking through memory there's just more stuff to dig through.
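To check whether the collector is actually costing a given workload anything, CPython exposes `gc.callbacks`, a list of callables invoked with a phase (`"start"` or `"stop"`) and an info dict around each collection. A rough sketch that totals time spent in collections. `GCTimer`, `Peer` and `make_cycles` are hypothetical names, and the numbers it prints depend on the interpreter version and the workload:

```python
import gc
import time

class GCTimer:
    """Context manager that totals time and objects collected by the cyclic GC."""

    def __init__(self):
        self.seconds = 0.0
        self.collections = 0
        self.collected = 0
        self._started_at = None

    def __call__(self, phase, info):
        # CPython calls this with phase "start" before and "stop" after each collection.
        if phase == "start":
            self._started_at = time.perf_counter()
        elif phase == "stop" and self._started_at is not None:
            self.seconds += time.perf_counter() - self._started_at
            self.collections += 1
            self.collected += info["collected"]  # unreachable objects found by this pass
            self._started_at = None

    def __enter__(self):
        gc.callbacks.append(self)
        return self

    def __exit__(self, *exc_info):
        gc.callbacks.remove(self)
        return False

class Peer:
    """Hypothetical object that points at another Peer."""
    def __init__(self):
        self.peer = None

def make_cycles(n):
    for _ in range(n):
        a, b = Peer(), Peer()
        a.peer, b.peer = b, a  # a <-> b cycle, unreachable once the loop moves on

with GCTimer() as timer:
    make_cycles(10_000)
    gc.collect()  # reclaim whatever the automatic passes have not
print(f"{timer.collections} collections, {timer.collected} objects, "
      f"{timer.seconds * 1000:.2f} ms")
```

Comparing the totals with a weak-back-reference version of the same workload is one way to see how much of the collector's work a library is creating.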