by tomp 4213 days ago
I had the same issue in Python recently. The project runs as a server that loads a huge number of objects from the database, and could use as much as 10GB of memory! Python's reference counting works great, but every so often the full-heap-scanning cycle collector would run, and it took quite a lot of time to scan a multi-GB heap.

We noticed the issue happened most often when deserializing objects (loading them from Redis into memory). As it turns out, Python would schedule a collection every time the object_created counter exceeded the object_destroyed counter by more than a threshold. In general, this makes sense: if objects keep being created and not freed, that most likely means a resource leak or a reference cycle. However, the same thing happens during deserialization - many new objects are created, and none are freed. Coupled with Python's low default threshold (700), GC was triggered many, many times in every deserialization loop (usually in vain, as no new objects had become collectable). Disabling GC and running full collections manually solved the problem.
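
A minimal sketch of that workaround, using only the standard `gc` module. The names `gc_paused` and `load_objects` are hypothetical, and `json.loads` over in-memory strings stands in for whatever deserializer and Redis client the real server used; this is an illustration under those assumptions, not the original code.

```python
import gc
import json
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Turn off the cyclic collector for the duration of the block.

    Reference counting keeps working; only cycle detection is paused.
    The previous enabled/disabled state is restored on exit.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def load_objects(payloads):
    # Hypothetical stand-in for deserializing records fetched from Redis.
    with gc_paused():
        objects = [json.loads(p) for p in payloads]
    # One explicit full collection, at a moment of our choosing,
    # instead of many automatic ones inside the loop.
    gc.collect()
    return objects


payloads = ['{"id": %d, "tags": ["a", "b"]}' % i for i in range(10000)]
objects = load_objects(payloads)
print(len(objects), "objects loaded; thresholds:", gc.get_threshold())
```

A less drastic alternative is to raise the first threshold with `gc.set_threshold` so automatic collections still happen, just far less often.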