by Goofy_Coyote
1054 days ago
Can someone explain this part to me, please?
I don't follow what's going on.
> Python's use of reference counting defeated copy-on-write because even memory blocks holding variables that were read-only were actually written to in order to manipulate the reference counts, thereby blowing up the combined physical memory footprint of the workers. We solved this by smurfing the interpreter to use a magic reference count number for all variables that were created by the master process and inherited by the workers, and then not touching reference counts that had the magic value.
Thanks
This would be a nonstarter if it required N+1 times the memory of the single process, so the OS uses an optimization called copy-on-write. When a process forks, all its physical memory is shared by the new process so it takes almost no new memory to start. If the new process writes to a memory page, that physical page is copied so it has its own version. (Thus “copy on write”.)
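If it helps to see the fork behavior concretely, here is a minimal sketch, assuming a Unix-like OS where `os.fork` is available. The function name `fork_and_mutate` is made up for illustration. It only shows the visible effect (the child's write does not reach the parent); the page sharing and copying happen inside the kernel and are not observable from this code.

```python
import os

def fork_and_mutate(data):
    """Fork, let the child overwrite data[0], return (parent_view, child_view)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands on the child's private copy of the page,
        # which the kernel creates at the moment of the write.
        os.close(read_fd)
        data[0] = 99
        os.write(write_fd, bytes([data[0]]))
        os._exit(0)
    # Parent: read what the child saw, then check our own copy.
    os.close(write_fd)
    child_view = os.read(read_fd, 1)[0]
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], child_view

if __name__ == "__main__":
    buf = bytearray([1] * 4096)  # roughly one page worth of data
    print(fork_and_mutate(buf))
```

The parent still sees its original byte while the child sees the value it wrote, which is the isolation that copy-on-write preserves while deferring the actual copy.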
For most programs this works fine, but if you have a runtime that does garbage collection using a technique that requires writing to an object even if the code doesn’t change any of its values, trouble ensues. With reference counting, you have to write a new reference count for an object anytime a pointer to the object is assigned or dropped. If the reference count is stored in the object itself, that write dirties the physical page the object lives on, so the page has to be copied. So now the CoW optimization totally doesn’t work, because just referencing an object causes it to take up additional new memory.
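You can watch the hidden write happen in CPython. This is a rough sketch using `sys.getrefcount`; the helper name `refcount_delta_when_referenced` is hypothetical, and the behavior is specific to CPython with ordinary (non-immortal) objects such as a freshly created list.

```python
import sys

def refcount_delta_when_referenced(obj):
    """Return how much obj's reference count changes when we merely alias it."""
    before = sys.getrefcount(obj)
    alias = obj  # no value in obj is modified, yet its header is written
    after = sys.getrefcount(obj)
    del alias    # dropping the alias writes the header again
    return after - before

if __name__ == "__main__":
    payload = [1, 2, 3]  # a fresh list, never modified below
    print(refcount_delta_when_referenced(payload))
```

The list's contents never change, but the count stored in its header does, and in a forked worker that is enough to force a copy of the page.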
Ruby used to have this same problem, and after Ruby webservers became popular (hello Rails) they eventually incorporated a patch to move the GC information somewhere outside the actual object heap. Other systems like the JVM use similar techniques to store the bookkeeping bits somewhere other than the object field bits.
So what the OP did is patch the runtime so the objects created in the master process (pre-forking) have special reference counts that are never altered. This mostly works, because the master process generally does a bunch of setup so its objects were mostly not going to be garbage anyway.
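Here is a toy model of that idea in plain Python. It is not the OP's actual patch (which would live in the interpreter's C code), and every name in it (`IMMORTAL`, `Obj`, `incref`, `decref`, `mark_inherited`) is invented for illustration. The `writes` counter stands in for "the page holding this object got dirtied".

```python
IMMORTAL = -1  # hypothetical magic value; the real patch's value is not given

class Obj:
    """Stand-in for an interpreter object with an in-object reference count."""
    def __init__(self):
        self.refcount = 1
        self.writes = 0  # how many times the header was written

def incref(obj):
    if obj.refcount == IMMORTAL:
        return  # magic value: leave the header (and its page) untouched
    obj.refcount += 1
    obj.writes += 1

def decref(obj):
    if obj.refcount == IMMORTAL:
        return  # never decremented, so never freed
    obj.refcount -= 1
    obj.writes += 1

def mark_inherited(objects):
    """Run once in the master process, just before forking workers."""
    for obj in objects:
        obj.refcount = IMMORTAL

if __name__ == "__main__":
    inherited, local = Obj(), Obj()
    mark_inherited([inherited])
    for obj in (inherited, local):
        incref(obj)
        decref(obj)
    print(inherited.writes, local.writes)
```

The tradeoff is visible in `decref`: an object carrying the magic value is never freed, which is acceptable only because the master's setup objects were expected to live for the whole process anyway.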