| HN Mirror

You have a program that for whatever reason (the Python runtime in this case) only works single-threaded, although its workload could be easily parallelized (say, it’s a web server where requests are processed independently). An old established way to accomplish this is to start a “master” process which forks N “worker” processes, each of which can happily run single-threaded.

This would be a nonstarter if it required N+1 times the memory of the single process, so the OS uses an optimization called copy-on-write. When a process forks, all its physical memory is shared by the new process so it takes almost no new memory to start. If the new process writes to a memory page, that physical page is copied so it has its own version. (Thus “copy on write”.)

For most programs this works fine, but if you have a runtime that does garbage collection using a technique that requires writing to an object even if the code doesn’t change any of its values, trouble ensues. With reference counting, you have to write a new reference count for an object anytime a pointer to the object is assigned. If you store the reference count in the object, that means its physical page has to be copied. So now the CoW optimization totally doesn’t work, because just referencing an object causes it to take up additional new memory.

Ruby used to have this same problem, and after Ruby webservers became popular (hello Rails) they eventually incorporated a patch to move the GC information somewhere outside the actual object heap. Other systems like the JVM use similar techniques to store the bookkeeping bits somewhere other than the object field bits.

So what the OP did is patch the runtime so the objects created in the master process (pre-forking) have special reference counts that are never altered. This mostly works, because the master process generally does a bunch of setup so its objects were mostly not going to be garbage anyway.