by Goofy_Coyote 1054 days ago
Can someone explain this part to me, please? I don't follow what's going on.

> Python's use of reference counting defeated copy-on-write because even memory blocks holding variables that were read-only were actually written to in order to manipulate the reference counts, thereby blowing up the combined physical memory footprint of the workers. We solved this by smurfing the interpreter to use a magic reference count number for all variables that were created by the master process and inherited by the workers, and then not touching reference counts that had the magic value.

Thanks

2 comments

You have a program that for whatever reason (the Python runtime in this case) only works single-threaded, although its workload could be easily parallelized (say, it’s a web server where requests are processed independently). An old established way to accomplish this is to start a “master” process which forks N “worker” processes, each of which can happily run single-threaded.
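A minimal sketch of that master/worker pattern, assuming a POSIX system where `os.fork` is available. The names `run_prefork`, `read_only_worker`, and `LOOKUP_TABLE` are made up for illustration and are not from the original post:

```python
import os

# Built once in the master before forking; every worker sees it after fork.
LOOKUP_TABLE = {n: n * n for n in range(1000)}


def run_prefork(num_workers, handle):
    """Fork num_workers children, run handle(worker_id) in each, return exit codes."""
    pids = []
    for worker_id in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, then exit immediately so it never
            # falls back into the master's loop.
            status = 0
            try:
                handle(worker_id)
            except Exception:
                status = 1
            os._exit(status)
        pids.append(pid)

    # Master process: wait for every worker and collect its exit code.
    exit_codes = []
    for pid in pids:
        _, wait_status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(wait_status))
    return exit_codes


def read_only_worker(worker_id):
    # Only reads data inherited from the master; it never assigns to it.
    assert LOOKUP_TABLE[worker_id] == worker_id * worker_id


if __name__ == "__main__":
    print(run_prefork(4, read_only_worker))
```

A real server would have each worker loop over incoming requests instead of exiting after one call, but the fork-then-wait shape is the same.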

This would be a nonstarter if it required N+1 times the memory of the single process, so the OS uses an optimization called copy-on-write. When a process forks, all its physical memory is shared by the new process so it takes almost no new memory to start. If the new process writes to a memory page, that physical page is copied so it has its own version. (Thus “copy on write”.)

For most programs this works fine, but if your runtime manages memory with a technique that writes to an object even when the code doesn’t change any of its values, trouble ensues. With reference counting, the runtime has to write a new count every time a reference to the object is created or dropped. If the count is stored in the object itself, that write dirties the page the object lives on, so the page has to be copied. Now the CoW optimization is largely defeated, because merely referencing an object makes the worker take up new memory.
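You can watch the count move in CPython with `sys.getrefcount`. This is a small sketch, and the helper name `refcount_delta` is made up for illustration. The absolute numbers it sees depend on the interpreter version, so it only reports the difference:

```python
import sys


def refcount_delta(extra_references):
    """Return how much an object's refcount grows when we merely reference it more."""
    obj = object()  # stands in for "read-only" data; we never change its value
    before = sys.getrefcount(obj)
    # Taking new references does not modify the object's value,
    # but CPython still writes an updated count into the object's header.
    aliases = [obj for _ in range(extra_references)]
    after = sys.getrefcount(obj)
    del aliases
    return after - before


if __name__ == "__main__":
    print(refcount_delta(3))
```

Each of those count updates is a memory write to the object itself, which is exactly what triggers the page copy in a forked worker.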

Ruby used to have a similar problem (its garbage collector wrote mark bits into the objects themselves), and after Ruby webservers became popular (hello Rails) the runtime eventually incorporated a patch to move the GC information somewhere outside the actual object heap. Other systems like the JVM use similar techniques to store the bookkeeping bits somewhere other than the object field bits.

So what the OP did is patch the runtime so the objects created in the master process (pre-forking) have special reference counts that are never altered. This mostly works, because the master process generally does a bunch of setup so its objects were mostly not going to be garbage anyway.
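Here is a toy model of that idea in plain Python. It is not the OP's actual interpreter patch (which would live in CPython's C code); `MAGIC_REFCOUNT`, `ToyObject`, and `freeze_for_fork` are hypothetical names, and `header_writes` just counts the writes that would have dirtied a shared page:

```python
MAGIC_REFCOUNT = -1  # hypothetical sentinel meaning "inherited from the master"


class ToyObject:
    def __init__(self, value):
        self.value = value
        self.refcount = 1
        self.header_writes = 0  # stand-in for writes that would dirty a page


def incref(obj):
    if obj.refcount == MAGIC_REFCOUNT:
        return  # inherited object: leave its header untouched
    obj.refcount += 1
    obj.header_writes += 1


def decref(obj):
    if obj.refcount == MAGIC_REFCOUNT:
        return  # never counted, so never freed either
    obj.refcount -= 1
    obj.header_writes += 1


def freeze_for_fork(objects):
    """Run in the master just before forking: mark everything created so far."""
    for obj in objects:
        obj.refcount = MAGIC_REFCOUNT


if __name__ == "__main__":
    shared = ToyObject("config loaded by master")
    freeze_for_fork([shared])
    incref(shared)
    decref(shared)
    print(shared.header_writes)  # the frozen object's header was never written
```

The trade-off shows up in `decref`: an object with the magic count can never reach zero, so it is never freed. That is acceptable here because setup data in the master tends to live for the whole life of the process anyway.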

Thank you, this is a great explanation - much appreciated.

I don't understand the "smurfing" solution he references, but CPython's runtime stores a reference count inside each value to detect garbage (that is, to know when a value can be freed), which means even read-only values get modified in memory by the runtime as references to them come and go.

After a fork, the child's pages are marked copy-on-write, meaning parent and child share the same physical page until one of them modifies it. Those reference count updates count as modifications, so they force the pages to be copied and wipe out the memory savings copy-on-write would normally provide.