Hacker News
by kragen 678 days ago
after the parent kills the children that were sending the commands to the slaves, it needs to wait to reap the zombies, because there's no garbage collection for the dead children. otherwise the process table might fill up from all the forking, like with a fork bomb. then stonith failover will nuke the server, unless it's on the blacklist of dummy servers

i wrote the stonith daemon in racket scheme because guile was too slow and was sometimes missing a heartbeat

Yes. Once I used Rust's Command the wrong way, and after I terminated a child it turned into a zombie. I had to forcefully kill it manually each time this happened. It had to be fixed in the long run, because each run risked the system's OOM killer showing up and starting its random kills. When I finished my fixes, I committed everything to the master branch, and the CI slaves started working nightly to bring me fresh binaries to execute in the morning. It wasn't long after the smoke tests that it became apparent the bug was eliminated. ;)
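For anyone who hits the same thing: as far as I know, Rust's standard library never waits on a child for you, not even when the `Child` is dropped, so a killed child stays a zombie until you call `wait()` on it. Here is a minimal sketch of the kill-then-reap pattern on a Unix-like system. The helper name `kill_and_reap` and the `sleep 60` command are just placeholders for illustration, not what my actual code looked like.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical helper: kill a child, then wait on it so the kernel
/// can free its process-table entry.
fn kill_and_reap(child: &mut Child) -> io::Result<ExitStatus> {
    // kill() only delivers the signal (SIGKILL on Unix); until someone
    // waits on the dead child, it lingers as a zombie.
    child.kill()?;
    // wait() collects the exit status, which is what reaps the zombie.
    child.wait()
}

fn main() -> io::Result<()> {
    // `sleep 60` stands in for whatever long-running child you spawn.
    let mut child = Command::new("sleep").arg("60").spawn()?;
    let status = kill_and_reap(&mut child)?;
    println!("child reaped, exited normally: {}", status.success());
    Ok(())
}
```

If blocking in `wait()` is a problem, `try_wait()` can be polled instead, but something still has to collect the status eventually.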
oh yes, i execute so many children every day, i take it so much for granted that i didn't even think about that

i'm sure glad this discussion is so hierarchical because on irc we might get kicked for it