by bd_at_rivenhill
4062 days ago
All kinds of badness here. Bug #2 really reduces the level of comfort I would have with using ZooKeeper as a tool.

First of all, the default Java policy of terminating only the thread, instead of the process, when a runtime exception goes unhandled is fully boneheaded. The first thing you should always do in a server program is set a default uncaught exception handler that kills the program. Much better to flame out spectacularly than to limp along with your fingers crossed hoping for the best, as this bug amply demonstrates.

On the heels of that, there's this: "Unfortunately, that means the heartbeat mechanisms would continue to run as well, deceiving the followers into thinking that the leader is healthy." Major rookie mistake here; the heartbeat should be generated by the same code (e.g. the polling loop) that does the actual work, or should be conditioned on the progress of that work. There's no indication that ZooKeeper is bad enough to have a separate thread whose only responsibility is to periodically generate the heartbeat (a shockingly common implementation choice), but it is clearly not monitoring the health of the program effectively. Suffering a kernel-level bug is outside the control of a program, but this demonstrates a lack of diligence or experience in applying the appropriate safety mechanisms to construct a properly functioning component of a distributed system.
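For anyone who hasn't done this before, here is a minimal sketch of the fail-fast handler in Java. The class name `FailFast` and the injectable exit action are my own illustrative choices, not anything from ZooKeeper; the only real APIs used are `Thread.setDefaultUncaughtExceptionHandler` and `Runtime.halt`.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not ZooKeeper code: kill the whole process when any
// thread dies from an uncaught exception.
final class FailFast {
    // Builds a handler that logs the failure, then calls the supplied exit
    // action with status 1. The exit action is injectable so it can be tested.
    static Thread.UncaughtExceptionHandler handler(IntConsumer exit) {
        return (thread, error) -> {
            System.err.println("Uncaught exception in thread "
                    + thread.getName() + ": " + error);
            exit.accept(1);
        };
    }

    // Installs the handler process-wide. halt() is used here instead of
    // exit() so that a stuck shutdown hook cannot keep the process alive.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                handler(code -> Runtime.getRuntime().halt(code)));
    }

    public static void main(String[] args) {
        install();
        // Simulated worker failure: the JVM terminates instead of limping on.
        new Thread(() -> {
            throw new IllegalStateException("receiver died");
        }, "receiver").start();
    }
}
```

Whether to use `halt` or `System.exit` is a judgment call: `exit` runs shutdown hooks, which is nicer for cleanup but can hang if the process is already in a bad state.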
I spent a few minutes digging around in the source and found this:
http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/o...
Around line 1100 is the entry point of the thread that sends requests, and it's also what generates heartbeats, so it looks like if the receiver thread dies it'll just keep going...
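One way to avoid that failure mode is to gate the heartbeat on evidence that the other thread is still making progress. The following is only a rough sketch under my own assumptions: `ProgressGatedHeartbeat`, `recordProgress`, and `shouldSendHeartbeat` are hypothetical names, and this is not how ZooKeeper is structured.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not ZooKeeper code: the sender only emits a heartbeat
// if the worker has reported progress recently.
final class ProgressGatedHeartbeat {
    private final AtomicLong lastProgressNanos;
    private final long maxStallNanos;

    // nowNanos is the current clock reading; maxStallNanos is how long the
    // worker may go without progress before heartbeats are suppressed.
    ProgressGatedHeartbeat(long nowNanos, long maxStallNanos) {
        this.lastProgressNanos = new AtomicLong(nowNanos);
        this.maxStallNanos = maxStallNanos;
    }

    // Called by the worker (e.g. the receiver loop) on every iteration.
    void recordProgress(long nowNanos) {
        lastProgressNanos.set(nowNanos);
    }

    // Called by the sender before each heartbeat. Returns false once the
    // worker has been silent for longer than maxStallNanos.
    boolean shouldSendHeartbeat(long nowNanos) {
        return nowNanos - lastProgressNanos.get() <= maxStallNanos;
    }
}
```

In use, the receiver loop would call `recordProgress(System.nanoTime())` each time around, and the sender would check `shouldSendHeartbeat(System.nanoTime())` before pinging the followers. If the receiver thread dies, the heartbeats stop after the stall window and the followers can notice.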