Hacker News
by notimetorelax 3144 days ago
We have observed in the past that long GC pauses cause node failures. When a long GC pause happens, the node doesn't respond, so the master node decides that the node has left the cluster :\
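The mechanism here is a heartbeat timeout: a stop-the-world GC pause freezes every thread on the node, including the one that would answer the master's pings, so from the master's side a paused node looks exactly like a dead one. Below is a minimal Python sketch of a timeout-based failure detector, assuming a fixed 30-second eviction threshold. `Master`, `simulate`, and the node names are hypothetical illustrations, not the actual cluster software's implementation or settings.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master that evicts nodes whose last heartbeat is too old."""
    timeout_s: float  # assumed eviction threshold, not a real config value
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # A node can only answer pings while its threads are running.
        self.last_seen[node] = now

    def check(self, now: float) -> list[str]:
        # Any node silent for longer than timeout_s is treated as having left.
        evicted = [n for n, t in self.last_seen.items() if now - t > self.timeout_s]
        for n in evicted:
            del self.last_seen[n]
        return evicted


def simulate(gc_pause_s: float, timeout_s: float = 30.0) -> list[str]:
    """Return the nodes evicted while node-1 is stuck in a GC pause."""
    master = Master(timeout_s=timeout_s)
    master.heartbeat("node-1", now=0.0)
    master.heartbeat("node-2", now=0.0)
    evicted: list[str] = []
    t = 0.0
    # node-2 keeps heartbeating every second, while node-1 is frozen
    # in a stop-the-world pause and sends nothing.
    while t < gc_pause_s:
        t += 1.0
        master.heartbeat("node-2", now=t)
        evicted += master.check(now=t)
    return evicted


print(simulate(45.0))  # pause longer than the timeout
print(simulate(10.0))  # pause shorter than the timeout
```

With a 45-second pause the healthy-but-frozen `node-1` is evicted; with a 10-second pause nothing happens. The node was never actually down, which is why lengthening the timeout or shortening pauses (heap sizing, collector choice) are the usual levers.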

Ya, we often see a node die of natural causes, and then the garbage produced from recovering the node and relocating the data ends up bringing down the rest of the cluster via long GC pauses.