| HN Mirror



	by NikolaeVarius 2534 days ago
	In many HA setups, you're not supposed to have to care if any single thing goes down, because it should auto-recover. The article said that the node stalled in an unforeseen way, which may have caused the standard recovery mechanisms to fail silently.

1 comments

laCour 2534 days ago

Right, but they didn't recover speedily. Leaving the cluster in such a state for so long sounds like poor monitoring to me, because this is known to interfere with a later election.


kortilla 2534 days ago

The health check said it was ok. How would they know it needed to be recovered?

The fault was the bad health check, not the process.
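To illustrate the point, here is a minimal sketch of the difference between a check that only asks "is the process up?" and one that also asks "has it made progress recently?". Everything here (the `Node` class, `shallow_check`, `progress_check`, the 30-second threshold) is hypothetical and made up for illustration; it is not what the article's system actually runs.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node that records when it last made real progress."""
    alive: bool = True
    last_progress: float = field(default_factory=time.monotonic)

    def record_progress(self, now=None):
        # Called by the node whenever it completes real work.
        self.last_progress = time.monotonic() if now is None else now


def shallow_check(node):
    # Only asks whether the process is up; a stalled node still passes.
    return node.alive


def progress_check(node, max_stall_seconds, now=None):
    # Also requires recent progress, so a stalled-but-alive node fails.
    now = time.monotonic() if now is None else now
    return node.alive and (now - node.last_progress) <= max_stall_seconds


# A node that is up but has not progressed for 100 seconds.
stalled = Node(alive=True, last_progress=0.0)
print(shallow_check(stalled))                      # True
print(progress_check(stalled, 30.0, now=100.0))    # False
```

With a check like the first one, monitoring would keep reporting the node as healthy, so nothing would trigger recovery. The second kind only helps if the stall actually stops whatever the node counts as progress, which is an assumption about the failure mode.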


laCour 2534 days ago

They only just clarified that monitoring was in place and they were reporting as healthy. See the comments above.
