by jefftk 1064 days ago
This was famously a problem for Google's distributed lock service, Chubby. They handled it by intentionally having outages to flush out ways it might have started to bear loads it wasn't designed for: https://sre.google/sre-book/service-level-objectives/#xref_r...
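
To make the planned-outage idea concrete, here is a minimal sketch of the arithmetic behind it: if a service has been more available than its target over some period, the unused downtime is what a controlled outage could consume. The function name, parameters, and numbers are hypothetical illustrations, not Google's actual tooling.

```python
def downtime_to_inject(slo_target: float,
                       observed_availability: float,
                       period_seconds: float) -> float:
    """Return the seconds of synthetic outage that would bring availability
    down to the SLO target (hypothetical helper, for illustration only)."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    if not 0.0 <= observed_availability <= 1.0:
        raise ValueError("observed_availability must be in [0, 1]")

    # Downtime the target permits over the period.
    allowed_downtime = (1.0 - slo_target) * period_seconds
    # Downtime that real failures have already caused.
    actual_downtime = (1.0 - observed_availability) * period_seconds

    # Only inject an outage if real failures have not already used the budget.
    return max(0.0, allowed_downtime - actual_downtime)
```

For example, with a hypothetical 99.9% target and no real failures over a 1000-second window, the sketch suggests about one second of planned outage; if availability has already dropped to 99%, it suggests none.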
1 comment

I'm a fan of the Chaos Monkey (Netflix software) approach to this.

Can't expect your platform to be reliable if it just breaks at random.