by bananapub 838 days ago
Google SREs also end up investigating single machine issues, fyi.

Yes, but At Scale®

It's a totally different experience when the people who technically own the hardware side of operations take no responsibility for its well-being, the people who own the software develop elaborate workarounds for bad machines, and the SREs maintain blacklists of individual nodes.

In my experience it's fun to do that, but only worth it when SLOs are on the line (that is, when a significant number of machines are bad).