by tomlue
1319 days ago
I don't work at a giant company, but I'm curious:

> Anyone that has worked on large, complex system knows that the margin of error in uptime and downtime is often whether the right person is within arms’ reach of their laptop.

Is this true? Shouldn't giant tech companies obsess about reducing the need for human intervention?

Giant tech companies do obsess about reducing the need for human intervention. Teams in my org at AWS tracked failure and intervention rates per thousand instances. If the rate got too high, it meant the team was spending too much engineering effort resolving on-call issues and needed to fix the underlying problem.
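
For anyone who wants the arithmetic spelled out, here is a rough sketch of that kind of metric. The function names and the threshold value are hypothetical, made up for illustration; this is not AWS tooling or an actual AWS number.

```python
# Hypothetical cutoff: interventions per thousand instances above which
# a team would prioritize fixing the cause over handling pages by hand.
HYPOTHETICAL_THRESHOLD = 5.0


def interventions_per_thousand(interventions: int, instances: int) -> float:
    """Normalize a count of manual interventions to a per-1000-instance rate."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return 1000 * interventions / instances


def needs_engineering_fix(
    interventions: int,
    instances: int,
    threshold: float = HYPOTHETICAL_THRESHOLD,
) -> bool:
    """Return True when the normalized rate exceeds the (assumed) threshold."""
    return interventions_per_thousand(interventions, instances) > threshold
```

Normalizing by fleet size is what makes the number comparable across teams: 12 interventions on 4000 instances and 3 interventions on 1000 instances are the same rate.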