Ask HN: What kinds of engineers handle global cloud shortages?
1 point
by jeffe
1862 days ago
Reading through incident reports of major cloud outages is very informative. Both the scale and the speed at which engineers find root causes, implement fixes, and safely deploy them are really impressive. I thought about it and realized that even the senior engineers I know don't work at that kind of scale. I'm curious whether anyone knows any of these people/beings (or is one of them) and could speak to what they are like in terms of professional experience and interactions, and maybe share some anecdotes or wisdom?
https://sre.google/sre-book/table-of-contents/ is a good source to start with.
And ultimately, perhaps the most critical distinction is opportunity. You only work on a huge distributed system when there's enough customer demand for it to need to exist, and every large system started as a smaller system. Growing demand raises the system's importance, which in turn raises the effort invested in its reliability, scalability, efficiency, etc. You can read great stories about the many failures at Google, or Twitter (remember the Fail Whale?), or any other large company. The maturity you see now was developed over time, and any newly hired engineers will be trained into the culture of maintaining and improving it further. With few exceptions, the folks who built the big systems back when they were small aren't the ones scaling them out today anyway - it's a very teachable skill, given the need and opportunity.