by lumost 1691 days ago
The point is that every team gets to set their own failure modes. I know of multiple tier-1 services which diverge from at least one best practice.

Think of the scenario where a cloud provider needs to evacuate an AZ. There is no API which would allow the compute team to force-migrate tens of thousands of apps and guarantee that they are both unaffected and maintain their redundancy guarantees.

Internal services at Google are in the same boat. However, Google knows about the hard edges and forces everyone to deal with all of that complexity - there is no API which the serving team could plug into which will avoid this overhead.

2 comments

That still at no point requires the application's team to decide which two PCR zones to pick and which cells within them to pick. That decision can still be cleanly abstracted away, and pushing it onto the application's team would still be a mixing of unrelated concerns, so your comments are still orthogonal to the point I was bringing up here.

Edit: It might help to check out my comment here, where I clarify what a dev should vs shouldn't have to worry about: https://news.ycombinator.com/item?id=29085638

While what you say is true, I think GP is ultimately correct. You can have a system define a convention and allow bypassing it, instead of forcing everyone to start from scratch. In fact, this is the approach that pretty much any modern service at Google will use.
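
To make "define a convention and allow bypassing it" concrete, here is a rough sketch. Everything in it is made up for illustration (the zone and cell names, `default_placement`, `resolve_placement`); it is not how any real Google or cloud-provider system is exposed, just the shape of the idea: the platform picks zones and cells by default, and a team only specifies them if it explicitly opts out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical inventory of zones and the cells inside each one.
ZONES: Dict[str, List[str]] = {
    "zone-a": ["cell-1", "cell-2"],
    "zone-b": ["cell-1", "cell-2"],
    "zone-c": ["cell-1"],
}


@dataclass(frozen=True)
class Placement:
    zone: str
    cell: str


def default_placement(replicas: int = 2) -> List[Placement]:
    """Platform convention: one replica per distinct zone, first cell in each."""
    zones = sorted(ZONES)
    if replicas > len(zones):
        raise ValueError("not enough zones for the requested redundancy")
    return [Placement(zone, ZONES[zone][0]) for zone in zones[:replicas]]


def resolve_placement(
    override: Optional[Sequence[Placement]] = None, replicas: int = 2
) -> List[Placement]:
    """Use the team's explicit override if given, else fall back to the convention."""
    if override is not None:
        return list(override)
    return default_placement(replicas)
```

Most teams would call `resolve_placement()` with no arguments and never think about zones or cells; a team with unusual requirements passes its own list of `Placement` values and owns the consequences.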