|
> in table 3 Figure 3 is an (arbitrary) example of round-robin sunsetting with randomized sunset ordering. The point is to demonstrate how bad backend churn is with this algorithm, by inspecting a normal example of the decisions this algorithm makes. > The "same frontend task" as what? [...] what does this mean? It's not phrased great, but it's also tricky to communicate. My read is this: given backend shuffles [9, 1, 3, 0, 8, 6, 5, 7, 2, 4], [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], if you combine these shuffles to choose a backend assignment, you end up with subsets {9, 1, 3, 0}, {8, 6, 5, 7}, {2, 4, 7, 2}, {0, 8, 9, 1}, {4, 5, 3, 6}. That third subset means the third frontend only has a subset of three backends, even though you want it to have four. The rest is reductio ad absurdum- reasoning through the ways you might fix this, and explaining why they in turn don't work. (I believe there's also an implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination, only static before-the-fact coordination amongst the front ends, i.e. agreement on a hash seed for a given subset, and say which hashing strategy the front ends would use). |
> That third subset means the third frontend only has a subset of three backends, even though you want it to have four.
This explanation is correct, thanks. Alas, word limits demand brevity.
> implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination
An earlier iteration of this article included coordination as one of the properties, but this unfortunately had to be cut. AFAICS, the only other two kinds of coordination are “frontend tasks talk to each other” or “frontend tasks ask a subsetting service for their subset”. Within Google, both of these options are unacceptable: we either introduce the risk that a rogue frontend task brings down all the frontends, or introduce new unappealing failure modes (what do you do if the subsetting service is unavailable?). There is potential for other subsetting algorithms in this space, and while I’d be excited to see them, I’m mildly sceptical about their practicality at scale.