| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by joatmon-snoo 1282 days ago

> in table 3

Figure 3 is an (arbitrary) example of round-robin sunsetting with randomized sunset ordering. The point is to demonstrate how bad backend churn is with this algorithm, by inspecting a normal example of the decisions this algorithm makes.

> The "same frontend task" as what? [...] what does this mean?

It's not phrased great, but it's also tricky to communicate. My read is this: given backend shuffles [9, 1, 3, 0, 8, 6, 5, 7, 2, 4], [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], if you combine these shuffles to choose a backend assignment, you end up with subsets {9, 1, 3, 0}, {8, 6, 5, 7}, {2, 4, 7, 2}, {0, 8, 9, 1}, {4, 5, 3, 6}. That third subset means the third frontend only has a subset of three backends, even though you want it to have four.

The rest is reductio ad absurdum- reasoning through the ways you might fix this, and explaining why they in turn don't work. (I believe there's also an implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination, only static before-the-fact coordination amongst the front ends, i.e. agreement on a hash seed for a given subset, and say which hashing strategy the front ends would use).

1 comments

flowblok 1281 days ago

(article author here)

> That third subset means the third frontend only has a subset of three backends, even though you want it to have four.

This explanation is correct, thanks. Alas, word limits demand brevity.

> implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination

An earlier iteration of this article included coordination as one of the properties, but this unfortunately had to be cut. AFAICS, the only other two kinds of coordination are “frontend tasks talk to each other” or “frontend tasks ask a subsetting service for their subset”. Within Google, both of these options are unacceptable: we either introduce the risk that a rogue frontend task brings down all the frontends, or introduce new unappealing failure modes (what do you do if the subsetting service is unavailable?). There is potential for other subsetting algorithms in this space, and while I’d be excited to see them, I’m mildly sceptical about their practicality at scale.

joatmon-snoo 1280 days ago

Thanks for confirming!

Yeah, the brevity thing is always tricky- the classic problem with any academic paper is that you need to assume your reader has some level of background that enables them to follow your reasoning; otherwise you end up derailing your paper by explaining too much of the background material.

FYI your claim re coordination isn't true across the board. The second option is used for some services (Slicer has an opt-in integration on the L2s).

(xoogler here ;)