by kerneis 1271 days ago
There are several points in the article where the examples didn't make sense to me at all. Overall it's an interesting article, but either I'm a bit dense this morning or it's sloppy in the details and explanations.

For instance, in table 3, it looks like they excluded backend tasks {0,1} (for frontend tasks {0, 1}) then {2,3} (for frontend tasks {2,3}) in the N=10 case, but backend tasks {1,2} then {3,4} in the N=11. Why the discrepancy? I get that it helps them make the point about task 3 changing subset, but it's inconsistent with excluding left-overs in a round-robin fashion presented in the previous paragraph.

Another sentence that I couldn't make sense of is: "If these [tasks 2 and 4] carry over to the subset of the next frontend task, you might get the shuffled backend tasks [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], but you can't assign backend task 2 to the same frontend task." The "same frontend task" as what? Obviously not the one task 2 was already assigned to (the most intuitive reading to me), since task 2 was precisely not assigned and is a left-over. But then again, what does this mean?

> in table 3

Figure 3 is an (arbitrary) example of round-robin subsetting with randomized subset ordering. The point is to demonstrate how bad backend churn is with this algorithm, by inspecting a normal example of the decisions this algorithm makes.

> The "same frontend task" as what? [...] what does this mean?

It's not phrased well, but it's also tricky to communicate. My read is this: given the backend shuffles [9, 1, 3, 0, 8, 6, 5, 7, 2, 4] and [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], if you combine them back to back to choose a backend assignment, you end up with subsets {9, 1, 3, 0}, {8, 6, 5, 7}, {2, 4, 7, 2}, {0, 8, 9, 1}, {4, 5, 3, 6}. Backend 2 appears twice in the third subset. That third subset means the third frontend only has a subset of three backends, even though you want it to have four.
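To make that concrete, here is a rough Python sketch of my reading. It assumes "combining" means concatenating the two shuffles and cutting consecutive chunks of k = 4; `chunk_subsets` is a name I made up for illustration, not something from the article.

```python
from typing import List


def chunk_subsets(shuffles: List[List[int]], k: int) -> List[List[int]]:
    """Concatenate the shuffles and cut consecutive chunks of size k.

    Assumption: this is only one reading of "carrying over" left-overs,
    not the article's final algorithm. A trailing partial chunk is dropped.
    """
    flat = [backend for shuffle in shuffles for backend in shuffle]
    return [flat[i:i + k] for i in range(0, len(flat) - k + 1, k)]


# The two shuffles quoted in this thread.
shuffles = [
    [9, 1, 3, 0, 8, 6, 5, 7, 2, 4],
    [7, 2, 0, 8, 9, 1, 4, 5, 3, 6],
]
subsets = chunk_subsets(shuffles, k=4)

for frontend, subset in enumerate(subsets):
    distinct = len(set(subset))
    note = "" if distinct == len(subset) else f"  <- only {distinct} distinct backends"
    print(frontend, subset, note)
```

The chunk that straddles the boundary between the two shuffles is the one that picks up backend 2 twice, which is the case the quoted sentence is warning about.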

The rest is reductio ad absurdum: reasoning through the ways you might fix this, and explaining why they in turn don't work. (I believe there's also an implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination, only static before-the-fact coordination amongst the frontends, e.g. agreement on a hash seed for a given subset and on which hashing strategy the frontends would use.)

(article author here)

> That third subset means the third frontend only has a subset of three backends, even though you want it to have four.

This explanation is correct, thanks. Alas, word limits demand brevity.

> implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination

An earlier iteration of this article included coordination as one of the properties, but this unfortunately had to be cut. AFAICS, the only other two kinds of coordination are “frontend tasks talk to each other” or “frontend tasks ask a subsetting service for their subset”. Within Google, both of these options are unacceptable: we either introduce the risk that a rogue frontend task brings down all the frontends, or introduce new unappealing failure modes (what do you do if the subsetting service is unavailable?). There is potential for other subsetting algorithms in this space, and while I’d be excited to see them, I’m mildly sceptical about their practicality at scale.

Thanks for confirming!

Yeah, the brevity thing is always tricky: the classic problem with any academic paper is that you need to assume your reader has some level of background that enables them to follow your reasoning; otherwise you end up derailing your paper by explaining too much of the background material.

FYI your claim re coordination isn't true across the board. The second option is used for some services (Slicer has an opt-in integration on the L2s).

(xoogler here ;)

> For instance, in table 3, it looks like they excluded backend tasks {0,1} (for frontend tasks {0, 1}) then {2,3} (for frontend tasks {2,3}) in the N=10 case, but backend tasks {1,2} then {3,4} in the N=11. Why the discrepancy?

With N = 10, there will be N mod k = 10 mod 4 = 2 leftover tasks, and so the round-robin fashion excludes {0, 1} then {2, 3}. However for N = 11, there will be N mod k = 11 mod 4 = 3 leftover tasks, so the round-robin fashion excludes {0, 1, 2} then {3, 4, 5}.
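The arithmetic above can be sketched in a few lines of Python. This is only an illustration under assumptions: it ignores the per-round shuffle, it takes each round to cover N // k frontend tasks (consistent with frontend tasks {0, 1} sharing an exclusion set in the N = 10 case), and the wrap-around modulo N is a guess for later rounds. `excluded_backends` is a hypothetical helper name.

```python
from typing import List


def excluded_backends(n: int, k: int, frontend: int) -> List[int]:
    """Backends left out for the round that `frontend` belongs to.

    n: number of backend tasks, k: subset size.
    Assumption: backends are considered in unshuffled order 0..n-1 and the
    excluded window advances by (n mod k) positions each round.
    """
    leftover = n % k                 # backends that do not fit into full subsets
    subsets_per_round = n // k       # frontends served before a new round starts
    round_index = frontend // subsets_per_round
    start = round_index * leftover
    return [(start + i) % n for i in range(leftover)]


for n in (10, 11):
    for frontend in range(4):
        print(f"N={n} frontend={frontend} excluded={excluded_backends(n, 4, frontend)}")
```

With k = 4 this gives [0, 1] then [2, 3] for N = 10, and [0, 1, 2] then [3, 4, 5] for N = 11, matching the numbers above.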

But as joatmon-snoo correctly said, the more important point is demonstrating how bad backend churn is with this algorithm.

OK, that makes a lot of sense. Thanks for taking the time to clarify!

> But as joatmon-snoo correctly said, the more important point is demonstrating how bad backend churn is with this algorithm.

Yes, again the overall point came across clearly, but faced with specific examples I like to dive into the details to check my understanding of how things work. Otherwise, it's easy to overlook key but subtle details.

Other typos: Bbackend; missing math formula in "(for a frontend task m, this is )" — should be ⎣m/L⎦ I guess. Proofreading at ACM seems to be lacking.