| HN Mirror


by tptacek 5532 days ago

This is a great paper. If you haven't read it: it describes a common scenario where endemic network delays tend to nudge all participants in a periodic broadcast protocol into sending their broadcasts at the same time, so that some hours after you start the participants, they have all synchronized and, on each timer tick, saturate the network with updates.

The solution (I didn't reread so this is from memory) is to add random jitter to each participant's timer.
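To make the jitter idea concrete, here is a minimal sketch in Python. It is not code from the paper; the function names, the 30-second period, and the default jitter fraction of 0.5 are all assumptions chosen for illustration. Each participant draws its next delay as the nominal period plus a uniform random offset, so timers that happen to line up tend to drift apart again.

```python
import random


def next_broadcast_delay(period_s, jitter_fraction=0.5, rng=random):
    """Return the delay in seconds until the next broadcast.

    The delay is the nominal period plus a uniform random offset in
    [-jitter_fraction * period_s, +jitter_fraction * period_s].
    jitter_fraction is an illustrative parameter, not a value taken
    from the paper.
    """
    if not 0.0 <= jitter_fraction <= 1.0:
        raise ValueError("jitter_fraction must be between 0 and 1")
    spread = period_s * jitter_fraction
    return period_s + rng.uniform(-spread, spread)


def schedule(start_s, period_s, count, jitter_fraction=0.5, rng=random):
    """Return `count` successive broadcast times for one participant."""
    t = start_s
    times = []
    for _ in range(count):
        # A fresh random delay is drawn for every interval, so the
        # participant does not settle back into a fixed phase.
        t += next_broadcast_delay(period_s, jitter_fraction, rng)
        times.append(t)
    return times


if __name__ == "__main__":
    # Two hypothetical participants started at the same instant.
    a = schedule(0.0, 30.0, 5, rng=random.Random(1))
    b = schedule(0.0, 30.0, 5, rng=random.Random(2))
    for ta, tb in zip(a, b):
        print(f"{ta:8.2f}  {tb:8.2f}")
```

With `jitter_fraction=0.0` this reduces to a plain fixed-period timer, which is the setup that can fall into lockstep.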

However, is there evidence to suggest that's what happened to Amazon? I can see this being a big issue in '93, when high-latency, low-bandwidth links were commonplace. But do we think that Amazon wasn't engineered well enough to deal with multiple-orders-of-magnitude spikes in C&C traffic?

Thank you, though, for posting a (much needed) technical comment to this discussion.


pumpmylemma 5532 days ago

I don't think it was a symptom of routing synchronization specifically, but I'd be curious to know if it was a case of unexpected and undesired synchronization (e.g., an independent and random cluster of blocks suddenly updated; the network was saturated; it pulled in more updates; ...).

And yes, the paper talked about randomization. It also pointed out the magnitude of randomization required was larger than expected.


pandakar 5531 days ago

Has there been an official explanation?


pumpmylemma 5531 days ago

As far as I'm aware, no. That's why RightAWS said they get an F for communication.
