by chadnickbok 3668 days ago
Hey cool, someone forgetting _yet again_ why we use TCP.

We don't use TCP because it's fast. We don't use it because it's reliable (although that's really useful). We use it because _we kept breaking the internet_. Once you get above a certain threshold, the network can't keep up with you and packets start getting dropped. The problem is that backing off just a little doesn't allow the network to recover.

Instead, we need to use exponential backoff in the face of packet loss to ensure that the network as a whole can recover.
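To make "exponential backoff" concrete: one place TCP literally backs off exponentially is its retransmission timer, which doubles after every timeout (RFC 6298). A minimal sketch of that doubling follows. The function name, the 1 second starting value, and the 60 second cap are illustrative assumptions, not something taken from the comment above.

```python
def rto_schedule(initial_rto, timeouts, max_rto=60.0):
    """Return the timer value (seconds) in effect before each successive timeout.

    Hypothetical helper for illustration: the timer doubles after every
    timeout and is clamped to max_rto.
    """
    schedule = []
    rto = initial_rto
    for _ in range(timeouts):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)  # exponential backoff, capped
    return schedule


print(rto_schedule(1.0, 8))
# -> [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Each loss signal makes the sender wait twice as long as before, so a sender that keeps losing packets very quickly stops adding load.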

But if you're pretty much the only connection misbehaving, and everything else backs off, then you can kinda get away with not using exponential backoff. The problem is that the applications for which it was "kinda okay" to do this were VoIP and friends, where realtime delivery is really important and exponential backoff causes noticeable drops in quality.

For a great read about these kinds of issues, check out the TCP-Friendly rate control RFC: https://tools.ietf.org/html/rfc5348
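For anyone who wants a feel for what that RFC asks of a non-TCP sender, here is a rough sketch of the throughput equation from RFC 5348 section 3.1, with the RFC's suggested simplifications b = 1 and t_RTO = 4 * RTT. The function name and the example segment size and RTT are my own assumptions for illustration.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed send rate in bytes/second per the RFC 5348 throughput equation.

    s     : segment size in bytes
    rtt   : round-trip time in seconds
    p     : loss event rate, 0 < p <= 1
    b     : packets acknowledged per ACK (RFC recommends 1)
    t_rto : retransmission timeout; defaults to 4 * rtt as the RFC suggests
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom


# Assumed example: 1460-byte segments over a 100 ms path.
for p in (0.0001, 0.001, 0.01, 0.1):
    print(f"p={p:g}: {tfrc_rate(1460, 0.1, p) * 8 / 1e6:.2f} Mbit/s")
```

The point of the equation is that the permitted rate falls steeply as the loss event rate rises, so a realtime flow gets a smoother rate than TCP while still backing off about as much on average.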


> Once you get above a certain threshold, the network can't keep up with you and packets start getting dropped. The problem is that backing off just a little doesn't allow the network to recover.

Another aspect of this problem is that the network is too hesitant to drop packets [1], so by the time you've noticed packet loss things have gotten bad enough that the drastic backoff is needed. Widespread deployment of ECN and AQM would allow for more rapid feedback before any huge backlog develops, and consequently a less extreme response to congestion signals could be used.

[1] Arista would rather their 10GbE switches add up to 100ms of queuing delay per port than drop a packet: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2016-J...

Slightly OT, but that link is astonishing.

That anyone can think adding 100ms of latency to a 10GbE switch, even under heavy contention, is a good feature is absolutely staggering.
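For scale, a back-of-the-envelope sketch of how much data has to sit in a queue to produce that much delay on one port. The helper name and the two packet sizes are assumptions chosen for illustration; the only inputs taken from the thread are the 10GbE line rate and the 100ms figure.

```python
def queue_bytes(link_bps, delay_ms):
    """Bytes that must be queued to add delay_ms of delay at line rate."""
    # bits/s * ms -> bits * 1000; divide by 8 (bits/byte) and 1000 (ms/s)
    return link_bps * delay_ms // 8000


backlog = queue_bytes(10_000_000_000, 100)
print(backlog)          # -> 125000000 bytes, i.e. 125 MB per port
print(backlog // 1500)  # -> 83333 full-size (1500-byte) packets
print(backlog // 64)    # -> 1953125 minimum-size (64-byte) frames
```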

It's not quite 100ms. A bit less. The explanation is simple: if TCP exponential backoff fires, you will have a very bad time on any TCP connection. Site owners, obviously, don't want that.

Try this: iptables -A INPUT -m statistic --mode random --probability 0.001 -j DROP

And see how your internet works. TLDR: sometimes loading times go through the roof; some instant messages go through in <0.1s, and on occasion one takes 30+ seconds; on occasion it's a DNS query that gets dropped and a page load suddenly takes 1 minute for no identifiable reason; large downloads always "get fucked" (suddenly lose 90% of their bandwidth and take several minutes to recover). Bursty traffic doesn't work. If you start Firefox with 20+ tabs open, 80% of them will never load.

You will not enjoy the experience.

So yes, people think that adding 100ms of latency is better than dropping a packet under contention.

Your numbers are ridiculous. There's a huge gulf between buffering millions of packets per switch port before a single drop, and a 1 in 1000 drop probability. You're also assuming that the drops are indiscriminate when a refusal to consider AQM and fair queuing is what led Arista to this absurdity in the first place, and you're presuming that latencies would still be astronomical in a world without massive queues.

A 10GbE network in a datacenter without bufferbloat would have RTTs orders of magnitude smaller than the 100ms queuing delay Arista considers acceptable; the effects of a congestion event would be ancient history by the time Arista's queues could drain. Even outside the datacenter, 100ms is a pretty long time for most connections in a managed-queue world. A congestion event on a device using fq_codel won't kill your DNS request or TCP handshake; it'll slow down an established flow and if you're using ECN you won't even lose a packet. It's only in a DDoS-like scenario of thousands of unresponsive connections (such as TCPs with a large initial window) beginning to transmit simultaneously that you'd see some flows getting unfairly penalized, but things would equalize within a few RTTs if the traffic was real TCP and not a true DDoS. You only see it take minutes for a download's throughput to recover if you're going over multiple satellite links or through a severely bloated queue.

Okay, make it one in a million. You will still be able to tell, and still see the phenomena I'm talking about.
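For a rough sense of the two drop rates being argued over: if drops are independent and uniformly random (which is what the iptables rule does, and which the reply above disputes for a managed queue), the chance that a transfer of n packets sees at least one drop is 1 - (1 - p)^n. A small sketch, with a hypothetical function name and made-up packet counts:

```python
def p_any_drop(p, n):
    """Probability that at least one of n packets is dropped,
    assuming each is dropped independently with probability p."""
    return 1.0 - (1.0 - p) ** n


for p in (1e-3, 1e-6):
    for n in (10, 1_000, 100_000):
        print(f"p={p:g} n={n:>7}: {p_any_drop(p, n):.4%}")
```

This only says how often a transfer is hit at all, not how badly a given TCP stack reacts when it is.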