by bklyn11201 2049 days ago
Totally anecdotal evidence, but I was in a rural NY house served by DSL for the past 6 months. The DSL has consistent packet loss between 4 and 6%. The only video service that could handle this level of packet loss well was Amazon Prime. Netflix couldn't even load its browse screen until the past two weeks, where something changed, and suddenly Netflix could handle the high packet loss as well as Amazon Prime.

Thank you to the engineers and developers!

8 comments

Separate anecdote - I worked on an inflight satellite wifi project and I was surprised at how well both YouTube and Netflix worked over a medium-bandwidth/high-latency connection.

Granted, we had specific QoS/traffic shaping to improve reliability without gobbling up all the bandwidth (streaming Netflix was an advertised feature of the wifi service), but it still seemed like magic.

When Plex rolled out its auto quality/auto bandwidth adjustment it actually worked very well over airplane satellite wifi as well. I watched a few things from my own server.

I'm amazed that service allowed streaming though...

For a good few years, a lot of airlines had Netflix and Hulu throttled on their free WiFi services, but not Twitch, so I’d just watch videogames on all my flights. My theory (which I really believe is true) is that they just hadn’t heard of it, and hadn’t blacklisted it!
YouTube has gotten way better in the past couple of years. When they first launched DASH streaming, it was terrible on high-latency international connections. If a US-based content creator uploaded a video and you were the first to view it in your region, you could actually notice how it was populating the CDN and it was unwatchable without disabling DASH and using the old-fashioned buffered player. These days it's flawless for me in nearly every situation.
Wow! You actually stream Netflix to an airplane? I always guessed that inflight VOD services had the movies stored in a server on the plane.
Things like the Delta “gogo in-flight entertainment” do store their movies on the plane, but people will want to watch their Netflix/Prime/etc content on the plane as well.
Both.

There’s “inflight entertainment” where all the movies/shows are indeed stored locally on the plane, with either seatback or custom/white label streaming app for BYOD.

But in addition they were advertising streaming Netflix and YouTube over the satellite WiFi.

This sounds like an MTU issue. TCP takes care of mere (eg probabilistic) packet loss ok. MTU issues have actually crept back up because TLS exacerbates any underlying MTU problems. IPv6 doubly so (when any hops - especially yours - don’t follow path MTU detection requirements).
TCP doesn't take care of packet loss. What TCP does is make sure your packets are not lost, even if you have 99% packet loss. On the flip-side, that means that if TCP can't deliver a single packet (say out of a billion), the whole stream stops at this one packet...

Which is why TCP is a horrible choice for any streaming service and a horrible choice for lossy connections, and I would be quite surprised if Netflix relied on it. UDP is the perfect choice for streaming, since video decoders can handle packet loss pretty well. The rest you can achieve with good tradeoff between Reed-Solomon codes and key framing.

I can't find any solid source for it, but I think most web video streams are TCP:

https://news.ycombinator.com/item?id=8638946

Even the live ones like Twitch.

Because they all want to run through HTML5 web browsers, re-use the same TLS as everyone else, and not write a ton of new code.

When QUIC gets big, they'll probably switch to UDP - not because it's better on every connection, but because it will be popular and it will be better on lossy connections. But for now TCP does work fine.

That's why youtube-dl can rip video without implementing tons of weird proprietary protocols - It's just HTTPS. Otherwise these video sites wouldn't run at all in Firefox.

I'm not sure this statement is generally true for Netflix's use case.

UDP provides no out-of-order packet handling, which _needs_ to be handled for video streaming. UDP is by default unbuffered throughout transport and tends to cause greater stress to client systems since they need to respond per packet rather than per traffic stream (IP+port combo). As a client developer, you end up reimplementing 90-95% of what TCP gives you out of the box at great development and QA cost. You also drain battery on mobile devices with all the interrupts you're causing doing UDP. The upside with a UDP-based implementation is the latency from server to client display is usually much less (tens of milliseconds vs hundreds to thousands), but the trade-offs involved are almost never worth it for a static media streaming site like Netflix.
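
To give a feel for the "reimplementing what TCP gives you" part, here is a minimal sketch of just the reordering piece, assuming the sender stamps every datagram with a sequence number. The class and method names are made up for illustration, and a real client would also need loss detection, retransmission requests, and a bound on how much it buffers.

```python
class ReorderBuffer:
    """Release payloads in sequence order, holding back anything that arrives early."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number the consumer expects
        self.pending = {}           # early arrivals, keyed by sequence number

    def push(self, seq, payload):
        """Accept one datagram and return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []               # duplicate or already delivered: drop it
        self.pending[seq] = payload
        ready = []
        # Drain every consecutive payload starting at the expected sequence number.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
print(buf.push(1, "b"))  # [] because packet 0 has not arrived yet
print(buf.push(0, "a"))  # ['a', 'b'] once the gap is filled
```

Note that if packet 0 never arrives, everything behind it stays stuck in `pending`, which is the same stall TCP has unless you add a policy for giving up on a gap.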

Even dynamic media streaming sites like Twitch rarely dip into UDP server-client implementations unless there are some unusual requirements.

You'd only allow packet loss without re-transmission (e.g. pure UDP) if you really need low latency, like for a video call.

Netflix is pure TCP I'm sure - look up HLS and DASH.

Aren’t MTU issues typically only up to a router? As in, even if the parent had a different MTU than Netflix uses, it wouldn’t matter since their router or the ISP’s router will transform packets between their appropriate MTUs?

And if this is true, then how could it be that Amazon works without problem and Netflix doesn’t?

"how could it be that Amazon works without problem and Netflix doesn’t"

Supporting Path MTU discovery (PMTUD), or perhaps just capping their outbound packets to 1450 or similar. Cloudflare found and fixed a problem in this space: https://blog.cloudflare.com/path-mtu-discovery-in-practice/
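
For anyone wondering where a number like 1450 comes from, a rough sketch of the header arithmetic, assuming no IP or TCP options and the usual 8 bytes of PPPoE overhead that many DSL lines carry. The function name is hypothetical, and this is not a claim about what Amazon or Netflix actually configure.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header
TCP_HEADER = 20   # bytes, without options

ETHERNET_MTU = 1500
PPPOE_MTU = ETHERNET_MTU - 8  # 1492: PPPoE/PPP encapsulation eats 8 bytes


def max_segment_size(path_mtu, ipv6=False):
    """Largest TCP payload that fits in one unfragmented packet on this path."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER


print(max_segment_size(ETHERNET_MTU))             # 1460
print(max_segment_size(PPPOE_MTU))                # 1452
print(max_segment_size(ETHERNET_MTU, ipv6=True))  # 1440
```

A sender that never emits packets larger than 1450 bytes stays under the 1492-byte PPPoE limit without depending on ICMP "fragmentation needed" messages making it back.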

Oh wow, TIL about the “don’t fragment” bit and all the stuff that comes with it.

Thanks for sharing, I learned a lot from that blog post.

It's not unusual for a server to also be a router in a layer 3 link aggregation setup. It's extremely common for IPs to be load-shared amongst servers using ECMP. If each server is connected to 2 Top-of-rack (TOR) switches and advertises the route to the shared IP through both TORs, you can very easily have ICMP probes used for PMTU take the wrong route and be dropped. The result is a TCP session with a default MTU that may not work along all traversed paths and will suffer from fragmentation.
>TCP takes care of mere (eg probabilistic) packet loss ok.

I'd imagine this is largely due to MSS clamping rather than actual MTU caused packet loss.

Isn’t streaming done usually via UDP?
It’s typically all HTTP requests; nowadays with HTTP/3 we are back to using UDP, but apart from real-time video conferencing etc. I don’t believe many streaming services use anything other than HTTP.
HTTP over TCP to cache nodes.

Fire up the developer tools / network view and go watch a Netflix video; try pausing, etc. It is incredibly straightforward.

... no it doesn't. Like not even close.
> Netflix couldn't even load its browse screen until the past two weeks

I assume the browse screen is based entirely on TCP?

I'm struggling to understand why packet loss would prevent it from loading -- it should be slower but TCP should handle re-transmission, no?

Or is Netflix doing something tricky with UDP even in their browsing UX?

If I had to guess they probably had timeouts that were too aggressive. Client timeouts are a very hard problem because it is difficult to tell the difference between "working, but slowly" and "something went wrong, the best bet is to try again".

Back in the day we used to have timeouts based on individual reads/writes which will often better answer "is this HTTP request making progress". However the problem with these sort of timeouts is they don't compose well so most people end up having an end-to-end deadline.
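
A toy illustration of the difference, with the network replaced by a list of (seconds waited, bytes received) pairs so it runs without sleeping. Everything here (the function name, the parameters, the numbers) is made up for the example and is only a sketch of the two policies, not how any particular client does it.

```python
def download(reads, read_timeout=None, deadline=None):
    """Simulate a transfer under a per-read timeout and/or an end-to-end deadline.

    reads: iterable of (seconds_waited, bytes_received) pairs standing in for socket reads.
    Returns (outcome, total_bytes_received).
    """
    elapsed = 0.0
    received = 0
    for wait, size in reads:
        # Per-read timeout: asks "did this individual read make progress in time?"
        if read_timeout is not None and wait > read_timeout:
            return "read timeout", received
        elapsed += wait
        # End-to-end deadline: only asks "has the whole request taken too long?"
        if deadline is not None and elapsed > deadline:
            return "deadline exceeded", received
        received += size
    return "ok", received


slow_but_steady = [(1.0, 100)] * 20      # lossy link: always progressing, just slowly
stalled = [(1.0, 100), (60.0, 100)]      # one read hangs for a minute

print(download(slow_but_steady, read_timeout=5))  # ('ok', 2000)
print(download(slow_but_steady, deadline=10))     # ('deadline exceeded', 1000)
print(download(stalled, read_timeout=5))          # ('read timeout', 100)
print(download(stalled, deadline=120))            # ('ok', 200)
```

The deadline kills the transfer that was working and tolerates the one that hung, which is the "working, but slowly" confusion described above.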

I doubt Netflix is doing anything tricky with UDP anywhere in their stack.

QUIC doesn't count because it's not tricky.

I'd love to see a source for this but seeing as YouTube works great over regular HTTP and TCP, I doubt anyone else is out in the weeds trying some custom UDP solution and reinventing wheels.

Slightly unrelated, but does the packet loss happen all the time or only when close to the maximum of the line?

Used to have similar problems with an ADSL line but found if I limited the line (Both up and down) I could find a magic number where the packet loss went away. (Well most of the time :))

Though it did need to be tuned for different times of the day, i.e. high-congestion times need it to be lower.

Though technically it shouldn't be your problem :(

This is normal if your router doesn’t prioritize control traffic. A rate limit allows all the ACKs to normally leave your network instead of getting queued up.
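
For reference, the usual building block behind that kind of rate limit is a token bucket. A minimal sketch, assuming the caller passes in the current time so it can be exercised without a real clock; the class name and the numbers are illustrative and not taken from any particular router firmware.

```python
class TokenBucket:
    """Allow at most rate_bytes_per_s on average, with bursts up to burst_bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False                # caller queues or drops the packet itself


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
print(bucket.allow(1500, now=0.0))  # True: the bucket starts full
print(bucket.allow(1500, now=0.5))  # False: only 500 bytes of credit so far
print(bucket.allow(1500, now=1.5))  # True: credit has refilled to 1500
```

Setting the rate a bit below what the line can really carry keeps the queue on your side of the modem, where the router can decide what goes out first.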
Or your router isn't responding correctly to traffic controls or the ISP isn't sending them correctly? I know with one provider I had in the deep past the allowable packet size was smaller than what most devices default to and they weren't correctly sending the maximum size their routers supported in the appropriate ICMP requests. Eventually I figured out that I could force my router to a smaller allowed packet size and that at least decreased packet loss substantially going upstream, even if whatever misconfiguration of the ISP was still confusing and eating downstream packets.
It happens nearly all the time. We use very little DSL bandwidth but are quite rural (miles from primary telephone infrastructure)
Dropped packets are often a symptom that the MTU value is set too high. That would be uncorrelated to congestion, though.
I'd believe it. When you know that there is going to be packet loss (whether from the user's spotty internet or from internal load-shedding), building your applications to be as resilient as possible to it makes sense. The infrastructure experimentation platform mentioned in the article is probably helpful for sniffing out potential trouble-spots in applications.
Any chance there weren't any line filters on the POTS equipment? I haven't had DSL in years but when I did I had to have filters on any telephone devices connected to the same line.
How did you measure 4 to 6% packet loss? Do you have scripts to ping some server and you are collecting packet loss data? I would like to collect such data for my home network and am curious.
Smokeping is one of the better-known tools for tracking latency and loss over time: https://oss.oetiker.ch/smokeping/
The simple ping command actually prints loss statistics when it finishes.
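
If you want to log that over time, a small sketch that pulls the counts out of ping's summary line. It assumes the Linux/macOS summary format ("N packets transmitted, M received" or "M packets received") and the `-c` count flag; Windows ping uses different flags and wording. The function names and the target host are placeholders.

```python
import re
import subprocess

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output):
    """Return (sent, received, loss_percent) from ping's summary, or None if not found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss


def measure_loss(host, count=100):
    """Run the system ping against `host` and parse its summary (needs network access)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    )
    return parse_ping_summary(result.stdout)


sample = (
    "--- example.com ping statistics ---\n"
    "100 packets transmitted, 95 received, 5% packet loss, time 99123ms\n"
)
print(parse_ping_summary(sample))  # (100, 95, 5.0)
```

Running `measure_loss` from cron every few minutes and appending the result to a file gives you a rough loss history; Smokeping does the same job with graphs.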
There has been the good kind of capitalism going on between the video streaming services before. Earlier on I remember Netflix was way better than Amazon, but Amazon has upped their game since.