by outsidein 814 days ago
99% it's the MTU. I had this recently, specifically with TLS, because the large initial packets contain certificates. Results can even depend on the user agent: some fail, some work.

Try reducing the MTU on the client; 1280 is a good starting point.
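A rough sketch of why lowering the client MTU helps. It assumes plain IPv4/IPv6 and TCP headers with no options. The MSS a host advertises is its MTU minus the IP and TCP headers, so a smaller client MTU makes the client announce a smaller MSS, and the server should then send smaller segments. The helper name `mss_for_mtu` is made up for illustration.

```python
# Minimal sketch: derive the TCP MSS a host would advertise for a given MTU.
# Assumes fixed-size headers with no IP or TCP options (hypothetical helper).
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Return the largest TCP payload that fits in one packet of size `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = mtu - ip_header - TCP_HEADER
    if mss <= 0:
        raise ValueError(f"MTU {mtu} is too small for IP + TCP headers")
    return mss
```

For example, `mss_for_mtu(1500)` gives 1460 and `mss_for_mtu(1280)` gives 1240 (1220 over IPv6). 1280 is also the minimum link MTU that IPv6 requires, which is one reason it makes a safe starting value.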


The article mentions that it happens both over HTTP and HTTPS.

I'd ask OP to check whether this only affects a subset of their IPs from https://bunnycdn.com/api/system/edgeserverlist, or whether all of their IPs are affected, using `curl --resolve bunnycdn-hosted-website.com:80:some-other-ip http://bunnycdn-hosted-website.com`.
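A small sketch of the per-IP check, written in Python only for illustration. It builds one `curl --resolve` invocation per edge IP and reports the HTTP status. `EDGE_IPS` is a hypothetical hand-filled list (I'm not assuming anything about the format of the edge server list), and the hostname is the placeholder from the comment above.

```python
# Sketch: probe one hostname through several specific edge IPs with curl --resolve.
# EDGE_IPS is hypothetical; fill it from whatever IPs you want to compare.
import subprocess

HOST = "bunnycdn-hosted-website.com"  # placeholder hostname from the comment
EDGE_IPS = ["169.150.221.147"]  # hypothetical example list


def curl_command(host: str, ip: str, port: int = 80, timeout_s: int = 10) -> list[str]:
    """Build a curl invocation that pins `host` to `ip` and prints only the HTTP status."""
    scheme = "https" if port == 443 else "http"
    return [
        "curl",
        "--silent",
        "--output", "/dev/null",
        "--write-out", "%{http_code}",
        "--max-time", str(timeout_s),
        "--resolve", f"{host}:{port}:{ip}",
        f"{scheme}://{host}/",
    ]


def probe(host: str, ip: str, port: int = 80) -> str:
    """Run curl and return the status code, or the curl exit code on failure."""
    result = subprocess.run(curl_command(host, ip, port), capture_output=True, text=True)
    if result.returncode != 0:
        return f"curl exit {result.returncode}"  # 28 means --max-time was hit
    return result.stdout
```

Usage would be something like `for ip in EDGE_IPS: print(ip, probe(HOST, ip))`. If only some IPs time out, the problem is more likely on the CDN side than on the local network.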

Besides that, the author points out that the final handshake ACK never reaches the server, and that packet is small; it's not going to exceed the MTU.
Indeed, it's right there in the packet capture screenshot. The ACK has a payload length of 0.
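The size argument is easy to check with a little arithmetic. This sketch assumes IPv4 with no IP options; the 12-byte case is the common TCP timestamp option (10 bytes) padded with two NOPs. The helper name is made up for illustration.

```python
# Sketch: on-the-wire IPv4 packet length of a TCP segment, to compare with an MTU.
# Assumes no IP options; TCP options must be padded to a multiple of 4 bytes.
IPV4_HEADER = 20
TCP_BASE_HEADER = 20


def ipv4_tcp_packet_len(payload: int, tcp_options: int = 0) -> int:
    """Total IPv4 packet length for a TCP segment carrying `payload` data bytes."""
    if tcp_options % 4:
        raise ValueError("TCP options are padded to a multiple of 4 bytes")
    return IPV4_HEADER + TCP_BASE_HEADER + tcp_options + payload
```

A bare ACK is `ipv4_tcp_packet_len(0)` = 40 bytes, or 52 with timestamps, far below even a 1280-byte MTU.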

I've debugged a lot of TCP/IP issues over the years but this one has me scratching my head. The author has done reasonable troubleshooting: tried from different devices and operating systems, HTTP and HTTPS, over wired and WiFi, and to different destinations. The common denominator is the wired network.

It can't hurt to reduce the MTU, but I see nothing in the evidence presented that this is likely to be the cause.

I once had a destination firewall blocking packets from Linux but not OS X and it turned out to be that Linux was an early adopter of ECN and the destination firewall rejected any packets with the ECN bits set. I've also had frame relay networks with MTU limitations, NICs with corrupted checksums, overflowing NAT tables, asymmetric ARP tables, misconfigured netmasks, and stuff I'm sure I've forgotten.
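The ECN story above doesn't say exactly which bits the firewall matched, so this sketch shows both places ECN appears (values per RFC 3168): the two low bits of the IPv4 TOS / IPv6 Traffic Class byte, and the ECE/CWR flags an ECN-capable client sets on its SYN. `naive_firewall_drops` is a hypothetical stand-in for a broken rule, not any real product's behaviour.

```python
# Sketch: where ECN shows up in a packet, and a hypothetical rule that rejects it.
ECN_CODEPOINTS = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}
TCP_SYN = 0x02
TCP_ECE = 0x40  # ECN-Echo
TCP_CWR = 0x80  # Congestion Window Reduced


def ecn_codepoint(tos: int) -> str:
    """The ECN field is the two least significant bits of the TOS / Traffic Class byte."""
    return ECN_CODEPOINTS[tos & 0b11]


def is_ecn_setup_syn(tcp_flags: int) -> bool:
    """An ECN-capable client sets both ECE and CWR on its initial SYN."""
    both = TCP_ECE | TCP_CWR
    return bool(tcp_flags & TCP_SYN) and (tcp_flags & both) == both


def naive_firewall_drops(tos: int, tcp_flags: int) -> bool:
    """Hypothetical broken rule: reject anything with any ECN-related bit set."""
    return (tos & 0b11) != 0 or bool(tcp_flags & (TCP_ECE | TCP_CWR))
```

With such a rule, an OS that negotiates ECN (SYN flags `0xC2`) never gets past the SYN, while one that doesn't (`0x02`) works fine, which matches "Linux fails, OS X works".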

But we don't know the full story for HTTP, as no capture was provided. Typically, with an MTU issue you'd get stuck on the TLS handshake, as happens here with HTTPS; so if it's an MTU issue, the HTTP capture should show a 301 redirect coming back, since a small response like that fits in a single packet.
Agreed, my best guess is that it's due to a smaller MTU between the CDN and your device. They are probably replying with a TLS ServerHello, which would typically fill a standard 1500-byte packet. That's also likely why HTTP isn't working either: they would ACK the connection, and you would probably be able to issue the GET /, but you would never get a response back because the HTTP response payload is larger than a single packet.
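To make the theory concrete, here is a sketch of which segments of a response survive a path whose MTU is smaller than the sender assumes, when oversized DF packets are silently dropped. It assumes IPv4 + TCP with no options (40 bytes of headers); the response sizes in the usage note are hypothetical, not measurements from the OP.

```python
# Sketch: which segments of a response get lost on a PMTU "blackhole" path.
# Assumes 40 bytes of IPv4 + TCP headers and that oversized packets vanish silently.
HEADERS = 40


def segment_sizes(response_len: int, mss: int) -> list[int]:
    """Split a response into TCP payload sizes of at most `mss` bytes."""
    full, rest = divmod(response_len, mss)
    return [mss] * full + ([rest] if rest else [])


def dropped_segments(response_len: int, mss: int, path_mtu: int) -> list[int]:
    """Indices of segments whose packets exceed the path MTU and would be lost."""
    return [
        i for i, size in enumerate(segment_sizes(response_len, mss))
        if size + HEADERS > path_mtu
    ]
```

With a 1400-byte path MTU, a ~300-byte redirect loses nothing, while a ~4000-byte ServerHello plus certificates sent with MSS 1460 loses segments 0 and 1 and only the short final segment arrives. With MSS 500, nothing is dropped.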

A few ideas to test this theory:
1) Find an asset on their server that is smaller than 500-1000 bytes, so the entire payload fits in a single packet. Maybe a HEAD request would work?
2) Clamp your MSS for this IP to something much smaller, like 500 instead of the standard 1460. This should force the server to send smaller packets, and it will work better in practice than changing your MTU. See: https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.cookbook.mtu-...
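The linked HOWTO clamps the MSS on a router with iptables. As a swapped-in, per-connection alternative, a client can set `TCP_MAXSEG` on its own socket; per the Linux tcp(7) man page, setting it before connecting also changes the MSS announced in the SYN. This is a sketch assuming Linux semantics; the function name is made up.

```python
# Sketch: clamp the MSS advertised for a single client connection (Linux semantics),
# instead of lowering the MTU of the whole interface.
import socket


def clamped_connection_socket(mss: int = 500) -> socket.socket:
    """Create a TCP socket whose SYN should advertise `mss` (set before connect)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect() so the value is announced in the SYN; the server
    # should then send segments of at most `mss` payload bytes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return sock
```

You would then `connect()` to the edge IP on port 80, send a plain `GET`, and see whether the response now arrives. If it does with MSS 500 but not with the default, that points strongly at the path MTU.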

The ACK in the TCP handshake is obviously being dropped (the server keeps resending SYN+ACK), so this probably has nothing to do with MTU.
I believe this is relatively easy to test: gradually increase the size of the ICMP packets (with the DF bit set) until the host stops responding. I've done something along those lines in the past, but it was a long time ago.
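Rather than stepping the size up by hand, a binary search finds the largest ICMP payload that still gets a reply. `probe(size)` is a hypothetical callback; on Linux it could wrap iputils `ping -M do` (DF set) as shown. This assumes the host answers ping at all and that small pings succeed while large ones fail.

```python
# Sketch: binary-search the largest ICMP echo payload that gets through with DF set.
import subprocess
from typing import Callable

IPV4_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def largest_payload(probe: Callable[[int], bool], lo: int = 0, hi: int = 1472) -> int:
    """Largest size in [lo, hi] for which probe succeeds (assumes probe(lo) is True)."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def linux_ping_probe(host: str) -> Callable[[int], bool]:
    """Probe using iputils ping: -M do sets DF, -s payload size, -c count, -W timeout."""
    def probe(size: int) -> bool:
        cmd = ["ping", "-M", "do", "-s", str(size), "-c", "1", "-W", "1", host]
        return subprocess.run(cmd, capture_output=True).returncode == 0
    return probe


def path_mtu_estimate(max_payload: int) -> int:
    return max_payload + IPV4_ICMP_OVERHEAD
```

On a clean 1500-byte path the search should end at 1472; anything lower suggests a smaller MTU somewhere along the way.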
It's the ack in the handshake that is dropped. A filter for 169/8 matching on established connections and only outbound would cause that.
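A toy model of that hypothesis, loosely based on netfilter conntrack, where a connection counts as ESTABLISHED once a reply has been seen. The outbound SYN is NEW and passes, but after the SYN-ACK comes back, the client's final ACK matches ESTABLISHED and is dropped. The rule and all names are hypothetical.

```python
# Sketch: a stateful rule dropping outbound, ESTABLISHED packets to 169.0.0.0/8.
import ipaddress
from dataclasses import dataclass

BLOCKED = ipaddress.ip_network("169.0.0.0/8")


@dataclass
class Packet:
    dst: str
    flags: str        # informational only; the rule keys on state, not flags
    outbound: bool
    reply_seen: bool  # has the other side already answered (e.g. with SYN-ACK)?


def conn_state(pkt: Packet) -> str:
    return "ESTABLISHED" if pkt.reply_seen else "NEW"


def hypothetical_rule_drops(pkt: Packet) -> bool:
    return (
        pkt.outbound
        and conn_state(pkt) == "ESTABLISHED"
        and ipaddress.ip_address(pkt.dst) in BLOCKED
    )
```

For `169.150.221.147`, the SYN passes while the handshake ACK is dropped, which would produce exactly the SYN-ACK retransmissions seen in the capture.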
Edit: on reading a few more comments, I think this is probably all wrong...

The TLS ClientHello is not that big (the client's FIN has seq=518), and the server is only sending packets with SEQ=0. As others pointed out, this likely means that the server that received the SYNs is not receiving the final ACK and the data packets.

From what I can tell, the example IP is not broadly anycast. From my test hosts in Seattle, traceroute takes me through transit to San Jose, and then either

vl201.sjc-eq10-dist-1.cdn77.com or vl202.sjc-eq10-dist-1.cdn77.com and finally

169-150-221-147.bunnyinfra.net

I'm not sure how easy it is to run a traceroute over TCP with different flags. But if the OP can run a traceroute with only the SYN flag set, and again with only the ACK flag set, that might be pretty interesting. I suspect this is an issue inside BunnyCDN's network, where packets from this user/network with SYN go to one server host and packets with ACK go to another. Maybe there's an odd router somewhere that's routing these differently, but if they both make it to Bunny, they should both work.

With

    $ traceroute --version
    Modern traceroute for Linux, version 2.1.2
    Copyright (c) 2016  Dmitry Butskoy,   License: GPL v2 or any later
I can run a TCP traceroute with either ACK or SYN probes using

     traceroute 169.150.221.147 -p 443 -q 1 -T -O ack
or

     traceroute 169.150.221.147 -p 443 -q 1 -T -O syn
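Once you have both traceroutes, this sketch compares their hop lists and reports where the SYN and ACK paths first diverge. It assumes the output of `traceroute -n -q 1` (numeric addresses, one probe per hop). The parsing is a best-effort assumption about that format, and a `*` on both sides can hide a real difference.

```python
# Sketch: find the first hop where a SYN traceroute and an ACK traceroute differ.
from typing import Optional


def parse_hops(output: str) -> list[str]:
    hops = []
    for line in output.splitlines():
        parts = line.split()
        # Hop lines start with the hop number; the "traceroute to ..." header does not.
        if len(parts) >= 2 and parts[0].isdigit():
            hops.append(parts[1])  # address, or "*" if the probe got no reply
    return hops


def first_divergence(a: list[str], b: list[str]) -> Optional[int]:
    """1-based hop number where the paths differ, or None if they are identical."""
    for i in range(max(len(a), len(b))):
        hop_a = a[i] if i < len(a) else None
        hop_b = b[i] if i < len(b) else None
        if hop_a != hop_b:
            return i + 1
    return None
```

If the paths match all the way to the last CDN hop but only the SYN gets answered there, that would support the "different host behind the same IP" theory.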
Wrong answer about MTU below for posterity:

Yeah, that would be my bet too. Especially since things start to work after 60 seconds: I think that's the timeout before Windows does PMTU blackhole probing (which is painfully slow; iOS, and I think macOS, do it much sooner, and I think even Android has gotten around to doing it in a reasonable amount of time).

I've got a test site up that might work for the OP http://pmtud.enslaves.us/

But if it's really only happening with BunnyCDN, it's possible that most of their routes are 1500-MTU clean (or have working path MTU discovery) and only the routes to BunnyCDN aren't. Of course, a lot of popular services intentionally lower their advertised MSS and allowed outbound MTU to work around the many broken networks out there, so the fact that services X and Y work doesn't really mean the path is clean.

The ClientHello isn't that big, but the ServerHello in the reply (along with the certificate chain) can be quite large, and since TCP packets usually have the DF flag set, a middlebox may toss them if PMTUD isn't working correctly.
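A sketch of what an IPv4 router does with a packet larger than the next link's MTU. PMTUD relies on the ICMP "fragmentation needed" message (type 3, code 4) getting back to the sender. If something filters that ICMP, the sender never learns to shrink its packets, and you get a blackhole. The function name is made up for illustration.

```python
# Sketch: IPv4 router decision for a packet versus the next-hop MTU.
IP_DF = 0x4000  # Don't Fragment bit in the IPv4 flags/fragment-offset field


def router_action(packet_len: int, next_hop_mtu: int, flags_frag: int) -> str:
    if packet_len <= next_hop_mtu:
        return "forward"
    if flags_frag & IP_DF:
        # Cannot fragment: drop and tell the sender the MTU it should use.
        return "drop + ICMP type 3 code 4 (fragmentation needed)"
    return "fragment"
```

A 1500-byte TLS segment with DF set hitting a 1400-byte link is dropped, and only the ICMP reply lets the server recover.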

I had seen this exact issue with Fastly a few years ago.

Yeah, I expected a large ServerHello, but then I would expect the server to send packets with Seq=[LargeNumber]. Often you'd get an ACK for the ClientHello, then one or more missing packets, then the final packet of the ServerHello, which is often small. Or at least an ACK for the retransmitted ClientHello with a large sequence number.
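The two patterns can be told apart mechanically. This sketch uses Wireshark-style relative sequence numbers (SYN-ACK at seq 0, data starting at 1). Only SYN-ACK retransmissions means the handshake ACK never arrived. Server data starting past the expected sequence number means earlier segments were lost, which is the MTU-style pattern. The class and function names are hypothetical.

```python
# Sketch: classify the server side of a capture as "ACK never arrived" vs "data lost".
from dataclasses import dataclass


@dataclass
class ServerSegment:
    seq: int          # relative sequence number
    length: int       # TCP payload length
    syn: bool = False


def diagnose(segments: list[ServerSegment]) -> str:
    data = sorted((s for s in segments if not s.syn and s.length > 0), key=lambda s: s.seq)
    if not data:
        if sum(1 for s in segments if s.syn) > 1:
            return "SYN-ACK retransmitted, no data: server never got the handshake ACK"
        return "no server data seen"
    expected = 1
    for seg in data:
        if seg.seq > expected:
            return f"gap before seq {seg.seq}: earlier server segments were lost"
        expected = max(expected, seg.seq + seg.length)
    return "server data is contiguous"
```

The OP's capture, with the server only ever sending SEQ=0, falls in the first bucket, not the MTU one.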

I guess I've seen pmtud issues way too often in my life, and I just jumped ahead. :D