by forreal1126 2662 days ago
too bad 99% of routers drop keepalives

TCP keepalives != HTTP keepalives.
Also, in my experience at least, it's not necessarily that routers drop TCP keep-alives, but rather that the keep-alive interval for most OSes is way longer than the router's connection timeout for idle entries in the NAT table.

I was burned hard by this in Azure. It seems that the default idle expiry time is around 4 minutes for the TCP load balancers. You can bump it to 30 min, but if I recall, the default keepalive idle time on Linux is 2 hours, so the first probe goes out long after the LB has forgotten the connection. Any long-standing idle TCP connection would get into a state where both sides believed they were connected, but the packets would get dropped on the floor. When the LB timed out, it didn't emit any FIN or RST packets, so neither side knew the connection had been torn down.

Fun debugging on that one. During the day there was enough activity to keep the connections alive, but at night they'd break. The overall behaviour was that the service worked great all day, but the first few actions out-of-business-hours would fail due to application-layer timeouts, and then everything would work great again until it had sat idle for a while.

You send heartbeats! There might be a max-connection-time but I haven't run into it. My connections being dropped through Amazon infrastructure was solved by sending a few bytes (': <3' or '<!-- <3 -->') every 5 seconds or so.
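
A minimal sketch of that heartbeat idea, in Python. Everything here is an assumption for illustration: the name `stream_with_heartbeats`, the use of a `queue.Queue` as the event source, and `None` as the end-of-stream marker are all hypothetical. The ': <3' payload is written as a Server-Sent Events comment line, which clients are expected to ignore.

```python
import queue

# SSE comment line (starts with ':'); it carries no data for the client.
HEARTBEAT = b": <3\n\n"


def stream_with_heartbeats(events, interval_s=5.0):
    """Yield queued payloads, or a heartbeat whenever the queue stays
    empty for interval_s seconds. A None item ends the stream."""
    while True:
        try:
            item = events.get(timeout=interval_s)
        except queue.Empty:
            # Nothing to send: emit a few bytes so intermediaries see traffic.
            yield HEARTBEAT
            continue
        if item is None:
            return
        yield item
```

A web framework's streaming response could iterate over this generator. The interval only needs to be comfortably shorter than the shortest idle timeout on the path, so 5 seconds is on the conservative side.
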
TCP keepalive, rather than HTTP keepalive, should solve the problem too.

(That is, it handles the case of "HTTP request", "huge delay", "final response", as opposed to a streaming/chunked reply that is very long or slow.)

See my sibling post. TCP keep-alive can work, but you probably need to fiddle with OS-default settings for modern network equipment. I personally find the behaviour abhorrent, but my beard has more grey in it every day and I've accepted that "this is how it is now".
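
For what it's worth, the fiddling can also be done per socket instead of system-wide. A rough Python sketch follows; the helper name `enable_keepalive` and the specific timer values are my assumptions, picked only so the first probe lands well inside a roughly 4 minute idle timeout. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` constants are the Linux names and are not defined on every platform, hence the guard.

```python
import socket


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Turn on TCP keepalive and, where the platform exposes them,
    shorten the timers from the OS defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe, gap between probes, and number of
    # unanswered probes before the connection is declared dead.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:  # skip options this platform does not define
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

On Linux the system-wide equivalents are the `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls, but those only affect sockets that have `SO_KEEPALIVE` turned on in the first place.
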