by LinuxBender 1990 days ago
It would be very unusual for an ISP to drop idle connections. That would imply all your connections are going through a layer 4 router. More likely, you have a stateful firewall in the path somewhere: a home router, the server ISP's firewall, etc.

[Edit:] Or, in this case, an ISP in Denmark that is trying to minimize IPv4 costs by using LSN (carrier-grade NAT), which also has many other drawbacks.

A SSH session does not generate any traffic

This does not have to be true. You can enable TCP keepalive in the server and client configuration.

Client via ~/.ssh/config:

  TCPKeepAlive yes
  ServerAliveInterval 60
  ServerAliveCountMax 2
Server via /etc/ssh/sshd_config:

  TCPKeepAlive yes
  ClientAliveInterval 60
  ClientAliveCountMax 2
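As a rough reading of these settings: when nothing has been received for 60 seconds, the client sends a keepalive message through the encrypted channel, so a NAT or firewall in the path never sees much more than 60 seconds of silence. Per ssh_config(5), the client gives up after roughly ServerAliveInterval times ServerAliveCountMax seconds without a reply; the server-side ClientAlive* pair works the same way. A Python sketch of that arithmetic, where the function name and the 15-minute NAT timeout are illustrative assumptions:

```python
def alive_settings_summary(interval_s: int, count_max: int,
                           nat_idle_timeout_s: int) -> dict:
    """Hypothetical helper: read a *AliveInterval / *AliveCountMax pair."""
    # The session is never silent for much longer than interval_s, so a NAT
    # whose idle timeout is longer than that keeps the mapping.
    keeps_nat_mapping = interval_s < nat_idle_timeout_s
    # After count_max unanswered messages the connection is dropped.
    disconnect_after_s = interval_s * count_max
    return {
        "keeps_nat_mapping": keeps_nat_mapping,
        "disconnect_after_s": disconnect_after_s,
    }


# Values from the config above, against an assumed 15-minute NAT timeout.
print(alive_settings_summary(60, 2, 15 * 60))
# {'keeps_nat_mapping': True, 'disconnect_after_s': 120}
```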
Why are the TCP keepalives only sent after 2 hours?

Each OS has a default time set for TCP keepalives. The TCPKeepAlive option in the ssh config only switches them on; the timing comes from the OS defaults. In Linux, you can change it in /etc/sysctl.conf:

  net.ipv4.tcp_keepalive_time = 60
  net.ipv4.tcp_keepalive_intvl = 60
  net.ipv4.tcp_keepalive_probes = 2
After adding this, run sysctl -e -p (as root) to apply it.
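With these values the kernel sends the first probe after 60 seconds of idle time and declares the peer dead after two unanswered probes 60 seconds apart, so the worst case is roughly tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes. A Python sketch of that arithmetic, comparing the values above with the usual Linux defaults (7200, 75, 9); the helper name is hypothetical:

```python
def keepalive_dead_peer_seconds(keepalive_time: int, keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Hypothetical helper: approximate time until a silent peer is dropped."""
    # Idle period before the first probe, then one interval per unanswered probe.
    return keepalive_time + keepalive_intvl * keepalive_probes


print(keepalive_dead_peer_seconds(60, 60, 2))    # 180
print(keepalive_dead_peer_seconds(7200, 75, 9))  # 7875, about 2 h 11 min
```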

You can see the timers on your established connections with:

  ss -emoian | grep tim
Note: TCP keepalive timers are not the same as the ssh client and sshd server keepalive messages. These are two distinctly different mechanisms that can accomplish the same thing. Not all applications enable TCP keepalive on their sockets. You can wrap such applications with a library called libkeepalive, loaded via LD_PRELOAD, to add support without code changes.
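For applications you control, TCP keepalive can also be switched on per socket instead of relying on system-wide defaults. A minimal Python sketch, assuming Linux: TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the per-socket counterparts of the three sysctls above and are not exposed on every platform, so each one is only set if the socket module provides it. The function name is hypothetical:

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     intvl_s: int = 60, probes: int = 2) -> None:
    """Hypothetical helper: turn on TCP keepalive for one socket."""
    # Without SO_KEEPALIVE the kernel never sends keepalive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of net.ipv4.tcp_keepalive_{time,intvl,probes}.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", intvl_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when enabled
s.close()
```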

In Windows this is set in the registry [1]. On macOS it is set via sysctl, similar to Linux.

After you have adjusted your client and server config and restarted sshd on the server, ssh to your server with the -vv flag and you will see the keepalive packets.

[1] - https://serverfault.com/questions/735515/tcp-timeout-for-est...


Correction: TCP keepalives, ssh server keepalives, and ssh client keepalives are three distinct and independent mechanisms. You only need one.

I usually just do client keepalives as they are easiest to set up. Server keepalives are good if you are worried about “forgotten” clients. TCP keepalives are usually not worth it IMHO.

I also switched to client keepalives after something in our office network changed: they installed new switches and access points, and suddenly my ssh sessions wouldn't stay open. After getting nowhere with IT (it was a low-priority issue for them), it was less frustrating to just enable keepalives. The problem disappeared, and that has been my default config everywhere ever since.
I think TCP keepalives are conceptually the best, though, as your problem occurs at the transport layer, not the application layer.

This way you solve it where the issue occurs, and with the added benefit that it works for all TCP connections, not just SSH.

However, I haven't had this issue myself: my ISP is pretty OK in this regard and I supply my own router. So I don't know whether there are issues with this in real life.

Some NAT implementations ignore TCP keepalives. An Alcatel-branded ADSL modem/router I used around 2005 certainly did, and IIRC some more recent Zyxel ones do the same.
I describe those workarounds in my post as well, but they only solve the problem for me.

Getting my ISP to fix the underlying issue (their TCP idle timeout is too short) will make sure none of their customers have to run into this problem.

Edit: I missed the part about their network using LSN.
Please read the post. My ISP has already confirmed the problem and told me that they expect to roll out a fix this week. I live in Denmark, and here it is fairly common for ISPs to use carrier-grade NAT.
I think saying it's common in Denmark is probably overstating it a bit.

For wired connections, I think it's only the small, newish ISPs plus Stofa that do CGN; the rest, like TDC and Telenor, provide IPv4 to the CPE.

I have Hiper. They do CGN by default, but customers who ask for it can get a dynamic IPv4 address for free or a fixed one for a small fee.

What does their fix look like? I guess they can't raise this limit for all connections, otherwise they'd have to buy more IP addresses for their NAT routers. So maybe they only fix it for SSH connections, since those are few?

I had the same problem and applied the ~/.ssh/config trick years ago. I'm interested in contacting my ISP so that they fix the problem for all users (although it might be fixed by now, idk).

Is there a list of Danish ISPs that do this?

I've had YouSee since I moved here, and I have a single public IPv4. I didn't realise that was not standard.

When I lived in Denmark, 3 would often use carrier-grade NAT, but not always. Based on conversations with colleagues back then, it's quite common with mobile broadband.

Here in Finland, the situation is similar; when using mobile broadband you usually end up behind CGNAT.

Luckily, most ISPs will happily provide a static IPv4 for you for a small fee.

I missed that part. I would not have expected that in Denmark. LSN is awful. You will be sharing source-port depletion limits with others on your network. It also means you can't host any servers unless you use port-forwarding services or reverse VPNs like Hamachi. And since you are sharing a SNAT address with others on your network, malicious traffic from others could be attributed to you. Glad they are fixing it for you. If they weren't, one would hope there were other ISP options.
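To put rough numbers on the port sharing: if a CGN carves the source-port range of one public IPv4 address into equal, fixed blocks, each subscriber only gets a slice. A Python sketch of that arithmetic; the usable range starting at 1024 and the 32 subscribers per address are assumptions for illustration, not figures from any particular ISP, and the function name is hypothetical:

```python
def ports_per_subscriber(subscribers_per_ip: int,
                         first_port: int = 1024,
                         last_port: int = 65535) -> int:
    """Hypothetical helper: equal port blocks per subscriber on one public IP."""
    # Assumes fixed, equal per-subscriber blocks (no sharing between blocks).
    usable = last_port - first_port + 1
    return usable // subscribers_per_ip


print(ports_per_subscriber(32))  # 2016
```

In this model, every concurrent flow needs its own port from that block, so a household with several busy devices can run out well before the address itself does.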

Any ISP using LSN will have low NAT timeouts, because tracking sessions and state takes memory on their routers. I would be surprised if your ISP removed the timeouts, unless they are letting it fall back to FIFO pruning on your segment. Did they tell you what they are changing?
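To illustrate that trade-off: a NAT keeps one state entry per tracked session, so it either expires idle entries or, once the table is full, evicts old ones. A toy Python model of a table that does both; the class name, capacity and timeout are hypothetical, and real CGN implementations are far more involved:

```python
from collections import OrderedDict


class NatTable:
    """Toy NAT session table with an idle timeout and FIFO pruning when full."""

    def __init__(self, capacity: int, idle_timeout_s: float) -> None:
        self.capacity = capacity
        self.idle_timeout_s = idle_timeout_s
        # flow -> time of last packet; dict order is creation order (FIFO).
        self.sessions = OrderedDict()

    def expire_idle(self, now: float) -> None:
        # Drop every session that has been silent longer than the timeout.
        stale = [f for f, last in self.sessions.items()
                 if now - last > self.idle_timeout_s]
        for f in stale:
            del self.sessions[f]

    def packet(self, flow: tuple, now: float) -> None:
        self.expire_idle(now)
        if flow in self.sessions:
            # Any packet (a keepalive included) refreshes the idle timer;
            # assigning to an existing key keeps its FIFO position.
            self.sessions[flow] = now
            return
        if len(self.sessions) >= self.capacity:
            # Table full: prune the oldest-created session, active or not.
            self.sessions.popitem(last=False)
        self.sessions[flow] = now

    def alive(self, flow: tuple, now: float) -> bool:
        self.expire_idle(now)
        return flow in self.sessions


ssh = ("10.0.0.2", 50000, "203.0.113.5", 22)
nat = NatTable(capacity=2, idle_timeout_s=3600)
nat.packet(ssh, now=0)
print(nat.alive(ssh, now=3000))  # True: idle for 50 minutes
print(nat.alive(ssh, now=4000))  # False: idle for more than 60 minutes
```

In this model, a session that sends a packet every 60 seconds never hits the idle timeout, but it can still be pruned once the table fills up, which is the FIFO fallback described in the comment above.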

It sounds like he's paid his ISP for a (dedicated) public IP, so it should be 1:1 NAT, which doesn't really need connection tracking.

For the rest of the customers that don't pay extra for a public IP, all the crappy things you mention do apply.

Hopefully, the ISP does native IPv6?

And while 60-minute timeouts violate the RFC, that's a whole lot better than I expected. Usually CGN timeouts are around 15 minutes for the nice ones, and I've seen 10 seconds at the bottom end.
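A quick check of why the original problem shows up: RFC 1122 requires the TCP keepalive interval to default to no less than two hours (matching Linux's default tcp_keepalive_time of 7200 seconds), and RFC 5382 says a NAT's idle timeout for established TCP connections must not be less than 2 hours 4 minutes. A Python sketch comparing the default first keepalive against that minimum and against the 60- and 15-minute timeouts mentioned in this thread; the function name is hypothetical, and it assumes the NAT counts a keepalive probe as traffic:

```python
RFC5382_MIN_IDLE_S = 2 * 3600 + 4 * 60  # 7440 s: 2 h 4 min for established TCP
LINUX_DEFAULT_KEEPALIVE_TIME_S = 7200   # net.ipv4.tcp_keepalive_time default


def survives_idle(first_keepalive_s: int, nat_idle_timeout_s: int) -> bool:
    """Hypothetical helper: does the first probe beat the NAT timeout?"""
    # Assumes the NAT refreshes the mapping when it sees the probe.
    return first_keepalive_s < nat_idle_timeout_s


for label, timeout in [("RFC 5382 minimum", RFC5382_MIN_IDLE_S),
                       ("60-minute CGN", 60 * 60),
                       ("15-minute CGN", 15 * 60)]:
    print(label, survives_idle(LINUX_DEFAULT_KEEPALIVE_TIME_S, timeout))
# RFC 5382 minimum True
# 60-minute CGN False
# 15-minute CGN False
```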

I wish the longer ones would probe both ends of the connection to see if it's still live a minute or so before they intend to kill it.

What you say sounds very dramatic, but the truth is that CGNAT is good enough for 99.9% of users.
That's bullshit. CGNAT is likely to cause all sorts of issues that average users won't realize are caused by their "I"SP (a frequent one: being unable to host video game sessions). They aren't getting the real Internet, and are being treated as second-tier citizens.
Yeah, my ISP uses it. It does come with some of the downsides the previous poster mentioned: the inability to make myself reachable from $the_world can be annoying, and I get a captcha on Google every time because of "unusual traffic" (I mostly use DDG, but sometimes fall back to it). Also, ACM blocked me at some point because "my IP is infiltrated by SciHub" (their words).

In the end, it's an imperfect solution for a real problem that mostly works well enough.

ServerAliveInterval and/or ClientAliveInterval fix that behaviour just fine, for everyone, if they use them.
Do you know of some mechanism that makes ssh sessions survive a power-suspend (on a Linux desktop)?
That is pretty much an inverse problem ;)

If you care about that, you should probably use mosh, which solves it by design rather than by random chance.

On the other hand, using a VPN with fixed tunnel endpoint IPs lets idle TCP connections going through it remain connected pretty much indefinitely.

Just don't use the keepalive feature. Without keepalive traffic, the peers have no way to tell that the interface was transiently unavailable.
This is needed, in addition to the machine getting the same IP address back after resume (static assignment or long DHCP leases).
You should check out mosh.