by LinuxBender 1990 days ago

It would be very unusual for an ISP to drop idle connections. This implies all your connections are going through a layer 4 router. More likely you have a stateful firewall in the path somewhere: home router, server ISP firewall, etc.

[Edit:] Or in this case, an ISP in Denmark that is trying to minimize IPv4 cost by using LSN (carrier-grade NAT), which also has many other drawbacks.

"A SSH session does not generate any traffic"

This does not have to be true. You can enable TCP keepalive in the server and client configuration.

Client via ~/.ssh/config:

  TCPKeepAlive yes
  ServerAliveInterval 60
  ServerAliveCountMax 2

Server via /etc/ssh/sshd_config:

  TCPKeepAlive yes
  ClientAliveInterval 60
  ClientAliveCountMax 2
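
As a rough sketch of what those numbers mean: with the SSH-level keepalives, a probe goes out after each idle interval, and the session is dropped once CountMax probes in a row go unanswered. So an unresponsive peer is noticed after roughly interval times count. The function name below is made up for illustration and is not part of OpenSSH.

```python
def ssh_dead_peer_timeout(alive_interval: int, alive_count_max: int) -> int:
    """Approximate seconds before ssh/sshd gives up on an unresponsive peer.

    alive_interval  -> ServerAliveInterval or ClientAliveInterval (seconds)
    alive_count_max -> ServerAliveCountMax or ClientAliveCountMax
    """
    if alive_interval <= 0:
        # An interval of 0 disables the SSH-level keepalive entirely.
        raise ValueError("keepalives are disabled when the interval is 0")
    return alive_interval * alive_count_max


# Values from the config above: 60 second interval, 2 missed probes allowed.
print(ssh_dead_peer_timeout(60, 2))  # 120
```

With the settings shown, a dead connection would be torn down after about two minutes, while a live but idle one sees traffic every 60 seconds.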

"Why are the TCP keepalives only sent after 2 hours?"

Each OS has a default time set for keepalives. TCPKeepAlive in the ssh config only switches TCP keepalives on for the socket; it does not set the timing, so the OS default applies (2 hours on Linux). In Linux, you can change this in /etc/sysctl.conf:

  net.ipv4.tcp_keepalive_time = 60
  net.ipv4.tcp_keepalive_intvl = 60
  net.ipv4.tcp_keepalive_probes = 2

After adding this, run sysctl -e -p

You can see the timers on your established connections with:

  ss -emoian | grep tim
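
A minimal sketch of how the three sysctl values combine, assuming the usual Linux behaviour: the first probe is sent after tcp_keepalive_time seconds of idle, further probes follow every tcp_keepalive_intvl seconds, and the connection is declared dead after tcp_keepalive_probes unanswered probes. The helper name is hypothetical.

```python
def tcp_dead_peer_timeout(keepalive_time: int,
                          keepalive_intvl: int,
                          keepalive_probes: int) -> int:
    """Seconds of silence before the kernel drops a dead TCP connection."""
    return keepalive_time + keepalive_intvl * keepalive_probes


# Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
linux_default = tcp_dead_peer_timeout(7200, 75, 9)

# The tuned values from the sysctl.conf snippet above.
tuned = tcp_dead_peer_timeout(60, 60, 2)

print(linux_default, tuned)  # 7875 180
```

For keeping a NAT mapping open, the number that matters is the first one: with the defaults nothing is sent for 2 hours, with the tuned values a probe goes out after 60 seconds of idle.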

Note: TCP timers are not the same as the ssh client and sshd server keepalive packets. These are two distinctly different mechanisms that can accomplish the same thing. Not all applications support TCP socket keepalive. You can wrap applications with a library called libkeepalive to add support without code changes by using LD_PRELOAD.
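
For an application you do control, the per-socket equivalent can be sketched in Python as below. This is only an illustration: enable_keepalive is a made-up helper, the timing values mirror the sysctl example, and the TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT constants are Linux names that may be missing on other platforms, so the sketch skips any that are not defined.

```python
import socket


def enable_keepalive(sock: socket.socket,
                     idle: int = 60,
                     interval: int = 60,
                     probes: int = 2) -> None:
    """Turn on TCP keepalive for one socket and override the OS timing."""
    # Without SO_KEEPALIVE the kernel never sends probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of tcp_keepalive_time / _intvl / _probes.
    # Assumption: these option names exist on Linux; skip them elsewhere.
    overrides = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# A non-zero value means keepalive is enabled on this socket.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
sock.close()
```

Settings applied this way affect only the one socket, so the system-wide sysctl defaults stay untouched.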

In Windows this is set in the registry [1]. On macOS this is set via sysctl, similar to Linux.

After you have adjusted your client and server config and restarted sshd on the server, ssh to your server using the flag -vv and you will see the keepalive packets.

[1] - https://serverfault.com/questions/735515/tcp-timeout-for-est...


theamk 1989 days ago

Correction: TCP keepalives, ssh server keepalives, and ssh client keepalives are three distinct and independent mechanisms. You only need one.

I usually just do client keepalives as they are easiest to set up. Server keepalives are good if you are worried about “forgotten” clients. TCP keepalives are usually not worth it IMHO.

gregmac 1989 days ago

I also changed to using client keepalives after something in our office network changed: they installed new switches and access points and suddenly my ssh sessions wouldn't stay open. After getting nowhere with IT (mainly just a low priority issue to them) it was just less frustrating to enable keepalives and the problem disappeared, so that's my default config everywhere ever since.

GekkePrutser 1989 days ago

I think TCP keepalives are conceptually the best though, as your problem occurs at the transport layer, not the application layer.

This way you solve it where the issue occurs, and with the added benefit that it works for all TCP connections, not just SSH.

However, I haven't had this issue. My ISP is pretty OK in this regard and I supply my own router, so I don't know if there are issues with this in real life.

dfox 1989 days ago

Some NAT implementations ignore TCP keepalives. An Alcatel-branded ADSL modem/router I used in 2005-ish certainly did, and IIRC some more recent Zyxel ones do the same.

anderstrier 1990 days ago

I describe those workarounds in my post as well. But that only solves the problem for me.

Making my ISP fix the underlying issue - that their TCP connection idle-timeout is too short - will make sure all their customers won't have to encounter this problem.

LinuxBender 1990 days ago

Edit: I missed the part that their network used LSN.

anderstrier 1990 days ago

Please read the post. My ISP already confirmed the problem, and told me that they expect to roll out a fix this week. I live in Denmark, and here it is fairly common that ISPs do Carrier-grade NAT.

msh 1989 days ago

I think saying it's common in Denmark is probably overstating it a bit.

For wired connections I think it's only the small newish ISPs + Stofa that do CGN; the rest, like TDC and Telenor, provide IPv4 to the CPE.

I have Hiper. They do CGN by default, but if customers ask for it they can get a dynamic IPv4 for free or a fixed one for a small fee.

est31 1989 days ago

What does their fix look like? I guess you can't change this limit for all connections otherwise they'd have to buy more IP addresses for their NAT routers, so maybe they only fix it for SSH connections, them being few?

I had the same problem and did the ~/.ssh/config trick years ago. Interested in contacting my ISP so that they fix the problem for all users (although it might be fixed now, idk).

Symbiote 1989 days ago

Is there a list of Danish ISPs that do this?

I've had YouSee since I moved here, and I have a single public IPv4. I didn't realise that was not standard.

fogihujy 1989 days ago

When I lived in Denmark, 3 would often use carrier-grade NAT, but not always. Based on talk with colleagues back then, it's quite common with mobile broadband.

Here in Finland, the situation is similar; when using mobile broadband you usually end up behind CGNAT.

Luckily, most ISPs will happily provide a static IPv4 for you for a small fee.

LinuxBender 1990 days ago

I missed that part. I would not have expected that in Denmark. LSN is awful. You will be sharing source port depletion limitations with others in your network. That also means you can't host any servers unless you use port forwarding services or reverse vpns like hamachi. It also means you are sharing a SNAT with others on your network which means that malicious traffic from others could be attributed to you. Glad they are fixing it for you. If they didn't, then one would hope there were other ISP options.

Any ISP using LSN will have low NAT timeouts because it takes memory on their routers to track sessions and state. I would be surprised if your ISP removes timeouts unless they are letting it fall back to FIFO pruning on your segment. Did they tell you what they are changing?

toast0 1989 days ago

It sounds like he's paid his ISP for a (dedicated) public IP, so it should be 1:1 NAT, which doesn't really need connection tracking.

For the rest of the customers that don't pay extra for a public IP, all the crappy things you mention do apply.

Hopefully, the ISP does native IPv6?

And, while 60 minute timeouts violate the RFC, it's a whole lot better than I expected. Usually CGN timeouts are around 15 minutes for nice ones, and I've seen 10 seconds at the bottom end.

I wish the longer ones would probe both ends of the connection to see if it's still live a minute or so before they intend to kill it.

eznzt 1989 days ago

What you say sounds very dramatic, but the truth is that CGNAT is good enough for 99.9% of users.

BlueTemplar 1989 days ago

That's bullshit, CGNAT is likely to cause all sorts of issues that average users won't realize are caused by their "I"SP (a frequent one: being unable to host video game sessions). They aren't getting real Internet, and are being treated as second-tier citizens.

arp242 1989 days ago

Yeah, my ISP uses it. It does come with some of the downsides the previous poster mentioned: the inability to make myself reachable from $the_world can be annoying, and I get a captcha on Google every time because of "unusual traffic" (I mostly use DDG, but sometimes fall back to it). Also, ACM blocked me at some point because "my IP is infiltrated by SciHub" (their words).

In the end, it's an imperfect solution for a real problem that mostly works well enough.

aMadMan 1989 days ago

ServerAliveInterval and/or ClientAliveInterval fix that behaviour just fine, for everyone, if they use it.

amelius 1989 days ago

Do you know of some mechanism that makes ssh sessions survive a power-suspend (on a Linux desktop)?

dfox 1989 days ago

That is pretty much an inverse problem ;)

If you care about that you probably should use mosh as that does solve that by design and not by random chance.

On the other hand using VPN with fixed tunneled endpoint IPs causes idle TCP connections going through it to remain connected pretty much indefinitely.

jeffbee 1989 days ago

Just don’t use the keep-alive feature. Without keep-alive traffic the peers have no way to tell the interface was transiently unavailable.

jusssi 1989 days ago

This is needed, in addition to the machine getting the same IP address back after resume (static assignment or long DHCP leases).

lytedev 1989 days ago

You should check out mosh.