by klabb3 1014 days ago
A random thing I ran into with the defaults (Ubuntu Linux):

- net.ipv4.tcp_rmem (max value) ~ 6MB

- net.core.rmem_max ~ 1MB

So the tcp_rmem value takes precedence by default, meaning that the TCP receive window for a vanilla TCP socket can actually grow to 6MB if needed (in reality 3MB because of the halving, but let's ignore that for now since it's a constant factor).

But if I "setsockopt SO_RCVBUF" in a user-space application, I'm actually capped at a maximum 1MB, even though I already have 6MB. If I try to reduce it from 6MB to e.g. 4MB, it will result in 1MB. This seems very strange. (Perhaps I'm holding it wrong?)

(Same applies to SO_SNDBUF/wmem...)
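For anyone who wants to reproduce this, here is a minimal Python sketch. `modeled_rcvbuf` and `probe_rcvbuf` are hypothetical helper names, not from the post. The model assumes the behavior described in socket(7): the request is clamped to net.core.rmem_max and then doubled for bookkeeping, and it ignores the kernel's small minimum floor. The probe just reads back what the kernel reports on whatever machine it runs on.

```python
import socket


def modeled_rcvbuf(requested: int, rmem_max: int) -> int:
    """Rough model of what getsockopt(SO_RCVBUF) reports on Linux after a set.

    Assumption (from socket(7)): the request is clamped to net.core.rmem_max,
    then doubled to leave room for bookkeeping overhead. The kernel's minimum
    floor is ignored here.
    """
    return min(requested, rmem_max) * 2


def probe_rcvbuf(requested: int) -> tuple[int, int]:
    """Return (value before, value after) an explicit SO_RCVBUF request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        after = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after
```

Calling `probe_rcvbuf(4 * 1024 * 1024)` on a box with rmem_max at 1MB should, under this model, report about 2MB after the set (roughly 1MB usable), which matches the "ask for 4MB, get 1MB" observation above. Note that an unconnected socket starts at the tcp_rmem default, not the max.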

To me, it seems like Linux is confused about the precedence order of these options. Why not have core.rmem_max be larger and the authoritative directive? Is there some historical reason for this?

2 comments

> (Same applies to SO_SNDBUF/wmem...)

If you want to limit the amount of excess buffered data you can lower TCP_NOTSENT_LOWAT instead, which caps the amount that is buffered beyond what's needed for the BDP.
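A small sketch of setting that option from Python, assuming Linux. `set_notsent_lowat` is a made-up helper name, and the fallback value 25 is the Linux constant, used only if this Python build doesn't expose `socket.TCP_NOTSENT_LOWAT`. As far as I understand it, the option only changes when the socket is reported writable, so it helps mainly if the application waits for writability (poll/epoll/select) before sending.

```python
import socket

# Fallback of 25 is the Linux value of TCP_NOTSENT_LOWAT (assumption: Linux).
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def set_notsent_lowat(sock: socket.socket, limit_bytes: int) -> int:
    """Set the not-yet-sent low-water mark and return the value read back."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, limit_bytes)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT)
```

The 16KB used in a call like `set_notsent_lowat(sock, 16 * 1024)` is an arbitrary illustration, not a recommendation.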

net.ipv4.tcp_rmem max is a limit for the auto-tuning the kernel performs

once you do SO_RCVBUF the auto-tuning is out of the picture for that socket, and net.core.rmem_max becomes the max.
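To see both limits side by side, a sketch like the following reads them from /proc on Linux. `parse_tcp_mem` and `read_limits` are hypothetical names. The only assumption is the standard layout: tcp_rmem holds three integers (min, default, max) and rmem_max holds one.

```python
from pathlib import Path


def parse_tcp_mem(text: str) -> dict[str, int]:
    """Parse the 'min default max' triple used by tcp_rmem and tcp_wmem."""
    lo, default, hi = (int(field) for field in text.split())
    return {"min": lo, "default": default, "max": hi}


def read_limits(proc_net: Path = Path("/proc/sys/net")) -> dict:
    """Read the auto-tuning ceiling and the explicit SO_RCVBUF ceiling."""
    return {
        # Upper bound used by the kernel's receive-buffer auto-tuning.
        "tcp_rmem": parse_tcp_mem((proc_net / "ipv4/tcp_rmem").read_text()),
        # Upper bound applied to an explicit SO_RCVBUF request.
        "rmem_max": int((proc_net / "core/rmem_max").read_text()),
    }
```

If `read_limits()["tcp_rmem"]["max"]` is larger than `read_limits()["rmem_max"]`, you are in the situation described in the parent post: auto-tuning can grow the buffer further than a manual SO_RCVBUF request can.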

It's pretty clearly documented @ Documentation/networking/ip-sysctl.rst

Edit: downvotes, really? smh

1. While your context about auto-tuning is accurate and valuable, it doesn't really address the fundamental strangeness that the parent post is commenting about: It's still strange that it can auto-tune to a higher value than you can manually tune it to.

2. It's always valuable to provide further references, but I'd guess that down-voters found the "It's pretty clearly documented" phrasing a little condescending? Perhaps "See the docs at [] for more information."?

3. "Please don't comment about the voting on comments. It never does any good, and it makes boring reading."

> once you do SO_RCVBUF the auto-tuning is out of the picture for that socket

Oh I didn’t realize this. That explains the switch in limits. However:

I would have liked to keep auto-tuning and only change the max buffer size. It’s still weird to me that these are different modes with different limits and whatnot. In my case, I was parallelizing TCP, and it would have been better to cap the max buffer size and vary the number of connections instead.

I gave up on it. Especially since I need something cross-platform and user-space only, I don’t want to fiddle with these APIs that are all different and unpredictable. I guess it’s for the best anyway, to avoid as many per-platform hacks as possible.

> It's pretty clearly documented @ Documentation/networking/ip-sysctl.rst

I guess I need to step up my doc grepping game, cause it was quite hard to even find this on Google. I ran my own experiments to verify.

> Edit: downvotes, really? smh

Fwiw not me.

And to add: the kernel autotunes better than you can, so leave that enabled unless you're Vint Cerf, Jim Gettys, or Vern Paxson.
Changed my name, thanks for the tip!