by donavanm 4544 days ago
Instead of netstat(8) or ss(8) check out /proc/net/sockstat and /proc/net/netstat and /proc/net/tcp. Might as well save a fork and some context switches.
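
A rough sketch of reading sockstat directly, assuming the usual layout of one "PROTO: name value name value ..." line per protocol. The function name parse_sockstat is made up for illustration.

```python
from pathlib import Path


def parse_sockstat(text):
    """Parse /proc/net/sockstat text into {protocol: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "PROTO: ..." line
        tokens = rest.split()
        # Fields come as alternating "name value" pairs.
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


if __name__ == "__main__":
    path = Path("/proc/net/sockstat")
    if path.exists():  # Linux only
        for proto, fields in parse_sockstat(path.read_text()).items():
            print(proto, fields)
```

No fork, no exec, just one file read. The 'orphan', 'tw' and 'mem' fields on the TCP line are the ones referred to further down.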

  net.ipv4.tcp_rmem=8192 873800 8388608
  net.ipv4.tcp_wmem=4096 655360 8388608
  net.ipv4.tcp_mem=8388608 8388608 8388608
You may want to rethink this. Your default values would support initial send and receive windows of roughly 400 & 600 packets. I've never seen initial windows that high in the wild. If it's a client you've seen recently they should be in the peer cache already. With this default receive allocation you only get about 39,000 sockets max. And once you exceed the tcp_mem high value your sockets will be force closed with a RST sent to the other side. Much better to have 'pressure' kick in and limit the buffers, throttling the send & receive windows.
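
The back-of-the-envelope numbers above can be reproduced like this. It is a sketch under two assumptions that are not in the original settings: a 1460 byte MSS and 4096 byte pages (tcp_mem is counted in pages, tcp_rmem and tcp_wmem in bytes). It also treats the whole buffer as window, which overstates what the kernel would actually advertise.

```python
MSS = 1460        # assumed: typical Ethernet MSS, in bytes
PAGE_SIZE = 4096  # assumed: page size used for tcp_mem accounting

tcp_rmem_default = 873800  # bytes, middle value of net.ipv4.tcp_rmem
tcp_wmem_default = 655360  # bytes, middle value of net.ipv4.tcp_wmem
tcp_mem_high = 8388608     # pages, third value of net.ipv4.tcp_mem

# How many full-size segments fit in each default buffer.
recv_segments = tcp_rmem_default // MSS
send_segments = tcp_wmem_default // MSS

# How many sockets fit under tcp_mem high if each one holds a full
# default receive buffer and nothing else.
max_sockets = tcp_mem_high * PAGE_SIZE // tcp_rmem_default

print(recv_segments)  # 598
print(send_segments)  # 448
print(max_sockets)    # 39322
```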

Go look at 'mem' in sockstat. I'd guess your average utilization is more in the 50kB range. And that includes both send and receive and the tcp_info structs, IIRC.
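
To turn 'mem' into a per-socket figure, something like the following should do. It assumes 4096 byte pages and divides by the 'alloc' count on the TCP line; whether 'alloc' or 'inuse' is the better denominator is a judgment call, and the function name is hypothetical.

```python
PAGE_SIZE = 4096  # assumed: sockstat reports TCP 'mem' in pages of this size


def avg_bytes_per_tcp_socket(sockstat_text, page_size=PAGE_SIZE):
    """Average TCP buffer memory per allocated socket, in bytes.

    Returns None if the text has no "TCP:" line.
    """
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            tokens = line.split()[1:]
            fields = dict(zip(tokens[0::2], map(int, tokens[1::2])))
            sockets = fields["alloc"]
            return fields["mem"] * page_size / sockets if sockets else 0.0
    return None


if __name__ == "__main__":
    try:
        with open("/proc/net/sockstat") as f:
            print(avg_bytes_per_tcp_socket(f.read()))
    except FileNotFoundError:
        pass  # not on Linux
```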

  net.ipv4.tcp_max_orphans=262144
That seems incredibly high. I'd expect more in the ~5,000 range on a very busy host. Check your 'orphans' from sockstat.

  net.core.netdev_max_backlog = 16384
From the source comments this is actually a per-CPU packet backlog. I haven't verified the implementation though.

  net.ipv4.tcp_max_tw_buckets=6000000
You may not need to do this. sysctl_max_tw_buckets limits the number of entries in the TIME_WAIT queue. When a socket moves to TIME_WAIT and the list is full it will instead go directly to CLOSE. Not very polite, and it's possible you fail to retrans data, but IMHO a low risk scenario. See what level you're actually running at in sockstat.
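
sockstat only gives the 'tw' total. For a per-state breakdown, /proc/net/tcp can be tallied by its hex 'st' column, as in this sketch. It covers IPv4 only (IPv6 sockets live in /proc/net/tcp6), and count_tcp_states is a made-up name.

```python
from collections import Counter

# State codes used in the 'st' column of /proc/net/tcp.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per TCP state from /proc/net/tcp style text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = int(fields[3], 16)  # 'st' is the fourth column, in hex
        counts[TCP_STATES.get(code, f"UNKNOWN_{code}")] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            print(count_tcp_states(f.read()))
    except FileNotFoundError:
        pass  # not on Linux
```

Compare the TIME_WAIT count against your tcp_max_tw_buckets setting to see how much headroom you really have.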

  tcp_tw_recycle
The worst sysctl name ever. The useful part is setting the TIME_WAIT timer to socket RTO instead of TCP_TIMEWAIT_LEN (60 seconds). The terrible behavior is in tcp_v4_conn_request() of tcp_ipv4.c. The sysctl also enables strict timestamp & sequence checking on SYNs. If peers behind a NAT device have clocks > 1 second apart their SYNs will be silently dropped. IIRC PAWSPassive from /proc/net/netstat will be incremented for each drop.
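
To watch that counter without netstat(8), /proc/net/netstat can be parsed as pairs of lines: a header line of names followed by a line of values with the same prefix. A minimal sketch, assuming that layout. The lookup is case-insensitive because the exact spelling of the counter, and whether it exists at all, depends on the kernel version. Function names are hypothetical.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat text into {prefix: {counter: int}}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "TcpExt: name name ..." then "TcpExt: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        counters[prefix.strip()] = dict(
            zip(names.split(), map(int, numbers.split()))
        )
    return counters


def find_counter(counters, prefix, name):
    """Case-insensitive counter lookup; returns None if it is absent."""
    for key, value in counters.get(prefix, {}).items():
        if key.lower() == name.lower():
            return value
    return None


if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            print(find_counter(parse_netstat(f.read()), "TcpExt", "PAWSPassive"))
    except FileNotFoundError:
        pass  # not on Linux
```

Sample it twice a few seconds apart; if the value is climbing while tcp_tw_recycle is on, you are dropping SYNs.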

  tcp_tw_reuse
See tcp_twsk_unique() in tcp_ipv4.c. IIRC when you request a new ephemeral socket it's checked against the timewait socket list. If sysctl_tcp_tw_reuse is set and the TIME_WAIT socket is older than one second it can be reused. Normally TIME_WAIT sockets are aged out of the queue after TCP_TIMEWAIT_LEN, ~60 seconds.

On TIME_WAIT in general you should probably look into setting the Maximum Segment Lifetime to a more reasonable value than 60s. You want to cover your max client RTO + one or two retrans. IMO something like 10s may be too short, but I cannot imagine 30 not working splendidly. See TCP_TIMEWAIT_LEN & TCP_PAWS_MSL & whichever other header values I'm missing.

1 comment

Great info, thanks! Going to look into these.