by tbarbugli 672 days ago
Nice article which matches my experience when it comes to optimizing for performance: Linux defaults are never good defaults and you don't need webscale or anything before you get bitten by them.

To give a few examples: on many distributions you get 1024 as the file limits, 4KB of shared memory (shmall), and Nagle's algorithm is enabled by default.
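For the Nagle part, a minimal sketch of opting out per socket with `TCP_NODELAY`. The helper names `make_low_latency_socket` and `nagle_disabled` are made up for illustration; whether disabling Nagle actually helps depends on your write pattern.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm turned off (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_low_latency_socket()
    print("Nagle disabled:", nagle_disabled(s))
    s.close()
```

Most HTTP servers and client libraries expose this as a setting, so it is worth checking that before patching sockets by hand.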

Another thing that we noticed at work (shameless plug to getstream.io) when it comes to tail latency for APIs / HTTP services:

- HTTP over TLS is annoyingly slow (too many roundtrips)

- Having edge nodes / POPs close to end-users greatly improves tail latency (and reduces latency-related errors). This works incredibly well for simple relays (the "weak" link has lower latency)

- QUIC is awesome

3 comments

> 1024 as the file limits

https://0pointer.net/blog/file-descriptor-limits.html is a good overview of the unfortunate reason why this is and how it should be handled.

That fails to note that `FD_SETSIZE` only applies if you're statically allocating your sets and relying on the libc defaults. If you do dynamic allocation (or, on some libcs, if you define the macro before including headers), you can select() a million FDs just fine.

It's still a bad idea for performance reasons (though `poll` can actually be worse on dense sets), but it's not actually the open-file limit that's the problem.

I don't think it fails to address that in a way that matters. The fact is that the default is still 1024. If you have bumped the size at build time or dynamically allocate them, then you are more than welcome to bump the soft limit to the hard limit at the start of your program. (And set it back to the default before you exec anything else.)
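A rough sketch of that bump-then-restore dance using Python's `resource` module, assuming Linux. The function names are hypothetical, and a real program would call the restore step in the child just before exec.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit; return the old (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        # Raising the soft limit up to the hard limit needs no privileges.
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return soft, hard


def restore_nofile_soft_limit(original_soft):
    """Put the soft limit back, e.g. before exec'ing a program that uses select()."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (original_soft, hard))


if __name__ == "__main__":
    old_soft, hard = raise_nofile_soft_limit()
    print("soft limit raised from", old_soft, "to", hard)
    restore_nofile_soft_limit(old_soft)
```

Restoring the old soft limit matters because a child that still uses select() with static fd sets can misbehave once it is handed descriptors numbered 1024 or higher.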

> QUIC is awesome

It would've been more awesome if it supported BBR for congestion control. QUIC gains in practice can be annihilated just by not having BBR implemented in the protocol, so sometimes QUIC could be even slower than HTTP2 over TCP (if TCP is properly configured).
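On the TCP side, Linux lets you inspect and request the congestion control algorithm per socket via `TCP_CONGESTION`. A small sketch, assuming Linux and Python 3.6+; the helper names are invented here, and asking for `bbr` only succeeds if the kernel has that module available.

```python
import socket


def congestion_control(sock):
    """Return the name of the congestion control algorithm used by a TCP socket."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    # The kernel returns a NUL-padded name such as b"cubic\x00...".
    return raw.split(b"\0", 1)[0].decode()


def try_set_congestion_control(sock, name="bbr"):
    """Try to switch the socket to the named algorithm; return True on success."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
        return True
    except OSError:
        # Algorithm not available or not permitted on this system.
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("default:", congestion_control(s))
    print("switched to bbr:", try_set_congestion_control(s, "bbr"))
    print("now:", congestion_control(s))
    s.close()
```

The system-wide default lives in the `net.ipv4.tcp_congestion_control` sysctl, which is usually the more practical knob for a whole server.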

In my experience QUIC is worse than TCP in a heterogeneous environment when optimizing for throughput. The added CPU usage from user space packet switching is a big factor for battery powered devices and congestion control lives its own life which often means it doesn’t get its fair share of bandwidth in the presence of TCP traffic.

I think QUIC can be awesome, and I hope it will be. But I wouldn’t say we’re there yet. Low-level, kernel-adjacent things take time. Networks are extremely heterogeneous and weird. Maybe in 5-10 years.

In the short-medium term, I think we could get much more bang for the buck if there was an easy way to improve the defaults on Linux and/or its distros.

FYI QUIC is compatible with BBR, and at least the Google and MSFT QUIC implementations have BBR (albeit not by default afaik).

That honestly just makes me think: Is there a distro where this isn't the case? Where the defaults are set up for performance in a modern server context, with the expectation that the system will be admin'd by someone technical who knows the tradeoffs? Heck, the decisions + tradeoffs can all be documented in docs.

Is there a reason I'm missing why this wouldn't be worth jumping on?