
Keep in mind that 5XX status codes are for server errors and 4XX codes are for client errors. Returning 429 ("Too Many Requests") would imply that this particular client is sending too many requests, not that the service as a whole is overloaded. As far as I can tell, returning a 503 ("Service Unavailable") for over-taxed servers is done maybe half the time.

Depending on the kind of service you're running, you might want to enforce a server-side timeout: "after a certain number of milliseconds of local response time, return a 5XX error code and abandon the connection". That would be one component of an overall strategy for handling high load, and it would heavily bias towards serving the cheapest responses first. That may or may not be a good idea: what if the "expensive" requests are from paying customers accessing account pages, and the "inexpensive" ones are from a sudden spike in traffic to your homepage after some good press? Eventually, of course, you'd want to separate these two kinds of traffic entirely, so that customers are only affected by outages they create. You can then focus on expanding capacity for customers directly, instead of lumping it in with the much less predictable behavior of general web traffic.
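A minimal sketch of the server-side timeout idea, assuming an asyncio-based Python server. `handle_with_deadline`, the two handler coroutines, and the 250 ms budget are all hypothetical. `asyncio.wait_for` cancels the handler once the budget is spent, and the wrapper returns a 503 in its place:

```python
import asyncio

# Hypothetical budget: how long the server works on one request
# before giving up and answering with a 5XX instead.
RESPONSE_BUDGET_SECONDS = 0.25


async def handle_with_deadline(handler, budget=RESPONSE_BUDGET_SECONDS):
    """Run `handler` (a hypothetical zero-argument coroutine function that
    returns (status, body)) and fall back to 503 if it overruns the budget."""
    try:
        return await asyncio.wait_for(handler(), timeout=budget)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the handler, so no more work is spent on it.
        return 503, "Service Unavailable: try again later"


async def cheap_homepage():
    await asyncio.sleep(0.01)  # stands in for a fast, cacheable page
    return 200, "homepage"


async def expensive_account_page():
    await asyncio.sleep(1.0)  # stands in for a slow, database-heavy page
    return 200, "account"


async def main():
    print(await handle_with_deadline(cheap_homepage))
    print(await handle_with_deadline(expensive_account_page))


if __name__ == "__main__":
    asyncio.run(main())
```

Run as-is, the cheap homepage request gets its 200 and the slow account-page request gets the 503, which is exactly the bias toward cheap responses (and the risk to paying customers) described above.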

> Just put a check on the number of open files before the line that opens the file, and set the response code to 500 or 429 before opening it?

So actually, this is one of the big benefits of putting the semaphore that limits file access inside its own dedicated coroutine (as before, but on the server side instead of the client). It lets you accept the connection without having to send an immediate response. In practice, that means your server will be slower to respond under high load, but as long as it answers before the client's (e.g. the browser's) timeout, the request still succeeds. It doesn't require any extra code to do that. This isn't the only way to achieve this result, but it's probably the most direct and the simplest, especially given the approach you've taken with the code so far.
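Here is roughly what that looks like, as a sketch assuming Python's asyncio (3.9+ for `asyncio.to_thread`). `FileServer`, the cap of 4 open files, and the temp-file demo are made up for illustration. Requests beyond the cap simply wait on the semaphore rather than being rejected:

```python
import asyncio
import os
import tempfile


def _read_file(path):
    # Plain blocking read; runs in a worker thread (see FileServer.serve).
    with open(path, "rb") as f:
        return f.read()


class FileServer:
    """Hypothetical server piece that caps how many files are open at once."""

    def __init__(self, max_open=4):
        # Create this inside a running event loop (e.g. from main below).
        self._slots = asyncio.Semaphore(max_open)
        self.open_now = 0
        self.peak_open = 0

    async def serve(self, path):
        # Excess callers queue here instead of failing: under load the
        # response is slower, but it still arrives unless the client times out.
        async with self._slots:
            self.open_now += 1
            self.peak_open = max(self.peak_open, self.open_now)
            try:
                # Offload blocking file I/O so the event loop stays responsive.
                return await asyncio.to_thread(_read_file, path)
            finally:
                self.open_now -= 1


async def main():
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        server = FileServer(max_open=4)
        # 20 simultaneous requests, but never more than 4 files open.
        bodies = await asyncio.gather(*(server.serve(tmp.name) for _ in range(20)))
        return bodies, server.peak_open
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    bodies, peak = asyncio.run(main())
    print(len(bodies), "responses; peak open files:", peak)
```

If you did want the check-and-reject behavior from the quoted question, you could test `self._slots.locked()` (true when the semaphore can't be acquired immediately) and return a 503 instead of waiting; the waiting version just skips that branch.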

A load balancer sits on top of that, ideally monitoring metrics like server CPU usage, memory load, or (most directly) request response time, and shifting requests between servers accordingly to minimize the delay incurred in the aforementioned "wait for the semaphore (or other synchronization primitive)" step.
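As a rough sketch of the response-time part, assuming the balancer can time each request it proxies: keep an exponentially weighted moving average (EWMA) of latency per backend and route each new request to the lowest. The class name, backend names, and `alpha` value are hypothetical, and this is one simple policy among many:

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    ewma_ms: float = 0.0  # smoothed response time in milliseconds
    samples: int = 0


class LatencyAwareBalancer:
    """Route each request to the backend with the lowest smoothed latency."""

    def __init__(self, names, alpha=0.3):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.backends = {n: Backend(n) for n in names}

    def pick(self):
        # Untried backends start at 0 ms, so each one gets probed once early on.
        return min(self.backends.values(), key=lambda b: b.ewma_ms).name

    def record(self, name, latency_ms):
        b = self.backends[name]
        if b.samples == 0:
            b.ewma_ms = latency_ms
        else:
            b.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * b.ewma_ms
        b.samples += 1


if __name__ == "__main__":
    lb = LatencyAwareBalancer(["app-1", "app-2"])
    lb.record("app-1", 120.0)  # app-1 is stuck waiting on its semaphore
    lb.record("app-2", 15.0)
    print(lb.pick())  # app-2
```

One weakness: a backend that stops receiving traffic also stops producing new samples, so its stale average can keep it idle. Production balancers typically add health checks, periodic probing, or some randomness to cope with this and to avoid piling every request onto whichever server looked fastest a moment ago.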

At the end of the day, until you start hitting the concurrent-connection limits that others have mentioned, you don't really need to worry much about how many connections you have open at once. Just focus on handling every connection you have as quickly as possible.