by sapphirecat 5407 days ago
The "speed limit" is on the outbound connections from Node to the service. Each time Node calls connect(), the OS assigns it a new local port that is used exclusively by that one connection.

A server facing the Internet can serve lots of clients because those clients have plenty of IP+port combinations to go around on their end, so the server can tell them apart even though it has only the one IP+port on its own end. But 60K node.js connections from one machine (the frontend server, with a single IP address) to a single IP+port on the backend server do not have that luxury. The only thing left to distinguish one connection from another is the port number on the Node server, so it must be unique per connection.
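
To make that concrete, here is a minimal sketch in Python (not Node) that opens a handful of connections from one machine to one listener on loopback. The helper name open_connections is made up for this illustration. Every connection shares the same destination port, and the OS hands each one a different source port.

```python
import socket

def open_connections(count):
    """Open `count` TCP connections to one local listener and return
    the (source_port, destination_port) pair of each connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(count)
    backend_addr = listener.getsockname()

    clients, accepted, pairs = [], [], []
    try:
        for _ in range(count):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            client.connect(backend_addr)   # the OS assigns the source port here
            conn, _peer = listener.accept()
            clients.append(client)
            accepted.append(conn)
            pairs.append((client.getsockname()[1], client.getpeername()[1]))
    finally:
        # Close everything only after all connections were open at once.
        for sock in clients + accepted + [listener]:
            sock.close()
    return pairs

pairs = open_connections(5)
source_ports = [src for src, _dst in pairs]
dest_ports = {dst for _src, dst in pairs}
print("source ports:", source_ports)
print("destination ports:", dest_ports)
```

The actual port numbers depend on the OS and will differ from run to run; the point is only that the source ports never repeat while the connections are open.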

Connection pools attempt to mitigate the problem by inserting a manager (the pool) in the middle, to accept larger numbers of requests from Node and try to schedule them on a lower, sustainable number of connections to the backend. At least on an RDBMS, transactions require the app to have exclusive use of the connection for its request, so when all the real connections are scheduled out, new requests have to wait for an old connection to be relinquished.
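
A rough sketch of that scheduling idea, in Python with threads standing in for requests. ConnectionPool and run_requests are hypothetical names for this example, and the "connections" are just placeholder strings, not real database connections. Each request takes exclusive use of one connection and everyone else waits until one is handed back.

```python
import queue
import threading
import time

class ConnectionPool:
    """Hands out a fixed number of connections; callers block until one is free."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")    # stand-in for a real backend connection

    def acquire(self):
        return self._free.get()            # blocks while every connection is in use

    def release(self, conn):
        self._free.put(conn)

def run_requests(pool, request_count, hold_seconds=0.01):
    """Run `request_count` concurrent requests; return the peak connections in use."""
    lock = threading.Lock()
    in_use = 0
    peak = 0

    def handle_request():
        nonlocal in_use, peak
        conn = pool.acquire()
        try:
            with lock:
                in_use += 1
                peak = max(peak, in_use)
            time.sleep(hold_seconds)       # pretend the transaction takes a while
            with lock:
                in_use -= 1
        finally:
            pool.release(conn)             # let a waiting request proceed

    threads = [threading.Thread(target=handle_request) for _ in range(request_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

peak = run_requests(ConnectionPool(size=4), request_count=40)
print("peak connections in use:", peak)
```

Forty requests go in, but the backend never sees more than four connections at once. The cost is the waiting: the extra requests queue inside acquire() instead of opening new connections.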

EDIT: Going back to the blog post, it said, "You really need your back-end services to scale out with node.js." I think that means your back-end service should have multiple IP addresses, to alleviate the bottleneck described above.


I've added a picture to clarify which ports I'm talking about. Hopefully this clarifies things a bit.
Looks pretty nice. Some pictures are worth quite a few words.

Is this problem made worse by the ephemeral ports remaining unavailable after disconnect, because they're stuck in TIME_WAIT? Or does a modern TCP stack note a low RTT and release the port much sooner?

You have to think about concurrency. That physical instance can have at most about 64K simultaneous outbound requests in total, and because it's multi-tenant it is running 100+ apps, which drastically reduces the number of ports available to each app. With evented IO, a socket could be open for 250 ms (a db query taking its time), and that holds on to a port the whole while, causing a potential DoS on the other apps.
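
Some back-of-the-envelope arithmetic for those numbers, as a Python sketch. It assumes the 64K ports are split evenly across exactly 100 apps and that a port is reusable the moment its socket closes (so no TIME_WAIT). Both are simplifications of mine, and the function names are made up for the example.

```python
def ports_per_app(total_ports, app_count):
    """Even share of the port space for each app (assumed split, not enforced by the OS)."""
    return total_ports // app_count

def max_requests_per_second(ports, hold_seconds):
    """Each port serves one request per hold period, so throughput = ports / hold time."""
    return ports / hold_seconds

total_ports = 64 * 1024                  # "64K" from the comment above
per_app = ports_per_app(total_ports, 100)
rate = max_requests_per_second(per_app, 0.250)

print("ports per app:", per_app)         # 655
print("max requests/s per app:", rate)   # 2620.0
```

In practice nothing enforces the even split, since all apps draw from the same pool of ports. That is the DoS risk: one busy app holding sockets open can eat into everyone else's share.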