by boxed
441 days ago
> If you write a program in the design of Node.js — isolating a portion of the problem, pinning it to 1 thread on one CPU core, letting it access an isolated portion of RAM with no data sharing, then you have a design that is making as optimal use of CPU-time as possible. It is how you optimize for NUMA systems and CPU cache locality. Even a SMP system is going to perform better if treated as NUMA.

Well that's... just wrong.
It is true that you can only squeeze 100% of the maximum possible useful compute out of a NUMA system with methods like the article author was suggesting. The less coordination there is between cores, the less cross-core or cross-socket communication is needed, all of which is overhead.
Caveat: If a bunch of independent processes are processing independent data, they'll increase cache thrashing at L2 and higher levels. Synchronised threads running the same code more-or-less in lockstep over the same areas of the data can benefit from sharing that independent processes can't. In some scenarios, this can be a huge speedup -- just ask a GPU programmer!
Where the process-per-core argument definitely stops being a good approach is when you start to consider latency.
Literally just this week, I had to help someone working on a Node.js app that needs to pre-cache a bunch of very expensive computations (map tiles over data that changes on an interval).
Because this is CPU-heavy and Node.js is single-threaded, it kills the user experience while it is running. Interactive responses get interleaved with batch actions, and users complain.
This is not a problem with ASP.NET where this kind of work can simply run in a background thread and populate the cache without interfering with user queries!
For similar reasons, Redis replacements that use multi-threading have far lower tail latencies: https://microsoft.github.io/garnet/
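
To put rough numbers on the latency point, here is a toy model in TypeScript. It is a sketch, not a benchmark: the task names, the 500 ms and 1 ms costs, and the `latencies` helper are all made up for illustration, and it assumes a spare core is available for the background work.

```typescript
// Toy model only: all names and numbers below are hypothetical.
interface Task {
  name: string;
  arrivalMs: number; // when the task is submitted
  costMs: number;    // CPU time the task needs
}

// Run tasks in arrival order on a single executor that cannot be preempted,
// and return each task's latency (finish time minus arrival time).
function latencies(tasks: Task[]): Record<string, number> {
  const sorted = tasks.slice().sort((a, b) => a.arrivalMs - b.arrivalMs);
  const result: Record<string, number> = {};
  let freeAt = 0; // time at which the executor is next idle
  for (const t of sorted) {
    const start = Math.max(freeAt, t.arrivalMs);
    freeAt = start + t.costMs;
    result[t.name] = freeAt - t.arrivalMs;
  }
  return result;
}

const batch: Task = { name: "tile-precache", arrivalMs: 0, costMs: 500 };
const request: Task = { name: "user-request", arrivalMs: 10, costMs: 1 };

// Single thread: the user request queues behind the batch job.
const shared = latencies([batch, request]);

// Background thread: the batch job runs elsewhere (assumes a free core),
// so the request has its executor to itself.
const separate = latencies([request]);

console.log(shared["user-request"]);   // 491
console.log(separate["user-request"]); // 1
```

Total CPU work is the same in both cases. The only thing that changes is whether the 1 ms request has to wait for the 500 ms job to finish first.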