Hacker News new | ask | show | jobs
by strictfp 1053 days ago
My tip for this is Node.js and some stream processing lib like Highland. You can get ridiculous IO parallelism with a very little code and a nice API.

Python just scales terribly, no matter if you use multi-process or not. Java can get pretty good perf, but you'll need some libs or quite a bit of code to get nonblocking IO sending working well, or you're going to eat huge amounts of resources for moderate returns.

Node really excels at this use case. You can saturate the lines pretty easily.

4 comments

0_o

Did I miss something? Does nodes/highland have good shared memory semantics these days?

I've always felt the best analogy to python concurrency was (node)js, but I admittedly haven't kept up all that well.

Wouldn't Elixir or Go be better for this use case? Node still blocks on compute heavy tasks.
I think they mentioned CPU intensive work, which I'm taking to imply that it's more CPU bound than I/O bound. So unless you're suggesting they use Node's web workers implementation for parallelism, the default single threaded async concurrency model probably won't serve them well.
Isn't Node single threaded, just like Python?
Python is technically multithreaded, but the GIL means only one thread can execute interpreter code at a time. If you use libraries written in C/C++, the library code can run in multiple threads simultaneously if they release the GIL.

I vaguely recall Node used to run multiple threads under the hood for disk I/O, but it might use kqueue/epoll these days.

Node is essentially a single-threaded API to a very capable multithreaded engine.

https://youtu.be/ztspvPYybIY