Hacker News
by spion 4076 days ago
Not sure if it was intentional, but the article is quite misleading.

The thread pool in node is only used for a limited number of APIs. Pretty much all networking uses native async IO and is unaffected by the size of the thread pool. Things like Oracle's driver are rare exceptions: the typical MySQL/PostgreSQL/Redis etc. drivers all use native async IO and are unaffected by this.

The author only glosses over this briefly. As a result this article leaves the impression that the problem described is the norm, which is not the case.


It's a completely unscientific method, but searching through one of our large applications (`npm ls|wc -l` -> ~2000 dependencies), the only modules I can find using `uv_queue_work` are:

* kerberos, unused (dependency of mongodb)

* protobuf, for serializing data

* snappy, for compression

kerberos isn't actually used in our app, so it doesn't matter, but we send a lot of data through protobuf and snappy, so it may be worth us profiling this a little more.

You can also experiment with different values for the env variable `UV_THREADPOOL_SIZE`; last time I checked, this can even be set from the JS code via `process.env['UV_THREADPOOL_SIZE']`, as long as you do it before calling anything that uses the thread pool.
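
A minimal sketch of the in-process approach described above. It assumes, as the comment says, that the assignment runs before anything touches the thread pool; the value `8` is an arbitrary example, and `hashBatch` is a hypothetical helper for observing the effect. Exporting the variable in the shell instead (`UV_THREADPOOL_SIZE=8 node app.js`) sidesteps the ordering question.

```javascript
// Keep this at the very top of the entry script. libuv reads
// UV_THREADPOOL_SIZE once, when the pool is first started, so an assignment
// made after any pool work (fs, dns.lookup, crypto, zlib) has no effect.
if (process.env.UV_THREADPOOL_SIZE === undefined) {
  process.env.UV_THREADPOOL_SIZE = "8"; // example value; libuv's default is 4
}

const crypto = require("crypto");

// Hypothetical helper: run `count` pbkdf2 hashes concurrently and report the
// wall-clock time. With a pool of 8 (and enough free cores) eight hashes can
// run side by side instead of in two waves of four.
function hashBatch(count) {
  const started = Date.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(
      new Promise((resolve, reject) => {
        crypto.pbkdf2("secret", `salt${i}`, 100000, 64, "sha512", (err, key) =>
          err ? reject(err) : resolve(key)
        );
      })
    );
  }
  return Promise.all(jobs).then((keys) => ({ keys, ms: Date.now() - started }));
}

const batch = hashBatch(8);
batch.then(({ ms }) =>
  console.log(`UV_THREADPOOL_SIZE=${process.env.UV_THREADPOOL_SIZE}: 8 hashes in ${ms} ms`)
);
```

Running it once as-is and once with `UV_THREADPOOL_SIZE=4` exported in the shell is a rough way to check whether the in-process assignment is honoured on a given Node version and platform.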

There's a whole section of the article covering which parts of node may be affected. If the article had begun by saying it only affects FS and DNS ops and some drivers, people might have been more tempted to stop reading.

Is there any reliable way to check if the libs you're using are subject to this issue?

There is a pretty reliable way to make sure they aren't: if they don't install any native modules and aren't filesystem or DNS related, they're not affected. If they do install native modules, you can grep the native module's source code for `uv_queue_work`, but I don't know if that will catch everything.

Aren't there some commonly used system calls that don't have asynchronous equivalents, such as open(), stat() or access()? Are threads used for those?
> Pretty much all networking uses native async IO and is unaffected by the size of the thread pool

Are they still using threads to get the "magic" working? I'm referring to this sentence: "But how did that happen? To the best of my knowledge node.js, is not powered by magic and fairy dust and things don’t just get done on their own."

Not magic, but APIs like these: https://en.wikipedia.org/wiki/Epoll https://en.wikipedia.org/wiki/Kqueue

Pretty much all event-loop-based programs work the same way: instead of blocking on a single IO request, they use system calls (e.g. epoll_wait) that block until any of many descriptors (sockets) has an event (data to be read, a client connecting, etc.). It gets a bit more complicated when queued thread-pool tasks and timers are involved too, but it's the same principle.
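
The same principle is observable from JavaScript. The sketch below (the connection count of 50 and the loopback address are arbitrary choices) opens many sockets at once against an in-process echo server; libuv hands all of them to the OS polling API and a single thread runs whichever callbacks are ready. It illustrates the principle rather than calling `epoll_wait` directly, which Node does not expose.

```javascript
const net = require("net");

// Open `count` client connections at once to an echo server in the same
// process. One JS thread services all of them: libuv registers each socket
// with the OS polling API (epoll on Linux, kqueue on macOS/BSD) and runs the
// callback for whichever socket has an event.
function echoMany(count) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => socket.pipe(socket));
    server.on("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address();
      let finished = 0;
      let matched = 0;
      for (let i = 0; i < count; i++) {
        const expected = `message ${i}`;
        let received = "";
        let done = false;
        const client = net.connect(port, "127.0.0.1", () => client.write(expected));
        client.on("error", reject);
        client.on("data", (chunk) => {
          received += chunk;
          // Wait until the whole echo has arrived, and count each client once.
          if (done || received.length < expected.length) return;
          done = true;
          if (received === expected) matched++;
          client.end();
          if (++finished === count) {
            server.close();
            resolve(matched);
          }
        });
      }
    });
  });
}

const echoed = echoMany(50);
echoed.then((n) => console.log(`${n} of 50 connections echoed correctly`));
```

Because both ends use the IP literal `127.0.0.1`, no `dns.lookup` (and therefore no thread-pool work) is needed; connecting to `localhost` instead would go through the pool for name resolution.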

Don't most computers have separate processors for the network, the hard drive, etc.? So even with a single-core processor, you are still running in a multi-processor environment? Can anyone give me details on this? Someone told me something like this once and it has confused me ever since...

Yes. There are hundreds of subunits on your motherboard that are complete processors: DSPs, Ethernet controllers, disk IO controllers, sound controllers, memory controllers... and that's not counting the programmable controllers in every disk drive, SD card, card reader, USB hub, peripheral, etc.

Does this have an impact on writing concurrent software?