HN Mirror


by pjc50 441 days ago

> A CPU with 4 cores is going to have the capacity of executing 4 seconds of CPU-time per second. It does not matter how much “background idle threading” you do or don’t. The CPU doesn’t care. You always have 4 seconds of CPU-time per second. That’s an important concept to understand.

> If you write a program in the design of Node.js — isolating a portion of the problem, pinning it to 1 thread on one CPU core, letting it access an isolated portion of RAM with no data sharing, then you have a design that is making as optimal use of CPU-time as possible

This is .. not true as written? A single thread pinned to one core gets you one second of CPU time per second, not four. Now, it may be quite hard to reach your full four seconds of CPU time per second, usually because of RAM bandwidth issues despite all the caching, and a hyperthreaded "core" absolutely does not count the same as a separate physical core, but the difference is real.

Author does have a point that slicing the work too small has significant overheads. But they've overstated it.

And this is before we get into the real source of parallel FLOPS, the GPU.

(edit: note that there may also be thermal issues and CPU frequency scaling going on; it is usually impossible to run all cores of a modern CPU at their max rated frequency for more than a very short time! But if you've bought a 64-core Ryzen and are only using one core, there's a huge gap there which you're not using)
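To put rough numbers on that gap, here is a minimal sketch. It assumes each busy thread can occupy at most one core and that cores run independently, so it ignores the memory-bandwidth, hyperthreading and frequency-scaling caveats above. Both function names are made up for illustration; they are not part of any real API.

```python
def cpu_seconds_per_second(busy_threads: int, cores: int) -> int:
    """Upper bound on CPU-time consumed per wall-clock second.

    Idealised: one busy thread uses at most one core, and cores
    do not slow each other down.
    """
    if busy_threads < 0 or cores < 1:
        raise ValueError("need busy_threads >= 0 and cores >= 1")
    return min(busy_threads, cores)


def utilisation(busy_threads: int, cores: int) -> float:
    """Fraction of the machine's CPU-time capacity actually in use."""
    return cpu_seconds_per_second(busy_threads, cores) / cores


if __name__ == "__main__":
    # One pinned thread on a 4-core CPU: 1 CPU-second per second, not 4.
    print(cpu_seconds_per_second(1, 4))  # 1
    # One busy core on a 64-core part leaves almost everything idle.
    print(utilisation(1, 64))            # 0.015625
```

Under these assumptions the only way to get near the full capacity is to keep roughly as many threads or processes busy as there are cores, whether they share memory or not.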


UncleEntity 441 days ago

> Author does have a point that slicing the work too small has significant overheads. But they've overstated it.

Exactly.

I was messing around with adding multi-threading to this 3D thing, and it slowed down the smaller cases until the work got big enough to overcome the overhead, at which point it sped things up. It was using OpenMP with only a couple of shared loop variables, so the overhead was probably not as drastic as whatever Node does, but it slowed the common case down enough to not be worth the effort.
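A toy cost model shows the shape of that trade-off. This is only a sketch: it assumes a fixed startup overhead per parallel run and work that splits perfectly across threads, and the numbers are invented for illustration, not measurements from the OpenMP experiment.

```python
def serial_time(work_ms: float) -> float:
    """Time to do all the work on one thread."""
    return work_ms


def parallel_time(work_ms: float, threads: int, overhead_ms: float) -> float:
    """Fixed overhead plus an ideal, even split of the work."""
    return overhead_ms + work_ms / threads


def break_even_work(threads: int, overhead_ms: float) -> float:
    """Amount of work at which parallel and serial take the same time.

    Solves overhead + w / n == w for w, giving w = overhead * n / (n - 1).
    """
    if threads < 2:
        raise ValueError("need at least 2 threads for a break-even point")
    return overhead_ms * threads / (threads - 1)


if __name__ == "__main__":
    threads, overhead_ms = 4, 3.0  # hypothetical values
    print(break_even_work(threads, overhead_ms))  # 4.0
    for work_ms in (1.0, 4.0, 40.0):
        print(work_ms, serial_time(work_ms),
              parallel_time(work_ms, threads, overhead_ms))
```

With these made-up numbers, a 1 ms job takes 3.25 ms in parallel (slower), a 4 ms job breaks even, and a 40 ms job drops to 13 ms. If the common case sits below the break-even point, threading makes it worse.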

The author of TFA needs to go run any renderer in single and multi-thread mode then report back to the class.


pjc50 441 days ago

> The author of TFA needs to go run any renderer in single and multi-thread mode then report back to the class.

Indeed. The whole of modern graphics API architecture hinges on the idea that each of your million or so pixels is a meaningful unit of work that can be done in parallel.


gpderetta 441 days ago

I still think that the argument is flawed beyond trivially parallel problems, but my understanding is that the author is arguing for shared-nothing, not for single threaded/single-process solutions.
