| HN Mirror



	by ComputerGuru 703 days ago
	> If you have a program that can use more than 8 cores, then that 8P+12E CPU should approach a 14P CPU in speed

	Only if you use work stealing queues or (this is ridiculously unlikely) run multithreaded algorithms that are aware of the different performance and split the work unevenly to compensate.

3 comments

Dylan16807 703 days ago

Or if you use a single queue... which I would expect to be the default.

Blindly dividing work units across cores sounds like a terrible strategy for a general program that's sharing those cores with who-knows-what.
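To put rough numbers on the difference between an equal split and a single queue, here is a minimal simulation sketch. It assumes an E-core runs at half the speed of a P-core (which is what "8P+12E approaches 14P" implies), that every task costs the same, and that dispatch is free; the function names are made up for illustration.

```python
import heapq

def static_split_makespan(n_tasks, speeds):
    """Every core gets the same number of tasks; the slowest core sets the finish time."""
    per_core = n_tasks / len(speeds)
    return max(per_core / s for s in speeds)

def shared_queue_makespan(n_tasks, speeds):
    """Each core pulls the next task from one shared queue as soon as it is free."""
    # Heap entries are (time the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        t += 1.0 / speeds[i]          # one task takes 1/speed time units
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish

# Assumption: P-core speed 1.0, E-core speed 0.5.
speeds = [1.0] * 8 + [0.5] * 12
n = 14_000

static = static_split_makespan(n, speeds)   # 1400.0
queue = shared_queue_makespan(n, speeds)    # 1000.0
ideal = n / sum(speeds)                     # 1000.0, i.e. a 14P machine

print(static, queue, ideal)
```

Under these assumptions the equal split is 40% slower than the 14P ideal, while the single queue matches it. Real dispatch overhead, which this sketch ignores, is what the reply below is about.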


ComputerGuru 702 days ago

It’s a common strategy for small tasks where the overhead of dispatching the task greatly exceeds the computation of it. It’s also a better way to maximize L1/L2 cache hit rates by improving memory locality.

E.g., you have 100M rows and you want to cluster them by a distance function (naively). Running dist(arr[i], arr[j]) is crazy fast; the problem is just that you have so many of them. It is faster to run it on one core than to dispatch it from one queue to multiple cores, but best to assign the work ahead of time to n cores and have them crunch the numbers.
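A minimal sketch of what "assign the work ahead of time to n cores" could look like: each worker gets one contiguous row range, computed once up front, so there is no per-task dispatch and each worker walks adjacent memory. The helper name split_ranges is hypothetical.

```python
def split_ranges(n_rows, n_workers):
    """Return n_workers contiguous (start, stop) ranges that cover range(n_rows)."""
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

print(split_ranges(10, 3))                  # [(0, 4), (4, 7), (7, 10)]
print(split_ranges(100_000_000, 20)[:2])    # [(0, 5000000), (5000000, 10000000)]
```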


Dylan16807 702 days ago

It has always been a bad idea to dispatch so naively and dispatch to the same number of threads as you have cores. What if a couple cores are busy, and you spend almost twice as much time as you need waiting for the calculation to finish? I don't know how much software does that, and most of it can be easily fixed to dispatch half a million rows at a time and get better performance on all computers.

Also, on current CPUs it'll be affected by hyperthreading and launch 28 threads (16 on the 8 P-cores plus 12 on the E-cores), which would probably work out pretty well overall.
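As a rough sketch of the "half a million rows at a time" idea: cut the 100M rows into 200 chunks and let a pool hand the next chunk to whichever worker is free. Here process_chunk is a placeholder for the real distance computation. Note that in CPython a thread pool will not speed up CPU-bound pure-Python work, so ProcessPoolExecutor or native code would be the likely choice in practice; the dispatch pattern is the same.

```python
import os
from concurrent.futures import ThreadPoolExecutor

N_ROWS = 100_000_000
CHUNK_ROWS = 500_000   # "half a million rows at a time"

def chunk_bounds(n_rows, chunk_rows):
    """Yield (start, stop) bounds for consecutive chunks of at most chunk_rows rows."""
    for start in range(0, n_rows, chunk_rows):
        yield start, min(start + chunk_rows, n_rows)

def process_chunk(bounds):
    """Placeholder for the real per-chunk work; returns the number of rows handled."""
    start, stop = bounds
    return stop - start

def run(n_rows=N_ROWS, chunk_rows=CHUNK_ROWS, workers=None):
    workers = workers or os.cpu_count() or 4
    # The executor keeps one internal queue; a free worker takes the next chunk,
    # so a busy or slow core simply ends up processing fewer chunks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunk_bounds(n_rows, chunk_rows)))

if __name__ == "__main__":
    print(run())   # total rows processed
```

With 200 chunks, the per-dispatch overhead is paid 200 times instead of 100M times, and the worst-case wait for a straggler is about one chunk rather than one twentieth of the whole job.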


chmod775 702 days ago

> What if a couple cores are busy

If you don't pin them to cores, the OS is still free to assign threads to cores as it pleases. Assuming the scheduler is somewhat fair, threads will progress at roughly the same rate.


Dylan16807 702 days ago

I would not assume it's sufficiently fair to make that a good algorithm.

Even a small bias could turn a 5 minute calculation into a 6 or 7 minute calculation as the stragglers finish up.
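The arithmetic behind that estimate, as a small sketch: with an equal up-front split, the job ends when the slowest thread ends, so wall time is the ideal time divided by the smallest share of CPU any thread received. The 20-thread count and the 0.8 and 0.7 rates are made-up illustration values, not measurements.

```python
def static_wall_time(ideal_minutes, relative_rates):
    """Wall time of an equal split when thread i runs at relative_rates[i] of full speed."""
    return ideal_minutes / min(relative_rates)

# Hypothetical: 19 threads run at full speed, one gets only 80% (or 70%) of a core.
print(static_wall_time(5.0, [1.0] * 19 + [0.8]))   # about 6.25 minutes
print(static_wall_time(5.0, [1.0] * 19 + [0.7]))   # about 7.14 minutes
```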


Sohcahtoa82 702 days ago

> run multithreaded algorithms that are aware of the different performance and split the work unevenly to compensate.

This is what the Intel Thread Director [0] solves.

For high-intensity workloads, it will prioritize assigning them to P-cores.

[0] https://www.intel.com/content/www/us/en/support/articles/000...


ComputerGuru 702 days ago

Then you no longer have 14 cores in this example, but only len(P) cores. Also most code written in the wild isn’t going to use an architecture-specific library for this.


dwattttt 702 days ago

The P cores being presented as two logical cores and E cores presented as a single logical core results in this kind of split already.
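A quick sketch of the split this produces when work is divided equally across logical cores on the 8P+12E part from the thread. Whether two hyperthreads on a P-core really deliver about twice the throughput of one E-core is an assumption here, not something this calculation shows.

```python
from fractions import Fraction

P_CORES, E_CORES = 8, 12
LOGICAL_PER_P, LOGICAL_PER_E = 2, 1   # P-cores are hyperthreaded, E-cores are not

logical = P_CORES * LOGICAL_PER_P + E_CORES * LOGICAL_PER_E   # 28 logical cores

# Equal split per logical core, expressed as a fraction of the total work.
per_logical = Fraction(1, logical)
per_p_core = per_logical * LOGICAL_PER_P   # 1/14 of the work per physical P-core
per_e_core = per_logical * LOGICAL_PER_E   # 1/28 of the work per physical E-core

print(logical, per_p_core, per_e_core)     # 28 1/14 1/28
```

So an equal split over 28 threads already hands each P-core twice the work of each E-core, which is the 2:1 weighting the "14P" estimate assumes.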
