by ComputerGuru
701 days ago
It’s a common strategy for small tasks where the overhead of dispatching a task greatly exceeds the cost of computing it. It’s also a better way to maximize L1/L2 cache hit rates, since it improves memory locality. E.g. you have 100M rows and you want to cluster them (naively) by a distance function: running dist(arr[i], arr[j]) is crazy fast, the problem is just that you have so many of them. It is faster to run it all on one core than to dispatch each call from one queue to multiple cores, but it's best to assign the work ahead of time to n cores and have them crunch the numbers.
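
A rough sketch of what "assign the work ahead of time to n cores" could look like, in Python. Everything here is an assumption for illustration: `dist` is a stand-in absolute-difference function, and `chunk_bounds`, `count_close_pairs_in_chunk` and `count_close_pairs` are hypothetical names. It uses threads only to keep the example self-contained; in CPython you would likely swap in `ProcessPoolExecutor` (or native code) to get real parallelism for pure-Python work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dist(a, b):
    # Hypothetical cheap distance function: absolute difference of two numbers.
    return abs(a - b)


def chunk_bounds(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) ranges."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def count_close_pairs_in_chunk(arr, start, stop, threshold):
    # One worker owns rows [start, stop) and compares each against later rows.
    # The whole chunk is a single task, so dispatch cost is paid once per worker
    # rather than once per dist() call.
    count = 0
    for i in range(start, stop):
        for j in range(i + 1, len(arr)):
            if dist(arr[i], arr[j]) <= threshold:
                count += 1
    return count


def count_close_pairs(arr, threshold, n_workers=None):
    n_workers = n_workers or os.cpu_count() or 1
    bounds = chunk_bounds(len(arr), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(count_close_pairs_in_chunk, arr, s, e, threshold)
            for s, e in bounds
        ]
        return sum(f.result() for f in futures)
```

For example, `count_close_pairs([0, 1, 2, 10], threshold=1, n_workers=2)` submits exactly two tasks regardless of how many pairs there are. Note that with this triangular loop the earlier chunks do more comparisons than the later ones, so equal-sized row ranges are not equal-sized work; a real version would probably want to balance by pair count.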

Also, on current CPUs it'll be affected by hyperthreading and launch 28 threads, which would probably work out pretty well overall.