by physicsguy 1473 days ago
Can you not do this using the CPU affinity environment variables and just ignoring the efficiency cores? I was under the impression you could bind to specific cores with:

GOMP_CPU_AFFINITY="1 2 5 6"

That would bind the first thread to core 1, the second to core 2, the third to core 5, and the fourth to core 6. I don’t have an M1 to play around on, but I’d have assumed that the cores have fixed IDs.
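
To make the mapping concrete, here is a rough Python sketch of how I understand the GNU libgomp documentation to describe GOMP_CPU_AFFINITY: entries are taken in order, ranges like 1-2 and strided ranges like 4-15:2 are expanded, and extra threads wrap back to the start of the list. The function names are made up for illustration, this is not libgomp's actual code, and it says nothing about how macOS on an M1 treats the variable.

```python
def expand_affinity(spec):
    """Expand a GOMP_CPU_AFFINITY-style string into an ordered list of CPU ids.

    Hypothetical helper: handles single ids, ranges ("1-2") and
    strided ranges ("4-15:2"), separated by spaces or commas.
    """
    cpus = []
    for entry in spec.replace(",", " ").split():
        rng, _, stride = entry.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        end = int(last) if last else start
        step = int(stride) if stride else 1
        cpus.extend(range(start, end + 1, step))
    return cpus


def thread_to_cpu(spec, num_threads):
    """Return the CPU id each thread would be bound to, in thread order.

    Threads beyond the end of the list wrap around to the beginning.
    """
    cpus = expand_affinity(spec)
    return [cpus[i % len(cpus)] for i in range(num_threads)]


if __name__ == "__main__":
    # Four threads with the setting from the comment above.
    print(thread_to_cpu("1 2 5 6", 4))
```

With four threads and "1 2 5 6" this gives the list [1, 2, 5, 6], i.e. one thread per listed core in order.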

Aside from that, if the workload is predictable in time, a more complex scheduling pattern might help. You could perhaps look at how METIS partitions the workload and see whether it can be modified by giving the cores weights that reflect their relative performance. Generally, to get good OMP performance I always found it better to treat the machine almost as if it weren’t shared memory. On HPC clusters you have NUMA anyway, which drags performance down once you run more threads than a single processor in the machine has cores.
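
As a sketch of the weighting idea (plain proportional splitting, not METIS itself), something like the following would hand each core a contiguous slice of the iteration space sized by its relative speed. The function name and the example weights are hypothetical; real weights would have to be measured.

```python
def weighted_chunks(n_items, weights):
    """Split range(n_items) into contiguous (start, end) slices, one per core.

    Each slice length is proportional to that core's weight.
    Boundaries are rounded from the cumulative weight, so the slices
    never overlap and always cover every item exactly once.
    """
    total = sum(weights)
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n_items * acc / total))
    bounds[-1] = n_items  # guard against rounding on the last boundary
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


if __name__ == "__main__":
    # Assumed weights: two fast cores counted as twice as fast as two slow ones.
    for core, (start, end) in enumerate(weighted_chunks(120, [2, 2, 1, 1])):
        print(f"core {core}: items {start}..{end - 1}")
```

Each thread would then loop over its own slice only, which fits the "treat it like it's not shared memory" approach: the split is decided up front instead of being left to the runtime scheduler.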

1 comment

Unfortunately, the thread affinity API on the M1 doesn't work that way, at least based on what I've been able to understand from reading https://developer.apple.com/forums/thread/703361 and, more specifically, this linked source file: https://github.com/apple-oss-distributions/xnu/blob/bb611c8f...

I agree with your other points though!