by grork 1561 days ago
I thought that to extract peak performance out of NUMA-based systems, you had to get down and dirty with memory access and locality to ensure you don’t cross sockets for data that’s stored in RAM attached to other CPUs.

Or am I out of date on NUMA systems?

2 comments

This is what they were referring to. To get optimum performance out of NUMA systems, you need to be careful about memory allocation and usage to maximize the proportion of your accesses that are local to the NUMA domain where the code is running. Apple's answer here is essentially "we made the link between NUMA domains have such high bandwidth, you don't even have to think about this."
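
To put rough numbers on why that proportion matters, here is a back-of-the-envelope sketch. The latencies are made-up placeholders, not measurements of any real machine, and `mean_access_latency_ns` is just a name I picked for illustration. It models average latency as a weighted mix of local and remote accesses and ignores caches, bandwidth limits, and contention.

```python
def mean_access_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency when `local_fraction` of accesses stay on the local NUMA node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    # Weighted average: local accesses pay local_ns, the rest pay remote_ns.
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only to illustrate the shape of the tradeoff.
LOCAL_NS = 100.0
REMOTE_NS = 200.0

for fraction in (1.0, 0.9, 0.5):
    latency = mean_access_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> about {latency:.0f} ns per access")
```

The closer the remote cost gets to the local cost, the less the local fraction matters, which is roughly the claim being made about a very fast link between domains.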
The big dies these days (M1 included) have non-uniform memory access baked in because their caches are distributed across the die. If you want maximum performance, you will certainly want to be aware of which "performance core" you're running on.