by jdub 3513 days ago
I don't think there's much NUMA action on a single socket at the moment, but that will change as CPU die area increases and more of the transistors go to things other than CPU work (to spread out the heat-making bits), which increases distances on a single die.

Unless there is core-specific RAM on the die, why? Isn't the essential aspect of NUMA the fact that there is some memory which is "near", and some which is "far"?
Yeah. As distance (latency) to RAM increases, the amount of on-die cache increases too (another handy way to distribute heat, with a performance bonus), and coherency becomes more costly, so in effect the cache becomes the core-specific RAM you mention.

(Oh, hi Kiko!)

Hey Jeff.. the nick fonts are really small on HN; I didn't see it was you!

I don't think latency to near RAM will increase; it would have too material a performance impact. Even in disaggregated designs like Rackscale there is definitely a concept of "near RAM", which is not cache, but which has very low latency.

However, your post made me realize that as the number of cores goes up, as with KNL, they are likely to be organized hierarchically with some clustered sharing of cache, so indeed NUMA-style affinity of workload to core starts paying off there. IOW, if you have thousands of cores on a chip, they definitely aren't all going to be sharing the same L2 and L3.
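
To make the affinity idea above concrete, here is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`. The helper names `parse_cpu_list` and `pin_to_cluster` are made up for illustration, and the CPU list string is assumed to be in the "0-3,8" style that Linux uses when it reports which CPUs share a cache. Which CPUs actually share an L2 or L3 on a given chip is not something this code knows; you would have to supply that.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cluster(cluster_cpus):
    """Pin this process to the CPUs of one (assumed) cache cluster.

    Returns the resulting affinity set, or None where the platform
    does not expose sched_setaffinity (it is not available everywhere).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Only ask for CPUs we are already allowed to run on.
    usable = set(cluster_cpus) & os.sched_getaffinity(0)
    if not usable:
        return os.sched_getaffinity(0)  # nothing to pin to; leave as is
    os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)
```

Usage would be something like `pin_to_cluster(parse_cpu_list("0-3"))`, so that the threads of one workload stay on cores that (by assumption) share the same cache cluster rather than bouncing across the die.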

Are core-specific caches write-through or write-back?
I think they're always write-back, but it gets complicated across the different levels, and it differs between Intel and AMD. My understanding is that Intel uses an 'inclusive' scheme, where everything held in a cache closer to the core is also held in the caches further out; i.e. if something is in L1 it will always also be in L2 and L3. With AMD's 'exclusive' scheme, on the other hand, L1 or L2, when evicting a dirty line, can/will effectively 'write around' L2 or L3 straight to a level further out.
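
The inclusive/exclusive difference is easier to see in a toy model. This is only a sketch under heavy assumptions: two levels, LRU replacement, no dirty bits or write-back traffic, and a made-up class name `TwoLevelCache`. It is not how any particular Intel or AMD part works; it just shows that an inclusive hierarchy duplicates L1's contents in L2, while an exclusive one uses L2 as a victim cache and so holds more distinct lines.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy L1/L2 model with LRU replacement (no dirty lines modeled)."""

    def __init__(self, l1_size, l2_size, inclusive):
        self.l1 = OrderedDict()  # oldest entry first
        self.l2 = OrderedDict()
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.inclusive = inclusive

    def _insert_l2(self, addr):
        self.l2[addr] = True
        self.l2.move_to_end(addr)
        if len(self.l2) > self.l2_size:
            victim, _ = self.l2.popitem(last=False)
            if self.inclusive:
                # Inclusion: a line leaving L2 must also leave L1.
                self.l1.pop(victim, None)

    def access(self, addr):
        """Touch addr and return where it was found: 'L1', 'L2' or 'RAM'."""
        if addr in self.l1:
            self.l1.move_to_end(addr)
            return "L1"
        if addr in self.l2:
            level = "L2"
            if self.inclusive:
                self.l2.move_to_end(addr)  # stays in L2 as well
            else:
                del self.l2[addr]  # exclusive: the line moves up to L1
        else:
            level = "RAM"
            if self.inclusive:
                self._insert_l2(addr)  # fill both levels
        self.l1[addr] = True
        if len(self.l1) > self.l1_size:
            victim, _ = self.l1.popitem(last=False)
            if not self.inclusive:
                self._insert_l2(victim)  # L2 acts as a victim cache
        return level

    def unique_lines(self):
        """Number of distinct lines held anywhere in the hierarchy."""
        return len(set(self.l1) | set(self.l2))
```

With a 2-line L1 and a 4-line L2, touching six distinct addresses leaves four distinct lines cached in the inclusive model and six in the exclusive one, which is the capacity trade-off behind the two schemes.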