|
|
|
|
|
by jdub
3513 days ago
|
|
Yeah, as distance (latency) to RAM increases, the amount of on-die cache increases (another handy way to distribute heat with a performance bonus) and coherency becomes more costly, so in effect it becomes the core-specific RAM you mention. (Oh, hi Kiko!) |
|
I don't think latency to near RAM will increase; it would have too material a performance impact. Even in disaggregated designs like Rackscale there is definitely a concept of "near RAM", which is not cache, but which has very low latency.
However, your post made me realize that as the number of cores go up, as with KNL, they are likely to be organized hierarchically with some clustered sharing of cache, so indeed NUMA-style affinity of workload to core starts paying off there. IOW, if you have thousands of cores on a chip, they definitely aren't going to be all sharing the same L2 and L3.