by gpm 322 days ago
The link __s posted is really more about the level where you need to be aware of the sqrt(n) cost: scaling from one device to the next (L1 cache -> L2 cache -> L3 cache -> RAM -> SSD -> SSD in another computer in the same datacenter -> SSD in another computer on the same continent...).
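To make the sqrt(n) idea concrete, here is a rough sketch of what that model predicts as the working set grows. The sizes in `LEVELS` are illustrative assumptions, not measurements of any real machine, and `relative_cost` is a made-up helper name; the model also ignores constant factors entirely.

```python
import math

# Illustrative working-set sizes in bytes (assumed; real sizes vary by machine).
LEVELS = {
    "L1 cache": 32 * 1024,
    "L2 cache": 1024 * 1024,
    "L3 cache": 32 * 1024 * 1024,
    "RAM": 16 * 1024**3,
    "SSD": 1024**4,
}


def relative_cost(n_bytes, base_bytes):
    """Cost of a random access into n_bytes relative to base_bytes,
    under the assumption that access cost grows like sqrt(n)."""
    return math.sqrt(n_bytes / base_bytes)


base = LEVELS["L1 cache"]
for name, size in LEVELS.items():
    # Print how much slower the model says each tier is than the L1-sized set.
    print(f"{name:>8}: {relative_cost(size, base):8.1f}x")
```

The point is only the shape of the curve: every 4x increase in the amount of data you have to reach across costs roughly 2x per access under this model.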

As far as I know (and I may just be ignorant), if you ignore caches, which are a very important part of the equation, it doesn't really matter which row/column you address in RAM; at that level things are dictated by clock speeds.

Caches are obviously very important though. Beyond optimizing the probability of cache hits, on modern CPUs some cores are "closer" than others, so cache interactions between them are faster: https://github.com/nviennot/core-to-core-latency If you optimize based on this you can get speedups, both at the macro level, by doing things like scheduling tasks that talk to each other on "close" cores, and at the micro level, by doing things like implementing NUMA-aware locking primitives: https://dl.acm.org/doi/10.1145/3477132.3483557
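A minimal sketch of the macro-level version, assuming you already have a core-to-core latency matrix of the kind that tool reports. The numbers in `LATENCY_NS` are made up, and `closest_pair` and `pin_current_process` are hypothetical helper names. `os.sched_setaffinity` is Linux-only, so the pinning helper just returns False elsewhere.

```python
import os

# Hypothetical core-to-core latencies in nanoseconds (made-up numbers).
# LATENCY_NS[i][j] is the measured latency between core i and core j.
LATENCY_NS = [
    [0, 20, 60, 62],
    [20, 0, 61, 63],
    [60, 61, 0, 19],
    [62, 63, 19, 0],
]


def closest_pair(latency):
    """Return the (i, j) pair of distinct cores with the lowest latency.
    Only the upper triangle is read, so the matrix is assumed symmetric."""
    n = len(latency)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(pairs, key=lambda p: latency[p[0]][p[1]])


def pin_current_process(cores):
    """Try to restrict this process to the given cores; True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform
    # Only ask for cores this process is actually allowed to run on.
    allowed = os.sched_getaffinity(0) & set(cores)
    if not allowed:
        return False
    try:
        os.sched_setaffinity(0, allowed)
    except OSError:
        return False
    return True
```

Calling `pin_current_process(closest_pair(LATENCY_NS))` is a coarse version of the idea: it confines every thread in the process to the two "close" cores rather than placing each communicating thread on its own core, which would need per-thread affinity.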

There have also definitely been CPUs (not sure if this is still a thing) where some cores share memory channels and some don't, so you can access RAM faster (higher bandwidth) if you spread your accesses across the two sets of cores instead of staying within one.