Hacker News
by sliken 3257 days ago
Er, when I say "memory limited", you say I'm wrong because it's latency limited, not bandwidth limited. I think we are violently agreeing: latency limited is just one specific form of memory limited.

In my testing (to my surprise) it turns out that throughput keeps increasing until the thread count reaches 2 times the number of memory channels. So with 8 memory channels, throughput keeps increasing up to 16 threads, which upon reflection makes sense. Generally it takes 25-40ns for a miss to get through L1, L2, and L3 to the memory controller. So with 16 misses and 8 channels you end up with all 8 channels busy, and 8 more misses queued and waiting in the memory controller. So your throughput approximately doubles compared with just 8 threads.
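
A rough way to sanity-check that reasoning is a toy model, sketched below. It is not a measurement. It assumes each thread has exactly one miss outstanding at a time (pointer chasing), that misses spread evenly across channels, and that a channel services a miss in about the same time the miss takes to travel through the caches. That last number (`dram_ns`) is my assumption for illustration; only the 25-40ns cache-miss path comes from the numbers above. The function name and parameters are made up.

```python
def misses_per_ns(threads, channels, front_ns, dram_ns):
    """Upper bound on miss throughput for a toy closed system.

    front_ns: time for a miss to travel L1 -> L2 -> L3 -> memory controller
    dram_ns:  time a channel is occupied servicing one miss (assumed value)
    """
    # Each thread completes one miss per (front_ns + dram_ns) if it never queues.
    latency_bound = threads / (front_ns + dram_ns)
    # Each channel can complete at most one miss per dram_ns.
    channel_bound = channels / dram_ns
    return min(latency_bound, channel_bound)


# Hypothetical numbers: 8 channels, 32ns cache-miss path, 32ns channel service.
for n in (1, 4, 8, 16, 32):
    print(n, misses_per_ns(n, channels=8, front_ns=32, dram_ns=32))
```

With those assumed values the bound keeps climbing until 16 threads and is flat after that, which is the 2x-channels knee. If the channel service time were much shorter or longer than the cache-miss path, the knee would move, so treat the factor of 2 as a property of this hardware rather than a rule.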

In any case, I agree that single-thread performance isn't improved by multiple channels and that latency-limited workloads get only a small fraction of the potential memory bandwidth.