by sliken 3264 days ago
There exist cache-friendly applications that see zero to minimal change with more bandwidth or more channels.

There also exist cache-unfriendly applications that see large changes with more bandwidth or more channels.

Games are generally cache friendly, and so are many easy benchmarks. But more aggressive use of a machine (which is presumably why you buy a top-spec CPU) tends to be less cache friendly. Also, people notice worst-case performance much more than average or best case: audio skipping, user interface lag, etc.

You can see this effect in action when you compare single-thread performance to multithread performance using every CPU. L1 caches are generally not shared, so if it's less than N times faster for N CPUs you are seeing software overhead (the cost of synchronization), cache misses (in L1, L2, or L3), or of course main memory bottlenecks.
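A rough way to put a number on "less than N times faster" is to compute speedup and per-CPU efficiency from two wall-clock timings. This is only a sketch: the function name `scaling` and the example timings are made up for illustration, and the result does not tell you which of the causes above (synchronization, cache misses, or memory bandwidth) is responsible for the shortfall.

```python
def scaling(t_single: float, t_parallel: float, n_cpus: int) -> tuple[float, float]:
    """Return (speedup, efficiency) for a job timed on 1 CPU and on n_cpus CPUs.

    speedup    = single-thread time / parallel time
    efficiency = speedup / n_cpus (1.0 would be perfect N-times scaling)
    """
    speedup = t_single / t_parallel
    efficiency = speedup / n_cpus
    return speedup, efficiency


# Hypothetical timings: 80 s on one CPU, 16 s on all 8 CPUs.
speedup, efficiency = scaling(80.0, 16.0, 8)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.1%}")
```

An efficiency well below 100% is the signal to go looking at synchronization cost, cache misses, or the memory system.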

I've seen plenty of cases on older servers where running on all CPUs of a single socket was FASTER than running on all CPUs of two sockets, but that's much less common these days because each socket has its own memory system.

I can assure you that the entire server market and high-end desktop market isn't running 2 to 8 times the memory bandwidth just for fun. The bandwidth is expensive and justified.

1 comments

An application being cache-unfriendly doesn't imply that it will be bandwidth-bound. If the application reads single words from random locations it will be cache-unfriendly and latency-bound. If it reads 1K contiguous bytes from random locations it will be cache-unfriendly and possibly bandwidth-bound. If it scans the entire memory space sufficiently quickly it may be both cache-friendly and still bandwidth-bound.
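One way to see the difference between the first two cases is to count how much of each fetched cache line a read actually uses. The sketch below assumes 64-byte cache lines and 8-byte words, neither of which is stated above, and the helper names are invented for illustration. It is plain arithmetic rather than a benchmark.

```python
LINE_BYTES = 64  # assumed cache line size, not stated in the thread


def lines_touched(offset: int, length: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of cache lines covered by a read of `length` bytes at `offset`."""
    first = offset // line_bytes
    last = (offset + length - 1) // line_bytes
    return last - first + 1


def line_utilization(offset: int, length: int, line_bytes: int = LINE_BYTES) -> float:
    """Fraction of the bytes fetched from memory that the read actually uses."""
    return length / (lines_touched(offset, length, line_bytes) * line_bytes)


# A single 8-byte word: one full line is fetched, 1/8 of it is used.
print(lines_touched(0, 8), line_utilization(0, 8))

# 1K contiguous bytes at a line-aligned offset: 16 lines, all fully used.
print(lines_touched(4096, 1024), line_utilization(4096, 1024))
```

In the single-word case each miss pays a full memory round trip for very few useful bytes, so the wait per access dominates. In the 1K case the same round trip is amortized over 16 fully used lines, so the volume of data moved starts to matter.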

I can't speak for the server market, but I'm certain that the high-end desktop market is composed primarily of people who do run top-of-the-line specs just for fun.

Correct, an application that reads single words from random locations will be cache unfriendly and latency bound. However, additional memory channels mean you can run more of them and get better throughput.

Personally I buy more cores when I can and find that the average and best case are very similar to CPUs with fewer cores, but the worst-case performance is much better. With 8 CPUs I find that the browser, Plex, processing batches of photos, transcoding video, running a Minecraft server and other random duties have much less of an impact on normal desktop use.

It used to be MUCH easier to be I/O bound with spinning disks, but with the new M.2 SSDs some pretty impressive I/O rates are possible (random or sequential), which makes it easier to be CPU limited.