by E6300
3260 days ago
The only situation where I imagine that could happen is if you need to apply a small number of instructions to a massive data set that's fully loaded in memory. What sort of application are you running? If you can say, obviously.
Dual-channel memory systems, assuming DDR4-2400, can do about 40 GB/sec. Threadripper is quad-channel, so about double that (as is Skylake-X, but NOT Kaby Lake-X). The new Skylake Xeons are 6-channel (about 120 GB/sec) and the new AMD Epyc is 8-channel (about 160 GB/sec).
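For anyone who wants to check those figures, here is a rough sketch of the peak-bandwidth arithmetic. It assumes each DDR4 channel is 64 bits (8 bytes) wide and that every part runs at DDR4-2400, which is a simplification. The name `peak_bandwidth_gb_s` is just illustrative.

```python
# Theoretical peak DRAM bandwidth: channels x transfers/sec x bytes/transfer.
# Assumption: 64-bit (8-byte) channels, all running at DDR4-2400.
BYTES_PER_TRANSFER = 8

def peak_bandwidth_gb_s(channels, mega_transfers_per_sec=2400):
    """Return the theoretical peak in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_sec * BYTES_PER_TRANSFER / 1000

for name, channels in [("dual channel", 2), ("quad channel", 4),
                       ("6 channel", 6), ("8 channel", 8)]:
    print(f"{name}: {peak_bandwidth_gb_s(channels):.1f} GB/s")
```

At DDR4-2400 this gives 38.4, 76.8, 115.2 and 153.6 GB/s, a little under the rounded numbers above. Real sustained bandwidth is lower than the theoretical peak.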
Assuming a perfectly sequential access pattern and something simple like a=b+c on 8-byte values (which reads 16 bytes and writes 8 bytes), you can run about 1.6 billion of those a second at 40 GB/sec.
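That rate is just bandwidth divided by bytes moved per operation. A minimal sketch, assuming 8-byte operands, the rounded 40 GB/sec figure, and no cache reuse (the function name is made up for illustration):

```python
# Bandwidth-bound rate for a streaming a = b + c.
# Assumptions: 8-byte operands, ~40 GB/s of bandwidth, no cache reuse.
BANDWIDTH_BYTES_PER_SEC = 40e9
BYTES_READ = 16     # b and c
BYTES_WRITTEN = 8   # a

def max_ops_per_sec(bandwidth=BANDWIDTH_BYTES_PER_SEC,
                    bytes_read=BYTES_READ, bytes_written=BYTES_WRITTEN):
    """Upper bound on operations/sec when memory traffic is the only limit."""
    return bandwidth / (bytes_read + bytes_written)

print(f"{max_ops_per_sec() / 1e9:.2f} billion ops/sec")
```

That works out to roughly 1.67 billion per second, which the figure above rounds down to 1.6 billion.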
So to not be memory bound you need to run about 15 times more instructions per operation, without adding any cache misses, just to keep the CPU executing one instruction per cycle (a fraction of what's possible). Any fewer and you are memory bound.
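To make the 15x concrete, here is one set of numbers that reproduces it. The core count and clock are assumptions for illustration (a hypothetical 8 cores at 3 GHz, one instruction per cycle each), not figures stated above.

```python
# Instructions you must find per a = b + c to stay compute bound.
# Assumptions (hypothetical chip): 8 cores, 3 GHz, 1 instruction per cycle,
# and the rounded 1.6 billion bandwidth-bound operations per second.
def instructions_per_op(cores, clock_hz, ipc, ops_per_sec):
    """Instruction slots available for each memory-bound operation."""
    instruction_slots_per_sec = cores * clock_hz * ipc
    return instruction_slots_per_sec / ops_per_sec

budget = instructions_per_op(cores=8, clock_hz=3e9, ipc=1, ops_per_sec=1.6e9)
print(f"about {budget:.0f} instructions per operation")
```

Plug in a different core count, clock, or instructions-per-cycle figure and the ratio moves accordingly. The point is only that the gap is an order of magnitude or more.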
Now imagine it's not perfectly sequential, and instead you have to retrieve something from memory before you know where to go next, like a database index, binary tree, or linked list. Instead of getting 8 bytes @ 2400 MHz you get 8 bytes per 70 ns. Keep in mind that's 8 bytes per 1/2.4 ns versus 8 bytes per 70 ns, or 168 times worse.
Suddenly, instead of needing 15 times more instructions, you need about 2500 instructions per memory load (15 x 168), all without an extra cache miss.
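The same arithmetic as a quick sketch, assuming a 70 ns load latency, one 8-byte transfer every 1/2.4 ns for the sequential case, and the 15-instruction budget from before (names are illustrative):

```python
# Dependent loads (pointer chasing): each load waits out the full DRAM latency.
# Assumptions: 70 ns latency, DDR4-2400 (one 8-byte transfer every 1/2.4 ns),
# and the 15 instructions-per-operation budget from the sequential case.
LATENCY_NS = 70
MEGA_TRANSFERS_PER_SEC = 2400
SEQUENTIAL_BUDGET = 15

def latency_penalty(latency_ns=LATENCY_NS, mts=MEGA_TRANSFERS_PER_SEC):
    """How many streaming transfers fit into one dependent-load latency."""
    return latency_ns * mts / 1000

penalty = latency_penalty()
print(f"penalty: {penalty:.0f}x")
print(f"instructions per load: {SEQUENTIAL_BUDGET * penalty:.0f}")
```

This prints a 168x penalty and 2520 instructions per load, which is where the round 2500 comes from.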
So as you can see, it can be quite easy to be memory limited. Sure, some things do an amazing amount of calculation on very little data. But many things are data intensive, which justifies the large-RAM, high-memory-bandwidth machines that make up pretty much all servers shipped today. Memory bandwidth is expensive (CPU package, pins, sockets, motherboard traces, additional motherboard layers, power, etc.), but well justified in many cases.