by Const-me
1484 days ago
My desktop PC has a Ryzen 7 5700G. On paper it can do 486 GFlops FP64 (8 cores at 3.8 GHz base frequency, two 4-wide FMAs every cycle). However, that would require 2 TB/sec of memory bandwidth, while the actual figure is 51 GB/sec. For large computational tasks where the source data doesn't fit in caches, the CPU can only achieve a small fraction of its theoretical peak performance because it's bottlenecked by memory. The memory in graphics cards is an order of magnitude faster; my current one has 480 GB/sec of bandwidth. For this reason, even gaming GPUs can be much faster than CPUs on some workloads, even though the theoretical peak FP64 GFlops figure is about the same.
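The arithmetic above can be made concrete with a roofline-style calculation. This is a minimal sketch, assuming FP64 AVX2 (four doubles per 256-bit FMA, two flops per lane) and treating the GPU's FP64 peak as equal to the CPU's, since the comment says they are about the same. The function names and the axpy example kernel are illustrative choices, not something from the comment.

```python
# Roofline-style sketch using the figures quoted in the comment.
# Assumptions: FP64 AVX2 (4 doubles per 256-bit FMA), one FMA counts as 2 flops,
# and the GPU FP64 peak is taken to be "about the same" as the CPU's.

def peak_gflops(cores: int, ghz: float, fma_units: int, lanes: int) -> float:
    """Theoretical peak: every FMA lane does one multiply and one add per cycle."""
    return cores * ghz * fma_units * lanes * 2

def attainable_gflops(peak: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak, bandwidth_gbs * flops_per_byte)

CPU_BANDWIDTH_GBS = 51.0   # system RAM bandwidth, as quoted
GPU_BANDWIDTH_GBS = 480.0  # graphics card bandwidth, as quoted

cpu_peak = peak_gflops(cores=8, ghz=3.8, fma_units=2, lanes=4)
gpu_peak = cpu_peak  # hypothetical: "about the same" FP64 peak

# Flops per byte of DRAM traffic needed before compute, not memory, is the limit.
cpu_ridge = cpu_peak / CPU_BANDWIDTH_GBS

# A streaming kernel such as y[i] = a * x[i] + y[i] on doubles:
# 2 flops per element, 24 bytes moved (read x, read y, write y).
axpy_intensity = 2 / 24

cpu_axpy = attainable_gflops(cpu_peak, CPU_BANDWIDTH_GBS, axpy_intensity)
gpu_axpy = attainable_gflops(gpu_peak, GPU_BANDWIDTH_GBS, axpy_intensity)

print(f"CPU peak:        {cpu_peak:.1f} GFlops")
print(f"CPU ridge point: {cpu_ridge:.1f} flops/byte")
print(f"axpy on CPU:     {cpu_axpy:.2f} GFlops")
print(f"axpy on GPU:     {gpu_axpy:.2f} GFlops")
```

With these inputs the CPU needs roughly 9.5 flops per byte of DRAM traffic before compute becomes the limit, while a streaming kernel like axpy does only 1/12 flop per byte. It therefore runs at about 4 GFlops on the CPU and about 40 GFlops on the GPU, so the speedup simply follows the bandwidth ratio.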
Nevertheless, many problems of this kind require more memory than the 8 GB or 16 GB available on cheap GPUs, so CPUs remain the better choice for those.
On the other hand, there are many problems whose time-consuming part can be reduced to dense matrix multiplication. On all such problems, CPUs reach a large fraction of their peak computational speed regardless of whether the operands fit in the caches. When they do not fit, the operation can be decomposed into sub-operations on cache-sized blocks, and in such algorithms each cache line is reused enough times that the time spent on transfers does not matter.
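The block decomposition mentioned above can be sketched as a tiled matrix multiply. This is a minimal pure-Python illustration of the idea, not a tuned kernel; the function name `blocked_matmul` and the default tile size of 64 are hypothetical choices, and real code would call an optimized BLAS instead.

```python
from typing import List

Matrix = List[List[float]]

def blocked_matmul(a: Matrix, b: Matrix, block: int = 64) -> Matrix:
    """Compute a @ b by iterating over block x block tiles.

    Each tile of `a` and `b` is reused for many multiply-adds while it is
    (ideally) resident in cache, instead of being streamed from RAM repeatedly.
    """
    n = len(a)          # rows of a
    inner = len(b)      # columns of a == rows of b
    m = len(b[0])       # columns of b
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        i1 = min(i0 + block, n)
        for k0 in range(0, inner, block):
            k1 = min(k0 + block, inner)
            for j0 in range(0, m, block):
                j1 = min(j0 + block, m)
                # Multiply tile a[i0:i1, k0:k1] by tile b[k0:k1, j0:j1]
                # and accumulate the result into tile c[i0:i1, j0:j1].
                for i in range(i0, i1):
                    a_row = a[i]
                    c_row = c[i]
                    for k in range(k0, k1):
                        a_ik = a_row[k]
                        b_row = b[k]
                        for j in range(j0, j1):
                            c_row[j] += a_ik * b_row[j]
    return c
```

The reason tiling works: one product of b×b tiles performs 2b³ flops while touching only 3b² values, so the work done per byte brought into cache grows linearly with b. Picking b so that the three tiles fit in cache lets each loaded cache line serve many multiply-adds.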