by bob1029
572 days ago
I see a lot of "just use the GPU", and you'd often be right. To me, SIMD on the CPU is most compelling because of its latency characteristics: you are nanoseconds away from the control flow. If the GPU needs updated state about the outside world, it takes significantly longer to propagate that information. For most use cases, the GPU will win the trade-off. But there is a reason you don't hear much about systems like order matching engines using GPUs.

Maximizing performance on a CPU today requires all the steps in the article above, and the article is very well written with regard to the "mindset" needed to tackle a problem like this.
It's a great article for anyone aiming to maximize performance on Intel or AMD systems.
------
CPUs have the advantage in memory capacity and will continue to hold it for the foreseeable future (despite NVIDIA's NVLink and other technologies that try to bridge the gap).
And CPU code remains far easier to write than learning CUDA, even though AVX intrinsics are harder to work with than CUDA itself.