|
|
|
|
|
by chapplap
2256 days ago
|
|
That's not exactly true once you have to deal with vector extensions like AVX-512. It's quite a pain to write by hand (C intrinsics) and many of the ways to abstract it away end up giving you a GPU-like programming model (eg. Intel ISPC). Plus, this has largely been tried before with Xeon Phi and it didn't end so well. Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code. |
|
There are a lot of tasks that a GPU can do faster than a CPU but it would require batching a large amount of work before you can gain a speedup. EYPC CPUs do not suffer that limitation. If all you have is an array with 4 elements you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile with a GPU you probably need at least an array with 10000 elements or more.