by adrian_b
35 days ago
Intel has been forced to reintroduce 512-bit vectors in its mainstream CPUs because of competition from AMD. Starting with the Intel Nova Lake CPUs, expected around the end of this year, all future AMD and Intel CPUs will provide 512-bit vectors, as the current AMD Zen 4 and Zen 5 CPUs already do.

The 512-bit vector length is more convenient than other lengths because on AMD and Intel CPUs it coincides with the length of a cache line (64 bytes). This makes it easier to optimize vector code and cache usage at the same time.

GPUs, which favor throughput over latency, frequently use 1024-bit and 2048-bit vector register widths. For CPUs, widths greater than 512 bits are unlikely to be useful, because the vector operations that should be done on a CPU are those for which the high latency of dispatching to a GPU is undesirable.
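To make the cache-line point concrete, here is a minimal sketch that counts how many cache lines a single vector load touches. It assumes a 64-byte cache line; the function name `lines_touched` and its parameters are made up for illustration and do not come from any real library.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte lines, as on current AMD and Intel x86 CPUs


def lines_touched(offset_bytes: int, vector_bits: int,
                  line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return how many cache lines one vector load at `offset_bytes` touches."""
    size_bytes = vector_bits // 8
    first_line = offset_bytes // line_bytes
    last_line = (offset_bytes + size_bytes - 1) // line_bytes
    return last_line - first_line + 1


if __name__ == "__main__":
    # An aligned 512-bit load covers exactly one full cache line.
    print(lines_touched(0, 512))
    # A misaligned 512-bit load always straddles two lines.
    print(lines_touched(32, 512))
    # A 256-bit load uses only half a line, so two loads are needed per line.
    print(lines_touched(0, 256))
```

With 512-bit vectors, "aligned vector" and "whole cache line" are the same thing, so one alignment rule covers both the vector loads and the cache behavior.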
|
But for smaller, more specialized CPUs in embedded or automotive use cases, wider vectors can give you more parallel compute while keeping the software model simpler than having to dispatch to a GPU.
I'm thinking specifically of a design like https://saturn-vectors.org/#_short_vector_execution, which likes to use vectors 2x or 4x wider than the datapath for more efficient chaining. I quite like that design, because you can get high utilization and limited out-of-order execution without vector register renaming.
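A rough way to see why vectors wider than the datapath help chaining is a toy cycle model. This is only a sketch under simplifying assumptions, not a description of how Saturn is actually implemented: every operation in a dependent chain runs on its own functional unit, each operation processes one datapath-wide element group per cycle, and every unit has the same fixed pipeline latency. The names `beats` and `chain_cycles` are hypothetical.

```python
def beats(vlen_bits: int, dlen_bits: int) -> int:
    """Cycles needed to push one vector register through the datapath."""
    return -(-vlen_bits // dlen_bits)  # ceiling division


def chain_cycles(n_ops: int, vlen_bits: int, dlen_bits: int,
                 latency: int = 2, chaining: bool = True) -> int:
    """Toy cycle count for a chain of n_ops dependent vector operations.

    Assumption: each op uses a different functional unit, so the only
    thing delaying an op is waiting for its source data.
    """
    b = beats(vlen_bits, dlen_bits)
    # From the start of an op until its last element group is written.
    per_op = latency + b - 1
    if chaining:
        # A dependent op starts as soon as the first element group of its
        # predecessor is available, i.e. `latency` cycles later.
        return (n_ops - 1) * latency + per_op
    # Without chaining, each op waits for the whole previous result.
    return n_ops * per_op


if __name__ == "__main__":
    for vlen in (128, 256, 512):
        chained = chain_cycles(3, vlen, 128, chaining=True)
        unchained = chain_cycles(3, vlen, 128, chaining=False)
        print(f"VLEN={vlen} DLEN=128: chained={chained} unchained={unchained}")
```

In this toy model, chaining gains nothing when the vector length equals the datapath width, and the gap between the chained and unchained cases grows as the vectors get 2x or 4x wider than the datapath, because consecutive operations overlap for more cycles.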