by camel-cdr
35 days ago
SIMD wider than 512 bits isn't relevant for regular general-purpose processors today, and won't be in the near future. But for smaller, more specialized CPUs in embedded or automotive use cases, you can get more parallel compute while keeping the software model simpler than having to dispatch to a GPU. I'm thinking specifically of a design like https://saturn-vectors.org/#_short_vector_execution, which likes to use vectors 2x or 4x wider than the datapath length for more efficient chaining.
I quite like that design, because you can get high utilization and limited out-of-order execution without vector register renaming.
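
To make the chaining point concrete, here is a rough back-of-the-envelope sketch. It is a toy model of my own, not Saturn's actual microarchitecture. It assumes one dependent chain of vector instructions, one functional unit per instruction, one datapath-wide element group processed per cycle, and a one-cycle forwarding latency between chained instructions. All function names are hypothetical.

```python
def groups_per_instruction(vlen_bits: int, dlen_bits: int) -> int:
    """Number of datapath-wide element groups in one vector register."""
    if vlen_bits % dlen_bits != 0:
        raise ValueError("VLEN must be a multiple of DLEN")
    return vlen_bits // dlen_bits


def cycles_unchained(n_dependent: int, vlen_bits: int, dlen_bits: int) -> int:
    """Each instruction waits for its producer to finish every group."""
    return n_dependent * groups_per_instruction(vlen_bits, dlen_bits)


def cycles_chained(n_dependent: int, vlen_bits: int, dlen_bits: int,
                   forward_latency: int = 1) -> int:
    """Each instruction starts as soon as the producer's first group is ready
    (assumed forward_latency cycles after the producer starts)."""
    groups = groups_per_instruction(vlen_bits, dlen_bits)
    return groups + (n_dependent - 1) * forward_latency


def chained_utilization(n_dependent: int, vlen_bits: int, dlen_bits: int) -> float:
    """Fraction of cycles each functional unit is busy during the chain."""
    groups = groups_per_instruction(vlen_bits, dlen_bits)
    return groups / cycles_chained(n_dependent, vlen_bits, dlen_bits)


if __name__ == "__main__":
    # Three dependent instructions on a 128-bit datapath (assumed numbers).
    for vlen in (128, 256, 512):
        print(vlen,
              cycles_unchained(3, vlen, 128),
              cycles_chained(3, vlen, 128),
              round(chained_utilization(3, vlen, 128), 2))
```

Under those assumptions, with VLEN equal to DLEN there is nothing to overlap, so chaining buys nothing for a dependent chain. With VLEN at 4x DLEN each instruction occupies its unit for four cycles, and the dependent ones can overlap with it, which is where the higher utilization comes from in this simplified picture.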