|
|
|
|
|
by dist-epoch
572 days ago
|
|
Intel code from 15 years ago also runs today. But it will not use AVX512. Which is the same with PTX, right? If you didn't use the tensor core instructions or wavefront voting in the CUDA code, the PTX generated from it will not either, and NVIDIA will not magically add those capabilities in when compiling to SASS. Maybe it remains competitive because the code is inherently parallel anyway, so it will naturally scale to fill the extra execution units of the GPU, which is where most of the improvement is generation to generation. While AVX code can't automatically scale to use the AVX512 units. |
|
In contrast, NVidia can go from 64-bit instruction bundles to 128-bit machine code (96-bit instruction + 32-bit control information) between Pascal (aka PTX Compute Capacity 5) and Voltage (aka PTX Compute Capacity 7) and all the old PTX code just autocompiles to the new assembly instruction format and takes advantage of all the new memory barriers added in Volta.
Having a PTX translation later is a MAJOR advantage for the NVidia workflow.