by sharpneli 3836 days ago
GPUs don't do any out-of-order (OoO) execution like modern CPUs do, and they don't do register renaming either. They execute things really literally, up to the point where one has to manually insert delay slots for pipelined operations when writing the raw asm. The manufacturers tend to keep that raw asm well hidden in order to avoid the binary compatibility trap; see https://github.com/NervanaSystems/maxas for an example of a third-party assembler for the NVIDIA Maxwell architecture.
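To make the delay slot point concrete, here is a toy sketch in Python. It is not how any real GPU toolchain works: the fixed `LATENCY`, the `(dest, src, src)` instruction tuples and the `insert_delay_slots` helper are all made up for illustration, and padding with nops is a simplification of whatever stall mechanism a given ISA actually uses. It just shows the bookkeeping that lands on the programmer (or assembler) when the hardware has no interlocks.

```python
# Toy model of an in-order machine with no hardware interlocks.
# Assumption: every result becomes readable LATENCY issue slots after issue.
LATENCY = 3  # hypothetical pipeline latency, not a real GPU figure


def insert_delay_slots(program, latency=LATENCY):
    """Pad a list of (dest, src1, src2, ...) tuples with "nop" entries so
    that no instruction reads a register before its producer is done."""
    ready_at = {}  # register name -> first slot at which it can be read
    out = []
    for dest, *srcs in program:
        # Earliest slot at which every source operand is available.
        need = max((ready_at.get(r, 0) for r in srcs), default=0)
        # Fill the gap with nops; the hardware will not stall for us.
        out.extend(["nop"] * max(0, need - len(out)))
        issue_slot = len(out)
        out.append((dest, *srcs))
        ready_at[dest] = issue_slot + latency
    return out


prog = [
    ("r1", "r0", "r0"),  # r1 = f(r0, r0)
    ("r2", "r1", "r0"),  # depends on r1, must wait for it
    ("r3", "r0", "r0"),  # independent of both
]
scheduled = insert_delay_slots(prog)
```

Here `scheduled` ends up with two nops between the first and second instruction. An OoO CPU would instead issue the independent `r3` instruction during that gap on its own; on the toy in-order machine somebody has to reorder the code by hand to get the same effect.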

On GPUs the binary compatibility issue is solved by having the driver compile the shader/compute kernel before it's used. As an example, NVIDIA uses PTX (see http://docs.nvidia.com/cuda/parallel-thread-execution/) as an intermediate language in CUDA, which is then compiled at run time into the actual asm for the GPU that is installed.

On modern CPUs, register renaming has already decoupled the physical registers from the instruction set registers. As an example, a modern Haswell core has well over 100 physical registers, versus the 16 general-purpose registers that the x86-64 instruction set exposes.
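A minimal sketch of the renaming idea, again in Python and again a toy: the `Renamer` class, the register names and the count of 8 physical registers are assumptions for illustration, and it never returns registers to the free list, which real hardware does when instructions retire.

```python
from collections import deque


class Renamer:
    """Toy rename table mapping architectural to physical registers."""

    def __init__(self, arch_regs, num_physical):
        # Assumption: num_physical is larger than len(arch_regs).
        self.free = deque(range(num_physical))
        # Every architectural register starts with its own physical one.
        self.table = {r: self.free.popleft() for r in arch_regs}

    def rename(self, dest, srcs):
        # Sources read whatever physical register currently holds them.
        phys_srcs = [self.table[s] for s in srcs]
        # Every write gets a fresh physical register, which removes
        # write-after-write and write-after-read hazards on `dest`.
        phys_dest = self.free.popleft()
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


r = Renamer(["rax", "rbx"], num_physical=8)
first = r.rename("rax", ["rbx"])   # rax = f(rbx)
second = r.rename("rax", ["rbx"])  # rax = g(rbx), same architectural dest
```

Both instructions write `rax` as far as the instruction set is concerned, but `first` and `second` get different physical destinations, so the core is free to execute them in either order. That is why the number of registers in the ISA says little about how many the core really has.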