| HN Mirror

With PCIe 4 and 5 the latency issues are not as much a problem as they were, what with latency masking, gpudirect/storage-direct, busy-loop kernels (and hopefully soon scheduling libraries to make them easier to use) :-) and if you're really into real-time, computing time on NVIDIA GPUs has excellent jitter/stability and they are used in the very tight control loop of adaptive-optics (1ms-loop with mechanical actuators to drive).

The penalty for branching has reduced in the last years, but yeah it's still heavy, but if you're OK with a bit of wasted compute, you can do some 'speculative' execution and do both branches in different warps, use only one result...

But yes, you're still using an accelerator.