by fancyfredbot
1515 days ago
I wonder if this applies to the same extent to an on-package GPU that shares the same physical memory as the CPU. I'd expect round-trip times in that case to be minimal, and the available processing power would probably be competitive with AVX512. I have been wondering if this is the reason for deprecating AVX512 on consumer processors, since these are likely to have a GPU available.

I personally believe it may be possible to reduce latency using techniques similar to io_uring, but it may not be simple. A major reason for the round trips is likely that a trusted process (part of the GPU driver) has to validate inputs from untrusted user code before they are presented to the GPU hardware.
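
To make the io_uring comparison a bit more concrete, here is a toy sketch of the general idea: untrusted code appends commands to a shared ring without a round trip per command, and a trusted consumer validates a whole batch before anything would reach the hardware. This is not how any real GPU driver works. The names `Command`, `SubmissionRing`, `ALLOWED_OPCODES`, and `MAX_LENGTH` are all hypothetical, and a real implementation would need shared memory, memory barriers, and a completion queue.

```python
from dataclasses import dataclass

# Hypothetical validation policy, purely for illustration.
ALLOWED_OPCODES = {"copy", "add"}
MAX_LENGTH = 1024


@dataclass
class Command:
    opcode: str
    length: int


class SubmissionRing:
    """Toy single-producer, single-consumer ring (not a real driver API)."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next slot the trusted consumer reads
        self.tail = 0  # next slot the untrusted producer writes

    def submit(self, cmd):
        """Producer side: enqueue without waiting; False if the ring is full."""
        if self.tail - self.head == self.size:
            return False
        self.slots[self.tail % self.size] = cmd
        self.tail += 1
        return True

    def drain(self):
        """Consumer side: validate every pending command in one pass."""
        accepted, rejected = [], []
        while self.head != self.tail:
            cmd = self.slots[self.head % self.size]
            self.head += 1
            ok = cmd.opcode in ALLOWED_OPCODES and 0 < cmd.length <= MAX_LENGTH
            (accepted if ok else rejected).append(cmd)
        return accepted, rejected
```

The point of the sketch is that validation still happens on the trusted side, but its cost is paid once per batch rather than once per submission.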