Hacker News new | ask | show | jobs
by JonChesterfield 554 days ago
The "proper syscall" isn't a fast thing either. The context switch blows out your caches. Part of why I like the name syscall is it's an indication to not put it on the fast path.

The implementation behind this puts a lot of emphasis on performance, though the protocol was heavilt simplfied in upstreaming. Running on pcie instead of the APU systems makes things rather laggy too. Design is roughly a mashup of io_uring and occam, made much more annoying by the GPU scheduler constraints.

The two authors of this thing probably count as people who program GPUs for a living for what it's worth.