| HN Mirror

Not just that: you can perfectly happily poll a marker you inserted into the CUDA stream, interspersed with sched_yield() syscalls to let other processes get work done in between you checking if the GPU got to a point where you can retrieve (as/if relevant) results and submit new work. You would have to dial the scheduler time slice to not keep those other processes running long enough after you yielded for your queue of submitted work to run dry before you get to top that queue off. This isn't as critical when you can completely fill the scheduler queue (I remember ~1000 entries, but it's been years and I haven't checked again if I even remembered correctly. Don't rely on this!), as you may want to force sleep there for some millisecond(s) to keep the CPU core sleeping instead of merely allowing other processes to get work done.