Hacker News
by bee_rider 973 days ago
I’m somewhat confused about what is exposed: the description in the quote sounds like a blocking call implemented with a busy wait, which seems like it couldn’t be the only, or even the main, thing that PyTorch exposes.
2 comments

Not just that: you can perfectly happily poll a marker you inserted into the CUDA stream, interspersing the polls with sched_yield() syscalls so that other processes can get work done between your checks of whether the GPU has reached a point where you can retrieve results (if relevant) and submit new work. You would have to tune the scheduler time slice so that those other processes don't keep running long enough after you yield for your queue of submitted work to run dry before you get to top it off. This is less critical when you can completely fill the scheduler queue (I remember ~1000 entries, but it's been years and I haven't checked whether I even remembered correctly. Don't rely on this!); in that case you may instead want to force a sleep of a millisecond or more, so that the CPU core actually sleeps rather than merely letting other processes get work done.
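For illustration, here is a rough sketch of that poll-and-yield loop in plain Python. It is not how PyTorch or CUDA does it: `wait_for_marker`, `spin_limit`, and `sleep_s` are made-up names, and a `threading.Event` set by a timer stands in for the marker in the GPU stream. In real code the `is_done` callable would be something like `torch.cuda.Event.query()` or a wrapper around `cudaEventQuery`.

```python
import os
import threading
import time


def wait_for_marker(is_done, spin_limit=1000, sleep_s=0.001):
    """Poll is_done() until it returns True and return the number of polls.

    The first spin_limit polls only yield the CPU between checks; after
    that the loop sleeps for sleep_s seconds so the core can really idle.
    Both parameters are illustrative defaults, not tuned values.
    """
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if polls < spin_limit:
            # Give up the rest of our time slice so other runnable
            # processes can do work before we check again.
            if hasattr(os, "sched_yield"):
                os.sched_yield()
            else:
                time.sleep(0)  # fallback on platforms without sched_yield
        else:
            # Nothing finished for a while: sleep instead of spinning.
            time.sleep(sleep_s)


# Stand-in for the GPU reaching the marker: a timer sets the event
# after 50 ms. (Assumption for the demo only; no GPU is involved.)
marker = threading.Event()
threading.Timer(0.05, marker.set).start()

polls = wait_for_marker(marker.is_set)
print(f"marker reached after {polls} polls")
```

The split between yielding and sleeping mirrors the trade-off above: yielding keeps latency low while the queue of submitted work is short, and sleeping makes more sense once enough work is queued that a millisecond of delay cannot starve the GPU.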
That is indeed the only API that it exposes.