by gregjm
973 days ago
> My so-called CPU “active” time is actually an inferred value; CUDA spins the CPU 100% constantly, even when the CPU is just waiting for the GPU

The CUDA Runtime and Driver APIs allow you to use “blocking synchronization”, where the CPU goes to sleep while waiting for synchronization with the device. However, it seems that PyTorch doesn’t expose this functionality in any of its Python APIs: https://github.com/pytorch/pytorch/issues/28224

What happens when you try using ctypes to call into libcudart.so to set the device flags as described in the above issue? You’ll have to call torch.cuda.init() for it to work, and unfortunately it won’t work if PyTorch is launching kernels from other threads.
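For anyone who wants to try it, here is a rough, untested-on-GPU sketch of the ctypes approach. The helper name `set_blocking_sync` and the default library filename are my own assumptions (PyTorch installs may ship libcudart under a versioned filename); `cudaSetDeviceFlags` and the `cudaDeviceScheduleBlockingSync` value of 0x04 come from the CUDA Runtime API.

```python
import ctypes

# Value of cudaDeviceScheduleBlockingSync in the CUDA Runtime API headers.
CUDA_DEVICE_SCHEDULE_BLOCKING_SYNC = 0x04


def set_blocking_sync(libcudart_name="libcudart.so"):
    """Ask the CUDA runtime to block (sleep) instead of spin while waiting.

    Hypothetical helper, not part of PyTorch. Returns the cudaError_t code
    from cudaSetDeviceFlags (0 means cudaSuccess), or None if the library
    could not be loaded under the given name.
    """
    try:
        cudart = ctypes.CDLL(libcudart_name)
    except OSError:
        # Library not found; the filename is an assumption and may need
        # adjusting to match the copy that PyTorch actually loaded.
        return None

    # cudaError_t cudaSetDeviceFlags(unsigned int flags)
    cudart.cudaSetDeviceFlags.argtypes = [ctypes.c_uint]
    cudart.cudaSetDeviceFlags.restype = ctypes.c_int
    return cudart.cudaSetDeviceFlags(CUDA_DEVICE_SCHEDULE_BLOCKING_SYNC)
```

You would use this together with torch.cuda.init() as described above and then check the return code: 0 means the runtime accepted the flag, and any other value is a CUDA error code. I haven't verified how this interacts with PyTorch's own initialization order, so treat a nonzero result as a hint to experiment rather than a definitive failure.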