by borzunov 1266 days ago
With offloading, the computations are usually still performed on the GPU, but the model is hosted in RAM or on an SSD instead of in GPU memory (and chunks of it are copied to GPU memory when needed).
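To make the idea concrete, here is a rough toy sketch in plain Python, not taken from any real offloading library. `ToyDevice`, `host_layers`, and `offloaded_forward` are all made-up names, and the "layers" are just weight/bias pairs. It only illustrates the pattern: weights live on the host, and one chunk at a time is copied to the device, used, then evicted.

```python
# Hypothetical toy model: every "layer" is a (weight, bias) pair kept in host memory.
host_layers = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]


class ToyDevice:
    """Stand-in for a GPU that can hold only `capacity` numbers at once."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = None  # chunk currently held in "device memory"
        self.peak = 0         # largest number of values ever resident

    def load(self, weights):
        if len(weights) > self.capacity:
            raise MemoryError("chunk does not fit in device memory")
        self.resident = list(weights)  # copy host -> device
        self.peak = max(self.peak, len(self.resident))

    def evict(self):
        self.resident = None  # free device memory for the next chunk

    def apply(self, x):
        # The computation itself happens "on the device".
        w, b = self.resident
        return w * x + b


def offloaded_forward(x, layers, device):
    """Run all layers while keeping at most one of them on the device."""
    for weights in layers:
        device.load(weights)
        x = device.apply(x)
        device.evict()
    return x


device = ToyDevice(capacity=2)
result = offloaded_forward(1.0, host_layers, device)
```

The model has six numbers in total, but the device never holds more than two at a time. The cost is one host-to-device copy per layer, which is what makes real offloading slow.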

A lot of computation is offloaded to the CPU too, such as the work on gradients and optimizer states. You are right, though, that quite a bit of computation is still done on the GPU.