by cardine 1260 days ago
Offloading is when the computation is done on the CPU instead of the GPU. DeepSpeed is an example of this.
2 comments

In the case of offloading, the computations are usually still performed on the GPU, but the model is hosted in RAM/SSD instead of GPU memory (and chunks of it are copied to GPU memory when necessary).
A lot of work is offloaded to the CPU, such as holding the gradients and optimizer states and running the optimizer update on them. You are right, though, that quite a bit of computation is still done on the GPU.
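For anyone who wants to see what this looks like in practice, here is a rough sketch of a DeepSpeed ZeRO config that asks for both kinds of offloading discussed above: parameters kept in CPU RAM and optimizer state kept (and updated) on the CPU. The batch size is an arbitrary placeholder, and the variable name ds_config is just for illustration; check the DeepSpeed docs for your version before relying on it.

```python
import json

# Illustrative config only; the batch size is a made-up placeholder.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        # Stage 3 partitions parameters, gradients, and optimizer states.
        "stage": 3,
        # Keep optimizer states in CPU RAM and run the optimizer update there.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Keep parameters in CPU RAM; they are copied to the GPU when needed.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize to JSON, the format DeepSpeed config files use.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

The forward and backward passes still run on the GPU with a config like this; what moves to the CPU side is where the state lives and where the optimizer update happens.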
I remember when GPUs were starting to support arbitrary computation and offloading meant shifting work away from the CPU.