by modeless
3880 days ago
This is wrong. Training data can be streamed through GPU memory during training. It's your parameters that can't exceed GPU memory. You can get GPUs with 12 GB of memory, and they also support float16, so parameters can take half the memory they would as the float32 values typically used on CPUs. If your model has more parameters than that, then you'll be waiting months or years for a single model to train using CPUs, even distributed. Furthermore, almost any technique you use to distribute and scale training will work just as well regardless of whether the computations are happening on CPUs or GPUs.
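
To put rough numbers on that, here is a minimal Python sketch (not part of the original comment). The names `max_params` and `stream_batches` are made up for illustration, 12 GB is taken as 12 * 10**9 bytes, and the parameter counts are only an upper bound: they ignore the memory that activations, gradients, and optimizer state also need.

```python
def max_params(memory_bytes, bytes_per_param):
    """Upper bound on how many parameters fit in a memory budget.

    Hypothetical helper: ignores activations, gradients, and optimizer state.
    """
    return memory_bytes // bytes_per_param


def stream_batches(dataset, batch_size):
    """Yield batches one at a time so only one batch is held in memory.

    `dataset` can be any iterable, including one that reads lazily from disk,
    so the full training set never has to fit in memory at once.
    """
    batch = []
    for example in dataset:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


GPU_MEMORY = 12 * 10**9  # assumption: 12 GB counted in decimal bytes

fp32_params = max_params(GPU_MEMORY, 4)  # float32 uses 4 bytes per value
fp16_params = max_params(GPU_MEMORY, 2)  # float16 uses 2 bytes per value
```

Under these assumptions the budget works out to 3 billion float32 parameters or 6 billion float16 parameters, while the dataset size does not enter the calculation at all, since `stream_batches` only ever holds `batch_size` examples at a time.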