by volta87
1974 days ago
Depends on model size, but if the model is small enough that I actually train it on a PCIe board, I do. I partition an A100 in 8 and train 8 models at a time, or just use MPS on a V100 board. The bigger A100 boards can fit several copies of a model that fits on a single V100. I also tend to do this early on, while I am exploring the hyperparameter space, where I prefer more models that are smaller. I find that starting with big models is just a waste of time. You want to try as many things as quickly as possible.
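For anyone who wants to try this, here is a rough sketch of how such a sweep could be driven from Python. It is not the setup described above, just one way to do it: the script name `train.py`, its `--key=value` flags, and the device identifiers are all hypothetical. The only thing it relies on is that each worker process sees a single device through `CUDA_VISIBLE_DEVICES`, which can hold either a GPU index or a MIG instance UUID as printed by `nvidia-smi -L`.

```python
import itertools
import os
import subprocess
import sys


def make_grid(space):
    """Expand {name: [values]} into a list of hyperparameter dicts."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def plan_waves(configs, devices):
    """Group configs into waves of at most len(devices) concurrent jobs.

    Returns a list of waves; each wave is a list of (device, config) pairs.
    """
    n = len(devices)
    return [
        list(zip(devices, configs[start:start + n]))
        for start in range(0, len(configs), n)
    ]


def worker_command(config, train_script):
    """Build the command line for one job (flag format is an assumption)."""
    flags = [f"--{name}={value}" for name, value in config.items()]
    return [sys.executable, train_script] + flags


def run_sweep(configs, devices, train_script="train.py"):
    """Run every wave, one process per device slot, waiting between waves."""
    exit_codes = []
    for wave in plan_waves(configs, devices):
        procs = []
        for device, config in wave:
            # Each worker only sees the slot it was assigned.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=device)
            cmd = worker_command(config, train_script)
            procs.append(subprocess.Popen(cmd, env=env))
        exit_codes.extend(p.wait() for p in procs)
    return exit_codes


# Dry run: print the plan without launching anything.
# The device names below are placeholders, not real MIG UUIDs.
example_devices = ["MIG-placeholder-0", "MIG-placeholder-1", "MIG-placeholder-2"]
example_grid = make_grid({"lr": [0.1, 0.01], "width": [64, 128]})
for wave_index, wave in enumerate(plan_waves(example_grid, example_devices)):
    for device, config in wave:
        print(wave_index, device, config)
```

With MPS on a single board there are no separate slots, so the same sketch would be called with the same index repeated, e.g. `devices=["0"] * 4`, assuming the MPS control daemon is already running. Waves are the simplest scheduling choice; a job queue that refills a slot as soon as it frees up would waste less time when run lengths differ.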