Hacker News
by volta87 1974 days ago
It depends on model size, but if the model is small enough that I train it on a single PCIe board, then yes, I do. I partition an A100 into 7 MIG instances (the most MIG allows on an A100) and train 7 models at a time, or just use MPS (CUDA Multi-Process Service) on a V100 board. The bigger A100 boards can fit several copies of any model that fits on a single V100.
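A rough sketch of the MIG setup: launch one training process per MIG instance, pointing `CUDA_VISIBLE_DEVICES` at that instance's UUID so each process sees only its slice. This assumes the UUIDs are taken from `nvidia-smi -L` in the `MIG-<uuid>` form printed by recent drivers (older drivers use a different format). The training script `train.py` and its `--lr`/`--batch-size` flags are hypothetical placeholders, not something from the comment. With MPS on a V100 the idea is the same, except every process shares the one device once the MPS daemon (`nvidia-cuda-mps-control -d`) is running.

```python
import os
import re
import subprocess
import sys

# Matches MIG instance UUIDs as printed by `nvidia-smi -L` on recent drivers,
# e.g. "(UUID: MIG-c6d4f1ef-...)". GPU UUIDs ("GPU-...") are not matched.
MIG_UUID_RE = re.compile(r"UUID: (MIG-[^)\s]+)")


def parse_mig_uuids(nvidia_smi_output: str) -> list[str]:
    """Extract MIG instance UUIDs from `nvidia-smi -L` text."""
    return MIG_UUID_RE.findall(nvidia_smi_output)


def build_jobs(mig_uuids, configs, script="train.py"):
    """Pair each hyperparameter config with exactly one MIG instance.

    Each job gets a copy of the current environment in which
    CUDA_VISIBLE_DEVICES names a single MIG instance.
    NOTE: `train.py` and its flags are hypothetical placeholders.
    """
    if len(configs) > len(mig_uuids):
        raise ValueError("more configs than MIG instances")
    jobs = []
    for uuid, cfg in zip(mig_uuids, configs):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
        cmd = [sys.executable, script,
               "--lr", str(cfg["lr"]),
               "--batch-size", str(cfg["batch_size"])]
        jobs.append((cmd, env))
    return jobs


def launch(jobs):
    """Start all jobs concurrently, then wait and return their exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [p.wait() for p in procs]
```

On a machine with a MIG-partitioned A100, `launch(build_jobs(parse_mig_uuids(out), configs))` would start up to 7 runs side by side, where `out` is the captured text of `nvidia-smi -L`.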

I also tend to do this early on, while exploring the hyperparameter space, where I prefer many smaller models over a few large ones.

I find that starting with big models is just a waste of time. You want to try as many things as possible, as quickly as possible.
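A minimal sketch of that early exploration phase, assuming plain random search (the comment doesn't say which search method is used): sample many small-model configurations, then run them in waves sized to the number of GPU slices available. The parameter names (`lr`, `batch_size`, `width`) and their ranges are hypothetical choices for illustration.

```python
import random


def sample_configs(n, seed=0):
    """Draw n random-search configs for small models.

    The ranges (log-uniform learning rate in [1e-5, 1e-2], a few batch
    sizes and hidden widths) are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [
        {
            "lr": 10 ** rng.uniform(-5, -2),
            "batch_size": rng.choice([32, 64, 128]),
            "width": rng.choice([64, 128, 256]),  # keep models small
        }
        for _ in range(n)
    ]


def waves(configs, slots):
    """Group configs into consecutive waves of at most `slots` concurrent runs."""
    if slots < 1:
        raise ValueError("slots must be >= 1")
    return [configs[i:i + slots] for i in range(0, len(configs), slots)]
```

With 7 MIG slices, `waves(sample_configs(20), 7)` yields three waves of 7, 7 and 6 runs, so 20 configurations cost roughly three training times instead of twenty.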