|
|
|
|
|
by kir-gadjello
1195 days ago
|
|
"Accomodate" is the word to scrutinize here. Yes, it will cost a lot to outright buy physical HPC infrastructure to train and infer a series of large models deployed for customers all over the globe. No, it won't cost nearly as much to rent cloud infra to train a similarly-sized model. No, you won't be able to train a large model on a single multi-GPU node, you will need a cluster containing a respectable power of two of GPUs (or other accelerators). It's a widely known meme at this point, but to reiterate: For a popular large model, the largest part of the cost will be spent on inference, not on training. If we assume inference on end user device, this cost disappears. And even if you have the million to rent a cluster, there is a very deep question of the optimal architecture, dataset and hyperparameters to train the best model possible under given constraints. |
|