by drusepth
2247 days ago
According to the "Get the code" link [1], it looks like these models need pretty huge GPUs to even interact with the pre-trained models. Is that abnormal? I was under the impression that training the model is generally what takes the beefy GPU, and then using that model requires more consumer-adjacent hardware. A P100 GPU is $3000 [2].

[1] https://parl.ai/projects/blender/

[2] https://www.amazon.com/dp/B06WV7HFWV/
2.7bn parameters (for the smaller model) means you have to do roughly 2.7bn multiply-adds for a single step of the model, since every weight gets used at least once per step. You could fit the model in main memory, but how long is it going to take you to run all those calculations on a CPU? And generation happens one token at a time, so the model will need to run multiple times to output a single sentence.
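
To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes about one multiply and one add (2 FLOPs) per parameter per generated token, which is a common rule of thumb and not something measured on this model. The throughput figures and the 20-token sentence length are made-up round numbers for illustration, not benchmarks of any real CPU or GPU.

```python
# Back-of-envelope estimate for a 2.7bn-parameter model.
# Everything except PARAMS is an illustrative assumption.
PARAMS = 2.7e9


def weight_memory_gib(params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return params * bytes_per_param / 2**30


def flops_per_token(params):
    """Rule of thumb: one multiply plus one add per parameter per token."""
    return 2 * params


def seconds_per_sentence(params, tokens, sustained_flops):
    """Time to generate `tokens` tokens at an assumed sustained FLOP/s."""
    return flops_per_token(params) * tokens / sustained_flops


if __name__ == "__main__":
    print(f"fp32 weights: {weight_memory_gib(PARAMS, 4):.1f} GiB")
    print(f"fp16 weights: {weight_memory_gib(PARAMS, 2):.1f} GiB")

    # Hypothetical sustained throughputs, chosen only to show the gap.
    assumed_hardware = {
        "CPU at 50 GFLOP/s (assumed)": 50e9,
        "GPU at 5 TFLOP/s (assumed)": 5e12,
    }
    sentence_tokens = 20  # assumed length of one reply
    for label, flops in assumed_hardware.items():
        t = seconds_per_sentence(PARAMS, sentence_tokens, flops)
        print(f"{label}: {t:.3f} s per {sentence_tokens}-token sentence")
```

This ignores memory bandwidth, attention over the context, and beam search, all of which tend to make real inference slower than the pure FLOP count suggests, so treat it as a lower bound on the work rather than a prediction.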