by rahimnathwani 2245 days ago
These are very big models, roughly 100x to 300x the number of parameters of ResNet-50.

2.7bn parameters (for the smaller model) means roughly 2.7bn multiply-adds for a single step of the model, since in a dense model nearly every weight is used in each forward pass. You could fit the model in main memory, but how long is it going to take to run all those calculations on a CPU? And the full model has to run once per output token, so many times over to produce a single sentence.
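
To put rough numbers on that, here is a back-of-envelope sketch. Only the 2.7bn figure comes from the comment above. Everything else is an assumption for illustration: ResNet-50 at about 25.6M parameters, 4 bytes per parameter (fp32), 2 floating-point operations per parameter per step (one multiply, one add), a hypothetical sustained CPU throughput of 50 GFLOP/s, and a hypothetical 20-token sentence. It also ignores memory bandwidth, which may well dominate on a real CPU.

```python
# Back-of-envelope estimate. Every constant except MODEL_PARAMS is an
# assumption chosen for illustration, not a measurement.
RESNET50_PARAMS = 25.6e6   # approximate, commonly cited parameter count
MODEL_PARAMS = 2.7e9       # the smaller model mentioned in the comment


def estimate(params, flops_per_sec, tokens, bytes_per_param=4, flops_per_param=2):
    """Return (memory in GB, seconds per step, seconds per sentence)."""
    memory_gb = params * bytes_per_param / 1e9
    seconds_per_step = params * flops_per_param / flops_per_sec
    seconds_per_sentence = seconds_per_step * tokens
    return memory_gb, seconds_per_step, seconds_per_sentence


ratio = MODEL_PARAMS / RESNET50_PARAMS
memory_gb, step_s, sentence_s = estimate(
    MODEL_PARAMS,
    flops_per_sec=50e9,  # hypothetical sustained CPU throughput
    tokens=20,           # hypothetical sentence length in tokens
)

print(f"size vs ResNet-50: {ratio:.0f}x")
print(f"fp32 weights:      {memory_gb:.1f} GB")
print(f"one step:          {step_s:.3f} s")
print(f"one sentence:      {sentence_s:.2f} s")
```

The result is very sensitive to the throughput you plug in. A CPU that sustains only a fraction of the assumed 50 GFLOP/s, or that is limited by how fast it can stream the weights from RAM, will be proportionally slower.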