by credit_guy
823 days ago
> these efforts take serious resources

Meta just published their new optimization results [1]. According to them:

> training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.

In this context a GPU is an NVIDIA A100, which you can buy, if you can buy one at all, for $10,000.

And this is after an explosion of ideas that led to optimizations that were unthinkable just two years ago. If someone had trained such a model two years ago, it would have cost hundreds of millions. Now it's $5 million. Maybe in two years it's going to be only $50k.

Should you start a startup now and invest $5 million, and risk someone stealing the show for pennies in two years? If you do, I really can't see how you can afford to open source the results of your training.

[1] training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.
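
For what it's worth, the $5 million figure appears to be just the hardware purchase price: 512 GPUs at roughly $10,000 each. That reading is an assumption, as the comment does not spell out the calculation. A minimal sketch of the arithmetic, using only the numbers quoted above and rounding "just under two weeks" up to 14 days (also an assumption):

```python
# Back-of-the-envelope numbers taken from the comment above.
NUM_GPUS = 512              # GPU count in the quoted Meta result
PRICE_PER_GPU_USD = 10_000  # the comment's rough A100 price
TRAINING_DAYS = 14          # assumption: "just under two weeks" rounded up

# Cost of buying the hardware outright (ignores power, hosting, networking).
hardware_cost_usd = NUM_GPUS * PRICE_PER_GPU_USD

# Total GPU time consumed by one training run.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

print(f"hardware cost: ${hardware_cost_usd:,}")
print(f"GPU-hours per run: {gpu_hours:,}")
```

Note that this counts buying the GPUs, not renting them for two weeks, so it is an upper bound on what a single run costs if the hardware is reused afterwards.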