by credit_guy 823 days ago
> these efforts take serious resources

Meta just published their new optimization results [1]. According to them:

  > training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.
In this context a GPU is an NVIDIA A100, which you can buy, if you can get one at all, for about $10,000.

And this is after an explosion of ideas that led to optimizations that were unthinkable just two years ago.

If someone had trained such a model 2 years ago, it would have cost hundreds of millions. Now it's about $5 million in hardware (512 GPUs at roughly $10,000 each). Maybe in 2 years it's going to be only $50k. Should you start a startup now and invest $5 million, and risk someone stealing the show for pennies in 2 years? If you do, I really can't see how you can afford to open source the results of your training.
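
For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. It only uses the numbers quoted above (512 GPUs, 2T tokens, about two weeks) plus my own $10,000-per-A100 guess. Treating "just under two weeks" as a full 14 days is an assumption, and this counts hardware purchase price only, not power, networking, or staff.

```python
# Back-of-envelope check of the figures in this comment.
GPUS = 512                   # from the quoted claim
PRICE_PER_GPU_USD = 10_000   # assumed A100 price (the comment's estimate)
TOKENS = 2e12                # 2T tokens, from the quoted claim
DAYS = 14                    # assumption: "just under two weeks" rounded up

# Up-front hardware cost if you bought every GPU outright.
hardware_cost_usd = GPUS * PRICE_PER_GPU_USD

# Total GPU-hours consumed by the run.
gpu_hours = GPUS * DAYS * 24

# Implied sustained throughput per GPU.
tokens_per_gpu_per_sec = TOKENS / (GPUS * DAYS * 86_400)

print(f"hardware cost:      ${hardware_cost_usd:,}")
print(f"GPU-hours:          {gpu_hours:,}")
print(f"tokens / GPU / sec: {tokens_per_gpu_per_sec:,.0f}")
```

So the "$5 million" figure is really 512 x $10,000 = $5.12 million, and the run works out to roughly 172k GPU-hours under these assumptions.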

[1] training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.