|
|
|
|
|
by neilmovva
1213 days ago
|
|
While OPT-175B is great to have publicly available, it needs a lot more training to achieve good results. Meta trained OPT on 180B tokens, compared to 300B that GPT-3 saw. And the Chinchilla scaling laws suggest that almost 4T tokens would be required to get the most bang for compute buck. And on top of that, there are some questions on the quality of open source data (The Pile) vs OpenAI’s proprietary dataset, which they seem to have spent a lot of effort cleaning. So: open source models are probably data-constrained, in both quantity and quality. |
|