by duchenne
913 days ago
The most important paper for understanding this issue is "Scaling Laws for Neural Language Models" by OpenAI (2020) [1].
Many consider it the paper that anticipated the high performance of modern LLMs. It shows how the loss decreases as you increase the model size, compute, or training dataset size. From the article:

> Convergence is inefficient: When working within a fixed compute budget C but without any other restrictions on the model size N or available data D, we attain optimal performance by training very large models and stopping significantly short of convergence.

In other words, when training compute is the limiting factor, you should train a larger model and stop well before it converges, i.e. under-train it.

[1] https://arxiv.org/abs/2001.08361
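To make the fixed-budget trade-off in that quote a bit more concrete, here is a toy sketch. It assumes a simple additive power-law loss in model size N and token count D, and the common approximation C ≈ 6·N·D for training compute. The function names, the loss form, and every constant are made up for illustration; they are not the fitted values from [1], and the sketch does not reproduce the paper's results. It only shows that, at a fixed budget, there is a best model size somewhere between "tiny model, lots of data" and "huge model, almost no data".

```python
# Toy sketch: the loss form and all constants are illustrative assumptions,
# not the fitted values reported in [1].

def toy_loss(n_params, n_tokens, a=0.08, b=0.10, A=1e14, B=1e13):
    """Hypothetical power-law loss: falls as model size N or data D grows."""
    return (A / n_params) ** a + (B / n_tokens) ** b


def tokens_for_budget(compute_flops, n_params):
    """Tokens affordable under the common approximation C ~ 6 * N * D."""
    return compute_flops / (6.0 * n_params)


def best_model_size(compute_flops, candidates):
    """Return (loss, n_params, n_tokens) minimizing the toy loss at fixed compute."""
    results = []
    for n in candidates:
        d = tokens_for_budget(compute_flops, n)
        results.append((toy_loss(n, d), n, d))
    # Tuples compare by their first element, so min() picks the lowest loss.
    return min(results)


if __name__ == "__main__":
    budget = 1e21  # hypothetical compute budget in FLOPs
    sizes = [10 ** (6 + 0.1 * i) for i in range(61)]  # 1e6 .. 1e12 parameters
    loss, n, d = best_model_size(budget, sizes)
    print(f"best N ~ {n:.2e} params, D ~ {d:.2e} tokens, toy loss {loss:.3f}")
```

With these made-up constants the minimum lands in the interior of the size range. Changing the exponents a and b moves it around, which is why the actual fitted values in the paper matter for any real conclusion.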