by peaslock
1277 days ago
Not necessarily: https://arxiv.org/abs/2206.14486 Also, even under the "Chinchilla laws", a larger model still gains performance; it just needs a lot more data (if the data is just as noisy) to reach the same level of convergence. Given the same amount of data, the larger model will already have partially converged to a superior model.
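
To make the second point concrete, here is a rough sketch using the parametric loss form from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta. The constants are the fitted values as I recall them from that paper, so treat them as illustrative assumptions; the function name `predicted_loss` and the model sizes and token budget are made up for the example.

```python
# Sketch of the Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# N = parameter count, D = training tokens.
# Constants are assumed (recalled from the paper's fit), used here only for illustration.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    tokens = 300e9  # hypothetical fixed data budget
    for n_params in (1e9, 10e9, 70e9):  # hypothetical model sizes
        loss = predicted_loss(n_params, tokens)
        print(f"{n_params:.0e} params, {tokens:.0e} tokens -> predicted loss {loss:.3f}")
```

With the token count held fixed, the A/N^alpha term shrinks as the model grows, so under this fit the larger model always ends up at a lower predicted loss, just with diminishing returns.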