by zone411
1277 days ago
I've actually seen this paper before, but I don't think it's helpful. If the entire GitHub is 100B tokens and you prune it down properly, then fine, you can get equal performance with fewer tokens. However, if you want improved performance, you still need more data, not just a larger model size, and that's hard to obtain. I don't think it's a lost cause, or that we will be stuck with current performance by any means, though; there are other ways to go.
|
Not true. See figure 2: https://arxiv.org/pdf/2203.15556.pdf#page=5
The loss decreases with greater model size at the same compute budget (i.e. training on fewer tokens before stopping). Also, some rehearsal/multi-epoch training reduces forgetting (thereby improving performance substantially), which the Chinchilla paper (Hoffmann et al.) doesn't take into account because they train for <1 epoch.
https://arxiv.org/abs/2205.12393
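
To make the "same compute budget" trade-off concrete, here's a rough sketch using the common C ≈ 6·N·D approximation (FLOPs ≈ 6 × parameters × training tokens). The budget and model sizes below are made-up illustrative numbers, not values from either paper, and `tokens_for_budget` is just a hypothetical helper name.

```python
def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Training tokens affordable at a fixed budget, assuming C ~= 6 * N * D."""
    return compute_flops / (6.0 * n_params)


if __name__ == "__main__":
    budget = 1e21  # hypothetical compute budget in FLOPs
    for n_params in (1e8, 1e9, 1e10):  # hypothetical model sizes
        tokens = tokens_for_budget(budget, n_params)
        print(f"{n_params:.0e} params -> {tokens:.2e} tokens")
```

So at a fixed budget, every 10x in model size means 10x fewer tokens seen. This only shows the accounting; it says nothing about which point on that curve gives the lowest loss.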