While this is an impressive number of images today, I believe it will be an underwhelming amount compared to what models are trained on in the future. This is an incomplete analogy, but in the first year after birth a baby will have seen 1,892,160,000 frames of data per eye, or 3,784,320,000 frames across both eyes. That baby still knows practically nothing about the world.
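For anyone checking the arithmetic, here is a rough sketch of where those numbers come from. The comment does not state a frame rate, so the 60 frames per second and the "eyes open around the clock for 365 days" figures below are my assumptions; they are simply the values that reproduce the totals exactly.

```python
# Assumption (not stated above): each eye "captures" 60 frames per second,
# every second of a 365-day year, with no allowance for sleep.
FRAMES_PER_SECOND = 60
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

frames_per_eye = FRAMES_PER_SECOND * SECONDS_PER_YEAR
frames_both_eyes = 2 * frames_per_eye

print(f"{frames_per_eye:,} frames per eye in a year")
print(f"{frames_both_eyes:,} frames across both eyes in a year")
```

Accounting for sleep or a lower effective frame rate would shrink the totals, but not by enough to change the point of the analogy.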
I will copy and paste the main findings from the article here:
- Data, not size, is the currently active constraint on language modeling performance. Current returns to additional data are immense, and current returns to additional model size are minuscule; indeed, most recent landmark models are wastefully big.
- If we can leverage enough data, there is no reason to train ~500B param models, much less 1T or larger models.
- If we have to train models at these large sizes, it will mean we have encountered a barrier to exploitation of data scaling, which would be a great loss relative to what would otherwise be possible.
- The literature is extremely unclear on how much text data is actually available for training. We may be "running out" of general-domain data, but the literature is too vague to know one way or the other.
- The entire available quantity of data in highly specialized domains like code is woefully tiny, compared to the gains that would be possible if much more such data were available.
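
To make the "data, not size" point concrete, here is a minimal sketch. It assumes the parametric loss fit reported in the Chinchilla paper, L(N, D) = E + A / N^alpha + B / D^beta, with the fitted constants as I recall them from that paper; neither the formula nor the constants are quoted in the findings above, so treat them as assumptions. The 280B-parameter, 300B-token configuration is only an illustrative starting point, roughly Gopher-scale.

```python
# Assumed constants from the Chinchilla paper's parametric fit
# (not stated in the quoted findings; treat as approximate).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def size_term(n_params: float) -> float:
    """Loss contribution attributed to finite model size."""
    return A / n_params ** ALPHA


def data_term(n_tokens: float) -> float:
    """Loss contribution attributed to finite training data."""
    return B / n_tokens ** BETA


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss under the assumed parametric fit."""
    return E + size_term(n_params) + data_term(n_tokens)


# Illustrative configuration (an assumption): 280B parameters, 300B tokens.
n_params, n_tokens = 280e9, 300e9

base = loss(n_params, n_tokens)
gain_from_2x_params = base - loss(2 * n_params, n_tokens)
gain_from_2x_tokens = base - loss(n_params, 2 * n_tokens)

print(f"size term: {size_term(n_params):.3f}")
print(f"data term: {data_term(n_tokens):.3f}")
print(f"loss reduction from doubling parameters: {gain_from_2x_params:.3f}")
print(f"loss reduction from doubling tokens:     {gain_from_2x_tokens:.3f}")
```

Under these assumptions the data term is several times larger than the size term at this scale, so doubling the tokens buys a noticeably bigger loss reduction than doubling the parameters.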
[0] https://www.alignmentforum.org/posts/6Fpvch8RR29qLEWNH/chinc...