by littlestymaar
88 days ago
> Data efficiency matters because compute grows much faster than data [2]

(The [2] references a paper from 2022.) I'm not convinced this is particularly true in today's world: if you have more compute, you can simply generate more, and higher-quality, artificial data. That's what all labs have been doing since at least 2023.

Also, the post uses Chinchilla-optimal training as its comparison baseline, but everyone has moved far beyond Chinchilla scaling. Small models are routinely trained on 10-400 times more data (1-40T tokens) than the Chinchilla-optimal number, so the entire industry has gone in the complete opposite direction of what they are proposing.

That doesn't mean the techniques presented here are useless or anything (I'm not qualified to judge), but you should take the introduction with a grain of salt.

For "expensive" data, it makes a lot of sense to use every trick in the book to squeeze that data for all it's worth.