|
|
|
|
|
by williamtrask
265 days ago
|
|
(OP) the scaling laws / bitter lesson would disagree, but I tend to agree with you with some hedging. If you get copies of the same data, it doesn't help. In a similar fashion, going from 100 TBs of data scraped from the internet to 200TBs of data scraped from the internet... does it tell you much more? Unclear. But there are large categories of data which aren't represented at all in LLMs. Most of the world's data just isn't on the internet. AI for Health is perhaps the most obvious example. |
|
I have to note that taking the "bitter lesson" position as a claim that more data will result in better LLMs is a wild misinterpretation (or perhaps a "telephone version) of the original bitter lesson article, which say only that general, scalable algorithms do better than knowledge-carrying, problem-specific algorithms. And the last I heard it was the "scaling hypothesis" that hardly had consensus among those in the field.