by bitmasher9
14 days ago
The pace of data creation is only increasing, and our capacity for sharing and storing it is growing as well. A lot of this data is out in the open, ready for anyone to crawl and scrape. There is probably a point of “peak data” after which the amount of new data starts decreasing, but that’s likely a 22nd or 23rd century problem.
Unless we’re producing an entire new internet’s worth of data every couple of years, it’s hard to see how LLMs can achieve further leaps in capability comparable to the jump from training on effectively 0% of the internet to training on 100% of it.