by Solvency 893 days ago
If you think we've hit peak, you're grossly underestimating the sheer volume of copyrighted books, manuscripts, screenplays, podcasts, movies, documents, history, and research papers that ChatGPT hasn't been trained on. There's a LOT more juice to squeeze still.

This is actually incorrect: there's not that much data left to train on. I remember reading an article about it, possibly one of Gwern's or something about Chinchilla scaling. The gist was that to produce an order of magnitude increase we need an order of magnitude more data, and there just isn't that much available.
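
For a rough sense of the arithmetic, here is a back-of-the-envelope sketch. It assumes the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter and the usual approximation of about 6 FLOPs per parameter per token. Both constants, the function names, and the example model sizes are my assumptions for illustration, not figures from the article I half remember.

```python
# Rule-of-thumb constants, assumed for illustration only.
TOKENS_PER_PARAM = 20        # approximate Chinchilla compute-optimal ratio
FLOPS_PER_PARAM_TOKEN = 6    # common approximation: compute ~ 6 * params * tokens


def optimal_tokens(params: float) -> float:
    """Approximate compute-optimal number of training tokens for a model size."""
    return TOKENS_PER_PARAM * params


def training_flops(params: float) -> float:
    """Approximate training compute when the model is trained on optimal_tokens."""
    return FLOPS_PER_PARAM_TOKEN * params * optimal_tokens(params)


if __name__ == "__main__":
    # Hypothetical model sizes: a 70B-parameter model and one 10x larger.
    for params in (70e9, 700e9):
        print(
            f"{params:.0e} params -> "
            f"{optimal_tokens(params):.1e} tokens, "
            f"{training_flops(params):.1e} FLOPs"
        )
```

Under those assumptions, a 10x larger model wants roughly 10x the tokens and roughly 100x the compute, which is why the supply of text becomes the bottleneck.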