HN Mirror



	by Solvency 893 days ago
	If you think we've hit peak, you're grossly underestimating the sheer volume of copyrighted books, manuscripts, screenplays, podcasts, movies, documents, historical records, and research papers that ChatGPT hasn't been trained on. There's a LOT more juice left to squeeze.


nirvael 893 days ago

This is actually incorrect; there isn't that much data left to train on. I remember reading an article about it, possibly one of Gwern's or something about Chinchilla scaling. Under those scaling laws, an order-of-magnitude larger model needs roughly an order of magnitude more training data, and there just isn't that much available.
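A rough way to see the data argument, as a sketch only: it assumes the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^α + B/D^β with E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28, plus the commonly quoted ~20 training tokens per parameter rule of thumb for compute-optimal training. The function names are made up for illustration.

```python
# Sketch of the Chinchilla parametric loss fit (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# N = model parameters, D = training tokens. Constants are the published fit.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

# Rough compute-optimal ratio (rule of thumb, an assumption here, not exact).
TOKENS_PER_PARAM = 20


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params trained on n_tokens (hypothetical helper)."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def optimal_tokens(n_params: float) -> float:
    """Tokens to train n_params compute-optimally under the ~20:1 heuristic (hypothetical helper)."""
    return TOKENS_PER_PARAM * n_params


if __name__ == "__main__":
    # 10x more parameters means 10x more tokens under the 20:1 heuristic.
    for n in (70e9, 700e9):
        print(f"{n:.0e} params -> {optimal_tokens(n):.1e} tokens")

    # Hold data fixed at roughly Chinchilla's 1.4T tokens: growing N only
    # shrinks the A / N**alpha term, while B / D**beta stays where it is.
    fixed_d = 1.4e12
    for n in (70e9, 700e9, 7e12):
        print(f"N={n:.0e}, D={fixed_d:.1e}: loss={chinchilla_loss(n, fixed_d):.3f}")
```

With the dataset held fixed, each 10x in parameters buys a smaller loss reduction, because the data term B/D^β sets a floor that only more tokens can lower.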


nirvael 893 days ago

Found the reference (see section 2): https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla...
