| HN Mirror



	by andai 24 days ago
	What's the downside? Don't they stop when they hit diminishing returns?

2 comments

You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.

Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?
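
For what it's worth, the usual guard against that is early stopping on a held-out validation loss. Here is a minimal sketch of the idea, not anything from their papers: the function name, the `patience` and `min_delta` parameters, and the loss values are all made up for illustration.

```python
def early_stopping_step(val_losses, patience=3, min_delta=0.0):
    """Return the index at which training would halt, or None if it never does.

    Training halts once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive checks.
    """
    best = float("inf")
    bad_checks = 0
    for step, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            bad_checks = 0
        else:
            # No improvement: one more check without progress.
            bad_checks += 1
            if bad_checks >= patience:
                return step
    return None


# Hypothetical validation losses: improving, then rising as the model overfits.
val_losses = [2.0, 1.5, 1.2, 1.1, 1.15, 1.2, 1.3, 1.4]
stop_at = early_stopping_step(val_losses, patience=3)
print(stop_at)
```

Training loss can keep falling the whole time; the point is that the stop decision looks only at held-out data, which is where overfitting would show up.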