by
andai
24 days ago
What's the downside? Don't they stop when they hit diminishing returns?
2 comments
Ifkaluva
24 days ago
You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens
hgoel
23 days ago
Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?
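One rough way to check for this is to track loss on held-out data and stop once it stops improving. Here is a minimal sketch of that idea (plain early stopping), not anything taken from the papers mentioned above. The function name `early_stop_index`, its `patience` and `min_delta` parameters, and the loss values are all made up for illustration.

```python
def early_stop_index(val_losses, patience=2, min_delta=0.0):
    """Return the index of the evaluation at which training would stop,
    or None if the held-out loss never stalls for `patience` evaluations."""
    best = float("inf")   # lowest held-out loss seen so far
    stale = 0             # consecutive evaluations without improvement
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical held-out losses: they improve, then drift upward,
# which is the pattern you would expect if the model starts overfitting.
val_losses = [2.10, 1.80, 1.65, 1.60, 1.62, 1.66, 1.71]
stop_at = early_stop_index(val_losses, patience=2)
print(stop_at)
```

If the held-out loss keeps falling for the whole run, the function returns `None`, which would correspond to the "no diminishing returns yet" case asked about at the top of the thread.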