by chii 687 days ago
There's some speculation that training keeps paying off over much longer horizons than we currently use, as explained in this video: https://www.youtube.com/watch?v=Nvb_4Jj5kBo

The term for it is "grokking", amusingly. There's some indication that we are actually undertraining by 10x.

1 comment

I've seen improvement numbers up to 12x, but after that the returns diminish so much that there's not really a point. That's 12x on training costs, I mean; it probably still won't get us AGI.