Hacker News
by visarga 1043 days ago
But when the model is trained on 13T tokens, it is hard for an input to be truly OOD (out of distribution).