Hacker News
by arilotter 558 days ago
This specific model was trained on only 100 billion tokens, so it's not SOTA by any means, but we've got designs on larger training runs later :)