Hacker News
Scaling Pedagogical Pre-Training: From Optimal Mixing to 10B Tokens (huggingface.co)
2 points by codelion 105 days ago