Hacker News
by ogrisel 465 days ago
It appears that they reused a lot of the data preparation provided by the AllenAI team:

https://github.com/allenai/OLMoE

https://github.com/allenai/dolma

https://github.com/AMD-AIG-AIMA/Instella