by lhl 1139 days ago
Namespace collisions are inevitable, especially w/ how fast-moving the LLM space is right now. Just wanted to point out that besides this "Open-Llama" project (which looks really interesting, and is well documented in the GitHub repo), there is also another group training "OpenLLaMA" (https://github.com/openlm-research/open_llama), which looks like an effort by two Berkeley PhD students (https://www.haoliu.site/ and http://young-geng.xyz/) to reproduce LLaMA using the 1.2T-token Together RedPajama dataset. They've released up to a 300B-token checkpoint so far.

Feedback for /u/bayes-song - it'd be great to have more info on the model card on HF. Right now the parameter count is unclear, as is the total # of tokens you're planning on training on and how many you've trained on so far. An Evaluation section (maybe using lm-evaluation-harness) might be good as well?

2 comments

To add to that, I believe the title of this submission ("Open-Llama: A “real” open-source project to train LLM not just checkpoints") is a reference to the project you link, since they did not (to my knowledge) release the training code or detailed instructions to reproduce their experiment precisely, only checkpoints.
Thank you for your suggestion. This will indeed be more intuitive, and I will add the relevant results as soon as possible.