by casercaramel144 864 days ago
I'm sorry, I don't understand the exact contribution here. There are many tutorials on how to train a language model. If it's a repository of SOTA techniques for training, it will be outdated in 3 months at most, and the ground shifts under you in this field anyway, so you might as well read arXiv all day if your intention is to keep up with SOTA.
2 comments

It looks like this team gave us everything we need to reproduce their models: the actual artifacts, not just a description of what they did. As far as I can tell, they share the data and every step along the way to the final model.
Researchers don't read tutorials; they cross-check each other's work. You need details to do that.
wdym by cross-check each other's work? Surely just reporting the final loss is good enough if that's the intention. The end goal is lower loss anyway, so it's not even a bad metric.