by blackeyeblitzar 791 days ago
> It looks like a mid-level implementation of training and inference

I’m not familiar with how any of this works, but what does state-of-the-art training look like? Almost no model ships with its training source code, datasets, preprocessing, or evaluation code. So is it even known what the high-level implementation is?

1 comment

https://github.com/NVIDIA/Megatron-LM

This is probably a good baseline to start thinking about LLM training at scale.