Hacker News
MegaScale: Scaling Large Language Model Training to More Than 10k GPUs [pdf] (usenix.org)
1 point by yankcrime 591 days ago