| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by teleforce 672 days ago

Please check this HN post on the similar subject by Meta [1].

Previous paper by the same team from Meta and MIT but with Billions instead of Trillions of parameters [2].

[1] A RoCE network for distributed AI training at scale:

[2] Optimized Network Architectures for Training Large Language Models With Billions of Parameters [PDF]: