


	by luc4sdreyer 1061 days ago
	They claim a 1.1x to 7x speedup, depending on the workload. The 10% to 50% figure is for LLM training on ~10k GPUs, where the main bottleneck tends to be networking:

	> DGX GH200 enables more efficient parallel mapping and alleviates the networking communication bottleneck. As a result, up to 1.5x faster training time can be achieved over a DGX H100-based solution for LLM training at scale.
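
	As a rough sanity check on how the multipliers relate to the percentages, here is a minimal Python sketch. It assumes "Nx" means N times the baseline throughput, which the quoted material does not spell out, and the function names are made up for illustration.

```python
def throughput_gain_pct(speedup: float) -> float:
    """Percent more work per unit time, assuming speedup = new_rate / old_rate."""
    return (speedup - 1.0) * 100.0


def time_saved_pct(speedup: float) -> float:
    """Percent less wall-clock time for the same job, under the same assumption."""
    return (1.0 - 1.0 / speedup) * 100.0


# The multipliers mentioned in the comment: 1.1x, 1.5x and 7x.
for s in (1.1, 1.5, 7.0):
    print(f"{s}x: +{throughput_gain_pct(s):.0f}% throughput, "
          f"-{time_saved_pct(s):.0f}% wall-clock time")
```

	Under that reading, 1.1x to 1.5x lines up with the 10% to 50% range, while a 1.5x speedup cuts wall-clock time by only about a third.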