


	by wenyuanyu 583 days ago
	Truly remarkable! Their approach to distributed inference is on an entirely new level. For the prefill stage, they used a deployment unit of 32 H800 GPUs, while the decoding stage scaled up to 320 (!!) H800 GPUs per unit. The setup incorporates a multitude of sophisticated parallelization and communication-overlap techniques, at a level that’s rarely seen in other deployments. [0] https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...