Hacker News new | ask | show | jobs
by wenyuanyu 536 days ago
Truly remarkable! Their approach to distributed inference is on an entirely new level. For the prefill stage, they utilized a deployment unit comprising 32 H800 GPUs, while the decoding stage scaled up to 320!! H800 GPUs per unit. Incorporates a multitude of sophisticated parallelization and communication overlap techniques, setting a standard that’s rarely seen in other setups.

[0] https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...