by wenyuanyu
536 days ago
Truly remarkable! Their approach to distributed inference is on an entirely new level. For the prefill stage, they used a deployment unit of 32 H800 GPUs, while the decoding stage scaled up to as many as 320 H800 GPUs per unit. The deployment incorporates a multitude of sophisticated parallelization and communication-overlap techniques, setting a standard that's rarely seen in other setups.

[0] https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...