by sandGorgon
196 days ago
DeepSeek kind of innovated on this using off-the-shelf components, right? To quote from their paper:
"In order to ensure sufficient computational performance for DualPipe, we customize efficient
cross-node all-to-all communication kernels (including dispatching and combining) to conserve
the number of SMs dedicated to communication. The implementation of the kernels is codesigned with the MoE gating algorithm and the network topology of our cluster."