by lhl
513 days ago
Give Section 3 of the DeepSeek-V3 paper a read. They discuss their HAI-LLM framework and give a pretty in-depth description of their DualPipe algorithm and how its pipeline bubbles compare to those of other pipeline-parallel methods. They also describe how they work around NVLink limits and tons of other optimizations in extreme depth. The section is 10 pages long, and it's relatively dense, not fluff!