Hacker News
by sanjams 511 days ago
> Infrastructure algorithm optimization

> Novel training frameworks

Where can one find more information about these? I keep seeing hand-wavy language like this w.r.t. DeepSeek's innovation.


I think you haven't been looking too hard in that case. Here is the R1 paper: https://arxiv.org/abs/2501.12948

You can find more papers by the same author: https://arxiv.org/search/cs?searchtype=author&query=DeepSeek... or by title: https://arxiv.org/search/?query=DeepSeek&searchtype=title&ab... and then go through the citations for more.

Of course, you could also search for some of the individual authors. Daya Guo, the lead author of the R1 paper, has 36 papers on arXiv: https://arxiv.org/search/cs?query=Guo%2C+Daya&searchtype=aut...

Besides the papers, DeepSeek is active on GitHub (https://github.com/deepseek-ai) and Hugging Face (https://huggingface.co/deepseek-ai).

I have read the R1 paper. As far as I can see, it contains no information whatsoever about how they overcome the limitations of the H800 compared to the H100, which is what the parent article is about. That's the piece I'm curious about.

I'll concede that I haven't read all their papers or looked through their code, but that's why I asked the question: I hoped someone here could point me to specific places in specific papers rather than an arXiv search.

Give Section 3 of the DeepSeek-V3 paper a read. They discuss their HAI-LLM framework and give a fairly in-depth description of their DualPipe algorithm and how its pipeline bubbles compare with those of other pipeline-parallel schedules. They also describe, in great depth, how they work around NVLink limits, along with tons of other optimizations. The section is 10 pages long and relatively dense, not fluff!
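For anyone unfamiliar with the term: a pipeline bubble is the time a pipeline-parallel stage sits idle waiting for work from its neighbours. The toy model below is NOT DualPipe and not anything from the paper. It is a minimal sketch of a naive GPipe-style schedule under my own simplifying assumptions (every forward and backward step takes one time unit, and all forwards finish before any backward starts), and it simply counts idle slots. The helper names `gpipe_schedule` and `bubble_fraction` are hypothetical. With p stages and m microbatches the idle fraction comes out to (p-1)/(m+p-1), which is the kind of overhead the paper's bubble comparison is about.

```python
from fractions import Fraction


def gpipe_schedule(p: int, m: int) -> list[list[str]]:
    """Per-stage timeline for a naive GPipe-style schedule (toy model).

    Assumptions (hypothetical, for illustration only): each forward and each
    backward step takes one time unit, and all forwards run before any
    backward. Entries are "F<i>", "B<i>", or "." for an idle (bubble) slot.
    """
    total = 2 * (m + p - 1)  # forward phase + backward phase, in time units
    timeline = [["."] * total for _ in range(p)]
    for s in range(p):
        for i in range(m):
            # Forward of microbatch i reaches stage s after s hops.
            timeline[s][s + i] = f"F{i}"
            # Backward flows from the last stage back toward stage 0,
            # processing microbatches in reverse order.
            timeline[s][(m + p - 1) + (p - 1 - s) + i] = f"B{m - 1 - i}"
    return timeline


def bubble_fraction(timeline: list[list[str]]) -> Fraction:
    """Exact fraction of all stage-time slots that are idle."""
    slots = sum(len(row) for row in timeline)
    idle = sum(row.count(".") for row in timeline)
    return Fraction(idle, slots)


if __name__ == "__main__":
    p, m = 4, 8
    tl = gpipe_schedule(p, m)
    for s, row in enumerate(tl):
        print(f"stage {s}: " + " ".join(f"{c:>3}" for c in row))
    print("bubble fraction:", bubble_fraction(tl))  # 3/11
```

Raising m shrinks the bubble but costs activation memory; smarter schedules instead try to fill or overlap those idle slots, which is what the paper's comparison measures.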
Their paper goes into the details: https://arxiv.org/abs/2501.12948
They wrote a paper. As far as I can tell, they applied a smørrebrødsbord approach, and that led to the results they got.
FWIW, I think you meant "smörgåsbord", which is basically tapas but Swedish-style: a spread of many different dishes. Smørrebrød is a Danish type of open sandwich, so I'm guessing "smørrebrødsbord" would be "a table of smørrebrød", but I'm not sure how common "smørrebrødsbord" is; I'm not Danish :)