	by thundergolfer 486 days ago
	“Pure garage-energy” is a great phrase. Most interested to see their inference stack, hope that’s one of the 5. I think most people are running R1 on a single H200 node but Deepseek had much lower RAM per GPU for their inference and so had some cluster based MoE deployment.

3 comments

mmoskal 486 days ago

Their tech report says one inference deployment is around 400 GPUs...

fspeech 485 days ago

You need that to optimize load balancing. Unfortunately that gain is not available to small or individual deployment.

sva_ 486 days ago

I don't think the RAM size of the H800 was nerfed (80GB), but rather the memory bandwidth between gpus.

But yeah, would be interesting to see how they optimized for that.
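For a rough sense of why per-GPU memory matters here, a back-of-the-envelope sketch follows. The 671B parameter count, FP8 weights at 1 byte per parameter, the 141 GB H200 capacity, and the 8-GPU node size are assumptions brought in for illustration; only the 80 GB H800 figure comes from this thread. KV cache and activations are ignored, so real headroom is smaller.

```python
# Do the model weights alone fit in the combined memory of one 8-GPU node?
# Assumptions (not from the thread): 671B parameters, 1 byte per parameter (FP8),
# 141 GB per H200, 8 GPUs per node. The 80 GB H800 figure is from the thread.
# KV cache, activations and runtime overhead are deliberately ignored.

GPUS_PER_NODE = 8  # assumed node size


def weights_gb(params_billions, bytes_per_param):
    """Weight footprint in GB, taking 1 GB = 1e9 bytes."""
    return params_billions * bytes_per_param


def node_memory_gb(gb_per_gpu, gpus=GPUS_PER_NODE):
    """Total GPU memory across one node."""
    return gb_per_gpu * gpus


def weights_fit(params_billions, bytes_per_param, gb_per_gpu, gpus=GPUS_PER_NODE):
    """True if the weights alone fit in the node's combined GPU memory."""
    return weights_gb(params_billions, bytes_per_param) <= node_memory_gb(gb_per_gpu, gpus)


if __name__ == "__main__":
    for name, gb in [("H200", 141), ("H800", 80)]:
        total = node_memory_gb(gb)
        fits = weights_fit(671, 1, gb)
        print(f"{name}: {total} GB per node, weights fit: {fits}")
```

Under these assumptions the weights fit on a single H200 node but not on a single 80 GB-per-GPU node, which is consistent with a multi-node deployment being needed on the latter.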

NitpickLawyer 486 days ago

Correct. There are 3 main ways to "gimp" high end GPUs meant for training - "cores", "on-chip memory speed" and "interconnects". IIUC the H800 had the first 2 unchanged but halved the interconnect speeds.

H20 is the next iteration of the "sanctions" that I believe also limited the "cores" but left the on-chip memory intact, or slightly higher (from the new generation).

golly_ned 486 days ago

“Pure garage-energy” with 10,000 A100s, apparently. I’d love to have a garage like that.

blackeyeblitzar 486 days ago

From https://semianalysis.com/2025/01/31/deepseek-debates/

> We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months.

golly_ned 485 days ago

The paper in the repo says: “For DL training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs”

1oooqooq 486 days ago

that report is lazy. they assume all GPUs owned (openly reported) by the parent company (a hedge fund which claims to use those GPUs to generate trades) were used by the invested company.

that's as dumb as saying coca cola has access to all offices of Berkshire Hathaway.

likewise, all comments praising deepseek history are also misleading as the company has barely existed for a year.

everything is opaque marketing being repeated. just drop the off topic bla bla bla and focus on the facts and code in front of you.

thanks for coming to my ted talk.

[flagged]

Hey, could you please make your points without resorting to the flamewar style? You've done that repeatedly in this thread, as well as in other threads recently (e.g. https://news.ycombinator.com/item?id=43035040). This is not what HN is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. The basic idea is to make your substantive points thoughtfully, regardless of how wrong anyone else is or you feel they are.

hereonout2 486 days ago

Didn't the DeepSeek paper itself state they trained on 2048 H800s?

Claiming they have access to 5x this amount is not such a bold claim?

brookst 486 days ago

Appeals to authority are so totally unconvincing.

What claims from the semianalysis article do you think are false? And based on what evidence?

maxglute 486 days ago

Parent hedge fund High-Flyer has only been around for a few years with 8B AUM, so its single-digit-% management fees since founding total in the low 100s of millions (for all operating expenses); hence it fiscally cannot have acquired 1B+ of hardware capex alone. DeepSeek having access to that much hardware doesn't pass a basic smell test, and SemiAnalysis has been dodging call-outs on socials for this basic math illiteracy.
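The fee arithmetic above can be sketched as follows. The 8B AUM and the 1B+ capex figure are from the comment; the specific fee rates (1% and 2%), the 3-year horizon, and constant AUM are illustrative assumptions only, standing in for "single digit %" and "a few years".

```python
# Rough cumulative management-fee estimate versus a hardware capex figure.
# From the comment: 8B AUM and 1B+ of hardware capex.
# Assumed for illustration: fee rates of 1% and 2%, a 3-year horizon,
# and AUM held constant over that period.

AUM_USD = 8_000_000_000
CAPEX_USD = 1_000_000_000


def cumulative_fees(aum_usd, fee_bps, years):
    """Total fees in USD; fee_bps is the annual fee in basis points (100 bps = 1%)."""
    return aum_usd * fee_bps * years // 10_000


if __name__ == "__main__":
    for fee_bps in (100, 200):  # hypothetical 1% and 2% annual fees
        fees = cumulative_fees(AUM_USD, fee_bps, years=3)
        print(f"{fee_bps / 100:.0f}% for 3 years: ${fees:,} "
              f"({'below' if fees < CAPEX_USD else 'at or above'} $1B capex)")
```

Different fee rates, performance fees, or a longer horizon would change the totals, so this only illustrates the order-of-magnitude argument being made.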
