by rcarmo 506 days ago

I like a lot of what Dario writes, but in this case I just can't follow the reasoning. Everything I've picked up about how DeepSeek did what they did (including going a level lower than CUDA to better take advantage of the limited hardware[1], and the balance of techniques used[2]) points to some very smart Chinese engineers having out-smarted US ones (to put it in terms that matter to US folk, because I'm European and I ordinarily wouldn't care):

    1 - https://stratechery.com/2025/deepseek-faq/
    2 - https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture

This post comes across as overly defensive of the US export controls and leaning on the authoritarian regime angle far too much to feel like it isn't just a way to shore up interest in US-based AI companies and widen the moat (or just make sure someone else shores it up politically while they catch up technically).

Anyway, this will always be a deep-pocket race. But I wish it weren't so much about brute-forcing GPUs and wasting power on (as yet) uncertain outcomes as far as model capabilities are concerned. To me, what DeepSeek achieved was to show that ingenuity and better techniques are what both OpenAI and Anthropic ought to be pursuing instead of burning cash.

3 comments

kristjansson 506 days ago

To be sure, DeepSeek did great work, and this is a bit aside from TFA. But the PTX thing is a bit of a meme? What do we think torch.compile and Triton and LLVM's NVPTX backend are doing under the hood? The warp-specialization thing quoted in [1] cites a _2014_ paper[2] out of Stanford ...

[2]: https://dl.acm.org/doi/10.1145/2555243.2555258

rcarmo 506 days ago

Yeah, well, it's not _just_ PTX. Think about what you would do if you had to work in a resource-constrained system (that's a mindset I closely relate to since I still do C++ for MCUs, and it makes you dig _under_ the libraries to save resources).

kristjansson 506 days ago

Totally, they did great work under their constraints: training in FP8, the MLA thing they introduced in DeepSeek-V2, etc. I just take particular issue with the attention the PTX thing is getting because (a) it's not like other labs don't do stuff like that and (b) it doesn't contribute nearly as much to their outcome as the other algorithmic and operational improvements they've made.

highfrequency 506 days ago

He is basically saying, with his inside knowledge of Anthropic's current capabilities: "they did for $5m what we could probably do for $10m or $15m if we launched the training run today without any new optimizations." So on the one hand that's very impressive, both because the cost effectiveness is significantly higher and because even replicating SoTA outside of OpenAI/Anthropic is very difficult. On the other hand, it's not too surprising that a company that needs to economize on compute will find ways to do so; neither Anthropic nor OpenAI would consider it worthwhile to have their best researchers prioritize cutting down on training costs or compute requirements; they have near infinite capital and are focused on breakthroughs in making their best models as good as possible. I don't think it's accurate to say that DeepSeek "outsmarted US engineers"; DeepSeek had a very different objective function than the US labs, so it pushed much harder on the engineering optimizations for better cost performance.

Everyone seems to rag on OpenAI/Anthropic for spending so much money and take it as a symptom of capitalist waste, but this reality seems great to me: massive amounts of money are basically being funneled from VCs toward progress in machine learning. Once expensive breakthroughs are made, it is only a matter of a few years until people make the engineering optimizations to make those breakthroughs cheap.

Just want to emphasize the progress in cost that Dario highlights: fixed AI capabilities are becoming 4x cheaper every year. That is absolutely insane. US GDP growth averaged 3-4% over the last 250 years and look how far that has taken us. Moore's Law averaged ~40% annual growth in transistor density and look how far that has taken us in just 65 years. 4x growth in AI capability/cost per year is absolutely insane.
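
To put rough numbers on that comparison, here is a minimal Python sketch. It assumes the rates quoted above hold constant over the stated periods, which is a simplification, and it takes 3.5% as the midpoint of the "3-4%" GDP figure. The helper names `cumulative_growth` and `years_to_match` are made up for illustration.

```python
import math


def cumulative_growth(annual_factor: float, years: float) -> float:
    """Total multiplier after compounding `annual_factor` once per year."""
    return annual_factor ** years


def years_to_match(target_multiplier: float, annual_factor: float) -> float:
    """Years of compounding at `annual_factor` needed to reach `target_multiplier`."""
    return math.log(target_multiplier) / math.log(annual_factor)


# Assumed constant rates, taken from the figures quoted in the comment.
gdp_total = cumulative_growth(1.035, 250)   # ~3.5% per year for 250 years
moore_total = cumulative_growth(1.40, 65)   # ~40% per year for 65 years
ai_factor = 4.0                             # 4x per year

print(f"GDP-style compounding over 250 years: {gdp_total:,.0f}x")
print(f"Moore's-Law-style compounding over 65 years: {moore_total:,.0f}x")
print(f"Years at 4x/year to match GDP total: {years_to_match(gdp_total, ai_factor):.1f}")
print(f"Years at 4x/year to match Moore total: {years_to_match(moore_total, ai_factor):.1f}")
```

Under these assumptions, a sustained 4x-per-year rate would match 250 years of GDP-style compounding in roughly six years, and 65 years of Moore's-Law-style compounding in roughly sixteen. Whether the 4x rate can actually be sustained that long is a separate question.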

Kostchei 506 days ago

Yeah. I see Dario saying "let's protect the US more" for no reason other than bias, and "of course they improved over time", which feels like a mighty strong strain of copium. Very disappointing from the leader of an organization I respected. Assuming he speaks for Anthropic, and it seems he does: past tense.

rcarmo 506 days ago

The "good" news is that neither Anthropic nor OpenAI is based in Europe, so I don't have to feel like they're burning my taxpayer money.

The "bad" news is that so far I've yet to see a serious contender emerging from that part of the world (although the Middle East has invested heavily in LLMs due to obvious cultural differences).

GoatInGrey 506 days ago

He writes with the assumption that the reader understands the danger in China developing a lead in military technology. Hence his very explicit wording:

"To be clear, the goal here is not to deny China or any other authoritarian country the immense benefits in science, medicine, quality of life, etc. that come from very powerful AI systems. Everyone should be able to benefit from AI. The goal is to prevent them from gaining military dominance."

sitkack 506 days ago

You can hear it directly from Dario

Navigating a world in transition: Dario Amodei in conversation with Zanny Minton Beddoes https://www.youtube.com/watch?v=uvMolVW_2v0

You can tell that, as an academic, he wants to give DeepSeek props; at least that is what I would like to believe. In the first third, the presenter leans into "cheap Chinese" too many times.
