Hacker News
by Palmik 500 days ago
> All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.

Says the CEO whose product [1] costs 15-50x more. (This applies not just to DeepSeek's own API, but also to third-party providers hosting the same model.)

> DeepSeek does not "do for $6M what cost US AI companies billions". I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train

OK, that's still at least a 3-10x cost reduction (assuming a lower bound of $20M for "a few $10M"). And that's for a model that he later implies is 2x larger than Sonnet. So that's a 6-10x efficiency improvement. Nice!
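
For anyone who wants to check the arithmetic, here is a rough sketch of how those ratios fall out. The $6M and $20M figures come from the quote and the assumption above. The $60M upper bound is my own hypothetical, picked only because it reproduces the "10x" end of the range, and the 2x size factor is this comment's reading of the article, not a stated fact.

```python
# Back-of-the-envelope check of the cost ratios discussed above.
# All inputs are in millions of USD and are assumptions, not measured data.
DEEPSEEK_COST_M = 6.0      # the "$6M" figure quoted for DeepSeek-V3
SONNET_COST_LOW_M = 20.0   # assumed lower bound for "a few $10M"
SONNET_COST_HIGH_M = 60.0  # hypothetical upper bound (matches the "10x" end)
SIZE_FACTOR = 2.0          # this comment's reading: DeepSeek-V3 is ~2x larger


def cost_reduction(baseline_m: float, new_m: float) -> float:
    """Return how many times cheaper new_m is than baseline_m."""
    return baseline_m / new_m


low = cost_reduction(SONNET_COST_LOW_M, DEEPSEEK_COST_M)
high = cost_reduction(SONNET_COST_HIGH_M, DEEPSEEK_COST_M)

# Scale by the size factor to get a crude "efficiency" ratio.
adj_low = low * SIZE_FACTOR
adj_high = high * SIZE_FACTOR

print(f"raw cost reduction: {low:.1f}x to {high:.1f}x")
print(f"size-adjusted:      {adj_low:.1f}x to {adj_high:.1f}x")
```

Under these assumptions, the size-adjusted range comes out wider at the top than the "6-10x" quoted above, so treat that figure as a conservative estimate.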

> Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve.

What curve? Does he mean the simplistic performance vs. parameter count curve? That does not take into account that DeepSeek-V3 is an MoE (you can't naively compare MoE and dense parameter counts), nor the other architecture changes (KV compression, etc.).

Also, if Sonnet 3.5 is 2x smaller, then why is inference 15-50x more expensive than DeepSeek v3's? Does Anthropic not have good GPU engineers? Are they just running at insanely high margins? As a consumer I don't care how big your model is behind the scenes. I care about API costs or inference efficiency when hosting the model myself.

[1] Product that is mostly comparable and in some ways quite ahead.


Where does he imply that it's 2x larger than Sonnet?

I inferred that from the other statement:

> Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve, which I think is quite generous to DeepSeek-V3

He says that it's 2x worse. So if it has the ~same quality [1], it would imply it's 2x larger. Unless I misunderstood what he meant there, of course.

[1]: "DeepSeek produced a model close to the performance of US models" -- in his own words.

Ah, I read that as him saying the quality isn't as good: equivalent to a model with 50% less compute. But you might be right.