| > All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese. Says the CEO whose product [1] costs 15-50x more. (That holds not just against DeepSeek's own API, but also against third-party providers hosting the same model.) > DeepSeek does not "do for $6M what cost US AI companies billions". I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train OK, that's still at least a 3-10x cost reduction (assuming a lower bound of $20M for "a few $10M's"). And that's for a model that he later implies is 2x larger than Sonnet. So that's a 6-10x efficiency improvement. Nice! > Since DeepSeek-V3 is worse than those US frontier models, let’s say by ~2x on the scaling curve. What curve? Does he mean the simplistic performance vs. parameter count curve? That does not take into account that DeepSeek-V3 is a MoE (you can't naively compare MoE and dense parameter counts), nor the other architecture changes (KV compression, etc.). Also, if Claude 3.5 Sonnet is 2x smaller, then why is its inference 15-50x more expensive than DeepSeek-V3's? Does Anthropic not have good GPU engineers? Are they just running at insanely high margins? As a consumer, I don't care how big your model is behind the scenes. I care about API costs, or inference efficiency when hosting the model myself. [1] A product that is mostly comparable and in some ways quite ahead. |