by highfrequency 548 days ago

He is basically saying, with his inside knowledge of Anthropic's current capabilities: "they did for $5m what we could probably do for $10m or $15m if we launched the training run today without any new optimizations." So on the one hand that's very impressive, both because the cost effectiveness is significantly higher and because even replicating SoTA outside of OpenAI/Anthropic is very difficult. On the other hand, it's not too surprising that a company that needs to economize on compute will find ways to do so. Neither Anthropic nor OpenAI would consider it worthwhile to have their best researchers prioritize cutting down on training costs or compute requirements; they have near infinite capital and are focused on breakthroughs in making their best models as good as possible. I don't think it's accurate to say that Deepseek "outsmarted US engineers"; Deepseek had a very different objective function than the US labs, so it pushed much harder on the engineering optimizations for better cost performance.

Everyone seems to rag on OpenAI/Anthropic for spending so much money and take it as a symptom of capitalist waste, but this reality seems great to me - massive amounts of money are basically being funneled from VCs toward progress in machine learning. Once expensive breakthroughs are made, it is only a matter of a few years until people make the engineering optimizations to make those breakthroughs cheap.

Just want to emphasize the progress in cost that Dario highlights: fixed AI capabilities are becoming 4x cheaper every year. That is absolutely insane. US GDP growth averaged 3-4% over the last 250 years and look how far that has taken us. Moore's Law averaged ~40% annual growth in transistor density and look how far that has taken us in just 65 years. 4x growth in AI capability per dollar each year is absolutely insane.
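
To put rough numbers on that comparison, here is a minimal sketch that just compounds the three annual rates quoted above. The 3.5% GDP figure is my assumption (the midpoint of 3-4%), the function names are made up for illustration, and it assumes each rate holds constant, which is a simplification.

```python
import math

def cumulative_factor(annual_multiplier: float, years: float) -> float:
    """Total multiplier after compounding a constant annual rate for `years` years."""
    return annual_multiplier ** years

def years_to_reach(annual_multiplier: float, target_factor: float) -> float:
    """Years of constant compounding needed to reach `target_factor`."""
    return math.log(target_factor) / math.log(annual_multiplier)

# Annual multipliers taken from the rates quoted in the comment.
# 1.035 is an assumed midpoint of the 3-4% GDP range.
rates = {
    "US GDP (~3.5%/yr)": 1.035,
    "Moore's Law (~40%/yr)": 1.40,
    "AI capability per dollar (4x/yr)": 4.0,
}

for label, multiplier in rates.items():
    after_5 = cumulative_factor(multiplier, 5)
    to_1000x = years_to_reach(multiplier, 1000)
    print(f"{label}: x{after_5:,.1f} after 5 years, "
          f"{to_1000x:.1f} years to reach 1000x")
```

Under those assumptions, 4x per year gets to about 1000x in roughly five years, versus about two decades at 40% per year and about two centuries at 3.5% per year.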