| HN Mirror



	by aoanevdus 547 days ago
	I don’t get it. The labs have regularly made improvements that dramatically lower the cost of training an equal-performing model. When they do this, they also train a larger model with even higher performance. This time, DeepSeek did the first part but didn’t do the second. Now every lab in the world will throw its compute into the effort to replicate and beat DeepSeek’s model at larger scale. It’s not like everyone is just going to say “well I guess AI is smart enough now, no point improving it anymore!” and stop building bigger training clusters. If anything, r1 makes even more GPU demand likely, since it mitigated or at least delayed the risk that AI hits a dead end (in which case, ceasing development might actually make sense).

1 comment

gleenn 546 days ago

Define "dramatically" with numbers. From all the sources I've read, the improvement was significant, the training ran on a far more limited cluster, and the results are as good as the other frontier models. Optimizations have been coming all along, but I think the one or more they found were significantly larger than usual.
