by rcarmo
506 days ago
I like a lot of what Dario writes, but in this case I just can't follow the reasoning. Everything I've picked up about how DeepSeek did what they did points to some very smart Chinese engineers having outsmarted US ones. That includes going a level lower than CUDA to get more out of limited hardware[1], and the balance of techniques they used[2]. (I'm putting it in terms that matter to US folk; I'm European and ordinarily wouldn't care.)

1 - https://stratechery.com/2025/deepseek-faq/
2 - https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
This post comes across as overly defensive of the US export controls. It also leans on the authoritarian-regime angle far too much for it to feel like anything other than a way to shore up interest in US-based AI companies and widen their moat (or to make sure someone else shores it up politically while they catch up technically).

Anyway, this will always be a deep-pocket race. But I wish it weren't so much about brute-forcing GPUs and wasting power for (as yet) uncertain gains in model capability. To me, what DeepSeek achieved was to show that ingenuity and better techniques are something both OpenAI and Anthropic ought to be pursuing instead of burning cash.