by snake_doc 547 days ago

Without taking a position on unipolar vs. multi-polar:

Dario makes an astounding implicit assumption:

- Labs originating in China cannot, without the US, acquire chips providing 80-90% of the utility within the next 2-3 years.

I'll make an observation about the incentives that drove DeepSeek to create the innovations in the V2 and V3 papers.

DeepSeek, compared to American AI labs, is much more compute constrained, but in a unique way. Its chips are more memory-bandwidth constrained (depending on the chip type, anywhere from 50% to 80% less bandwidth).

Therefore, each dollar/hour of investment towards memory optimization is worth MORE to DeepSeek than to American labs.
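To make that argument concrete, here is a minimal roofline-style sketch. It assumes a step is memory-bound and takes as long as the slower of memory traffic and arithmetic. Every number in it (bytes moved, FLOPs, bandwidths, peak throughput) is hypothetical and chosen only for illustration, not taken from any real chip.

```python
# Toy roofline model: step time is the slower of memory traffic and arithmetic.
# All constants below are hypothetical, for illustration only.

def step_time_s(bytes_moved, flops, bandwidth_bytes_per_s, peak_flops_per_s):
    memory_time = bytes_moved / bandwidth_bytes_per_s
    compute_time = flops / peak_flops_per_s
    return max(memory_time, compute_time)

def time_saved_s(bytes_moved, fraction_saved, flops, bandwidth_bytes_per_s, peak_flops_per_s):
    """Seconds saved per step when an optimization removes `fraction_saved` of the bytes."""
    before = step_time_s(bytes_moved, flops, bandwidth_bytes_per_s, peak_flops_per_s)
    after = step_time_s(bytes_moved * (1 - fraction_saved), flops,
                        bandwidth_bytes_per_s, peak_flops_per_s)
    return before - after

BYTES_PER_STEP = 100e9   # hypothetical memory traffic per step
FLOPS_PER_STEP = 1e12    # hypothetical arithmetic per step (small, so memory dominates)
PEAK_FLOPS = 500e12      # same compute peak assumed for both chips
FULL_BANDWIDTH = 3.0e12  # hypothetical unrestricted chip, bytes/s
HALF_BANDWIDTH = 1.5e12  # hypothetical chip with 50% less bandwidth

# Same optimization (halve the bytes moved) applied on both chips.
saved_full = time_saved_s(BYTES_PER_STEP, 0.5, FLOPS_PER_STEP, FULL_BANDWIDTH, PEAK_FLOPS)
saved_half = time_saved_s(BYTES_PER_STEP, 0.5, FLOPS_PER_STEP, HALF_BANDWIDTH, PEAK_FLOPS)

print(f"time saved on full-bandwidth chip: {saved_full * 1e3:.2f} ms")
print(f"time saved on half-bandwidth chip: {saved_half * 1e3:.2f} ms")
```

Under these assumptions the same engineering effort buys twice as much wall-clock time on the bandwidth-restricted chip, which is the sense in which memory optimization is worth more to the constrained lab.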

In the V2 and V3 papers, they've demonstrated exactly that with these memory optimization techniques:

1. MLA (Multi-head Latent Attention) -> reduces the KV cache by nearly 80% compared to GQA. By the way, this was published in the V2 paper in May 2024.

2. FP8 matmul (while still accumulating in FP32) without losing significant quality.

3. DualPipe scheduling and reworking how Hopper SMs are allocated between communication and computation -> DeepSeek's V3 paper has 2 full pages of hardware suggestions for "hardware designers" (read: NVIDIA)
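As a rough illustration of the KV cache point in item 1, the per-token, per-layer cache size can be counted for each attention variant. This is a back-of-the-envelope sketch, not DeepSeek's code. The dimensions (128 heads, head size 128, latent size 512, RoPE key size 64) are what I recall from the V2 paper and should be treated as assumptions, and the GQA group count of 8 is a hypothetical baseline. The exact percentage saved depends heavily on which GQA configuration you compare against.

```python
# Count cached elements per token per layer for three attention variants.
# Dimensions are assumptions for illustration; the GQA group count is hypothetical.

def mha_cache_elems(n_heads, d_head):
    # Standard multi-head attention caches one key and one value per head.
    return 2 * n_heads * d_head

def gqa_cache_elems(n_groups, d_head):
    # Grouped-query attention caches one key and one value per KV group.
    return 2 * n_groups * d_head

def mla_cache_elems(d_latent, d_rope):
    # MLA caches one compressed latent vector plus a small decoupled RoPE key.
    return d_latent + d_rope

N_HEADS = 128     # assumed
D_HEAD = 128      # assumed
N_GROUPS = 8      # hypothetical GQA baseline
D_LATENT = 512    # assumed (4 * D_HEAD)
D_ROPE = 64       # assumed (D_HEAD / 2)

mha = mha_cache_elems(N_HEADS, D_HEAD)
gqa = gqa_cache_elems(N_GROUPS, D_HEAD)
mla = mla_cache_elems(D_LATENT, D_ROPE)

reduction_vs_gqa = 1 - mla / gqa
reduction_vs_mha = 1 - mla / mha

print(f"MHA: {mha} elems, GQA: {gqa} elems, MLA: {mla} elems")
print(f"MLA reduction vs GQA ({N_GROUPS} groups): {reduction_vs_gqa:.1%}")
print(f"MLA reduction vs MHA: {reduction_vs_mha:.1%}")
```

With these assumed numbers, MLA stores 576 elements per token per layer against 2048 for the 8-group GQA baseline, roughly a 72% reduction. A baseline with more KV groups would push the figure higher.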

Export controls in a global market create different incentives for different parties. As those incentives change, agents (using the term in its traditional economics sense) will change their capital allocation strategy.