
Hacker News

by FL33TW00D 483 days ago

It seems to me that MLA (Multi-head Latent Attention) will become the standard from here on out.

If DeepSeek R1 had used standard multi-head attention (MHA), it would need 1749KB of KV cache per token. Once a conversation reached ~46,000 tokens, the KV cache alone would exceed the 80 GB of memory on a single H100.
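The MHA figure can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: 61 transformer layers and a hidden size of 7168 (values from the published DeepSeek-V3/R1 config), keys and values each as wide as the hidden size, BF16 storage (2 bytes per element), and 80 GB counted as 80 × 10^9 bytes. Those assumptions reproduce the 1749KB and ~46,000-token numbers exactly, but they are a guess at how the figure was derived, and the helper names are hypothetical.

```python
# Per-token KV-cache size for a hypothetical standard-MHA version of DeepSeek R1.
# ASSUMPTIONS: 61 layers, hidden size 7168, K and V each of width = hidden size,
# BF16 storage. Model weights and activations are ignored entirely.
NUM_LAYERS = 61
HIDDEN_SIZE = 7168
BYTES_PER_ELEM = 2          # BF16
H100_BYTES = 80 * 10**9     # 80 GB of HBM, decimal gigabytes


def mha_kv_bytes_per_token(layers: int, hidden: int, bytes_per_elem: int) -> int:
    # Each layer caches one key vector and one value vector of width `hidden`.
    return layers * 2 * hidden * bytes_per_elem


def max_tokens(capacity_bytes: int, per_token_bytes: int) -> int:
    # Number of tokens whose KV cache fits in `capacity_bytes`.
    return capacity_bytes // per_token_bytes


per_token = mha_kv_bytes_per_token(NUM_LAYERS, HIDDEN_SIZE, BYTES_PER_ELEM)
print(per_token)                          # 1748992 bytes, i.e. ~1749 KB
print(max_tokens(H100_BYTES, per_token))  # 45740
```

Note that a single H100 could not hold R1's weights in the first place, so the token count is only a way to put the cache size in perspective.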

With MLA, each token consumes only 125KB, so you can reach ~640,000 tokens (2x Ulysses) before running out of memory.
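For comparison, here is the same arithmetic for MLA. It is a sketch that assumes the cache holds only the compressed joint KV latent (width 512, `kv_lora_rank` in the published DeepSeek-V3/R1 config) plus the decoupled RoPE key (width 64, `qk_rope_head_dim`), both shared across heads, over 61 layers in BF16. Under those assumptions it gives about 70KB per token, smaller than the 125KB quoted; 125KB corresponds to roughly 1024 cached elements per layer, so the exact number depends on what is counted. The helper names are hypothetical.

```python
# Per-token KV-cache size under MLA.
# ASSUMPTIONS: only the compressed KV latent and the decoupled RoPE key are
# cached per layer (shared across heads), 61 layers, BF16 storage.
NUM_LAYERS = 61
KV_LORA_RANK = 512          # width of the compressed joint KV latent
QK_ROPE_HEAD_DIM = 64       # width of the decoupled RoPE key
BYTES_PER_ELEM = 2          # BF16
H100_BYTES = 80 * 10**9     # 80 GB of HBM, decimal gigabytes


def mla_kv_bytes_per_token(layers: int, latent_dim: int, rope_dim: int,
                           bytes_per_elem: int) -> int:
    # Each layer caches one latent vector plus one RoPE key, not per-head K/V.
    return layers * (latent_dim + rope_dim) * bytes_per_elem


def max_tokens(capacity_bytes: int, per_token_bytes: int) -> int:
    # Number of tokens whose KV cache fits, ignoring weights and activations.
    return capacity_bytes // per_token_bytes


mla_per_token = mla_kv_bytes_per_token(
    NUM_LAYERS, KV_LORA_RANK, QK_ROPE_HEAD_DIM, BYTES_PER_ELEM)
print(mla_per_token)                          # 70272 bytes, i.e. ~70 KB
print(max_tokens(H100_BYTES, mla_per_token))  # 1138433
print(max_tokens(H100_BYTES, 125_000))        # 640000 (the 125KB figure)
```

Either way, the cache shrinks by an order of magnitude or more relative to the MHA estimate, which is the point of the comment.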