by lostmsu 511 days ago

This was a simplification. Of course you also need some extra VRAM I/O, and how much depends on your KV cache size.

But assuming your KV cache size is << model size, that simplification is pretty accurate.
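To put rough numbers on that, here is a minimal sketch. It assumes the simplification in question is the usual bandwidth-bound estimate, where each decode step reads all the weights once and the whole KV cache once. The function names and the figures (14 GB of weights, 1 GB of KV cache, 1000 GB/s of bandwidth) are made up for illustration and are not from the linked post.

```python
def decode_tokens_per_second(bandwidth_gb_s, model_gb, kv_cache_gb=0.0):
    """Bandwidth-bound upper limit on decode speed.

    Assumption: each step reads every weight and the full KV cache
    from VRAM exactly once, and nothing else limits throughput.
    """
    return bandwidth_gb_s / (model_gb + kv_cache_gb)


def simplification_error(model_gb, kv_cache_gb):
    """Fraction of the weights-only estimate that is lost once
    KV cache reads are counted: K / (M + K)."""
    return kv_cache_gb / (model_gb + kv_cache_gb)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    bandwidth, model, kv = 1000.0, 14.0, 1.0
    print("weights only:", decode_tokens_per_second(bandwidth, model))
    print("with KV cache:", decode_tokens_per_second(bandwidth, model, kv))
    print("relative error:", simplification_error(model, kv))
```

The error term is K / (M + K), so it goes to zero as the KV cache shrinks relative to the model, which is the sense in which the simplification holds.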

You can just scroll to the first chart they have that explains the idea.