HN Mirror



	by zozbot234 81 days ago
	The basic argument is that DeepSeek V4's KV cache is roughly an order of magnitude more compact than that of previous Chinese models, which were already very compact compared to the likes of Gemma 4 (though that example is a bit extreme). Pair this with the basics of how to maximize LLM inference performance at scale (recently covered in a lecture-style episode of the Dwarkesh Patel podcast on YouTube), and the case for slow, batched, on-prem inference with DeepSeek V4, perhaps even with memory offload, becomes quite obvious as I see it. Of course, I'd like to be proven wrong!
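
To make the batching argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a standard attention layout where each token stores one key and one value vector per KV head per layer. Every number in it (layer count, head count, head dimension, memory budget, context length) is a hypothetical placeholder, not an actual DeepSeek V4 or Gemma figure, and the "10x more compact" variant is simply the baseline divided by ten to mirror the "order of magnitude" claim.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers.

    The factor of 2 accounts for storing both a key and a value vector.
    bytes_per_elem=2 assumes 16-bit storage (an assumption, not a given).
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_batch_size(memory_budget_bytes, context_len, bytes_per_token):
    """Number of sequences of length context_len that fit in the KV budget."""
    return memory_budget_bytes // (context_len * bytes_per_token)


# Hypothetical baseline model configuration (illustrative only).
baseline = kv_cache_bytes_per_token(num_layers=60, num_kv_heads=8, head_dim=128)

# Hypothetical model whose cache is an order of magnitude smaller per token.
compact = baseline // 10

budget = 64 * 1024**3   # 64 GiB reserved for KV cache (assumed)
context_len = 32768     # tokens per sequence (assumed)

baseline_batch = max_batch_size(budget, context_len, baseline)
compact_batch = max_batch_size(budget, context_len, compact)

print(f"baseline: {baseline} bytes/token, batch of {baseline_batch}")
print(f"compact:  {compact} bytes/token, batch of {compact_batch}")
```

Under these made-up numbers the same memory budget holds about ten times as many concurrent sequences, which is the lever that makes slow but heavily batched inference attractive.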

1 comment

gghh 81 days ago

Right, Dwarkesh's episode with Reiner Pope. Didn't watch the full video, but as soon as I saw them both going to an old-school blackboard with actual chalk in hand I could tell they meant business hehe :) Thanks for recommending the vid and for the info about DS V4.
