HN Mirror

	by mrinterweb 78 days ago
	If TurboQuant can reliably cut LLM inference RAM requirements by 6x, that should cause a dramatic shift in the hardware market, or at least we can all hope. As far as I know, though, the 6x figure is the key-value cache saving, so I'm not sure it really translates to a 6x decrease in total RAM requirements for inference.
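
	A rough back-of-the-envelope sketch of that doubt: if only the key-value cache shrinks by 6x while the model weights stay the same size, the overall saving depends on how much of the total RAM the cache occupied to begin with. The function name and all sizes below are hypothetical, chosen only for illustration; they are not measurements of TurboQuant or of any particular model.

```python
def total_ram_reduction(weights_gb: float, kv_cache_gb: float, kv_factor: float = 6.0) -> float:
    """Overall RAM reduction factor when only the KV cache shrinks by kv_factor.

    Assumes total inference RAM is just weights + KV cache (activations and
    runtime overhead are ignored to keep the arithmetic simple).
    """
    before = weights_gb + kv_cache_gb
    after = weights_gb + kv_cache_gb / kv_factor
    return before / after


# Hypothetical sizes: 14 GB of weights with a small vs. a large KV cache.
short_ctx = total_ram_reduction(weights_gb=14.0, kv_cache_gb=2.0)
long_ctx = total_ram_reduction(weights_gb=14.0, kv_cache_gb=64.0)

print(f"short context: {short_ctx:.2f}x overall")
print(f"long context:  {long_ctx:.2f}x overall")
```

	Under these assumptions the overall reduction stays well below 6x in both cases, and only approaches 6x when the cache dominates total memory (very long contexts or many concurrent requests).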