Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition

	Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition (jeffreywong20.github.io)
	1 point by thw20 37 days ago