| HN Mirror


	by somnial 52 days ago
	True, but there's no reason the predictor model couldn't use a linear-attention-style architecture (e.g. Mamba, GDN, etc.) to predict KV caches.
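
	For context, here is a minimal sketch of plain kernelized linear attention in causal, recurrent form. It assumes the common `elu(x) + 1` feature map. The names `phi` and `linear_attention` are illustrative and don't come from any particular library. It is not Mamba or GDN themselves, which add gating and other update rules on top of this idea, and it says nothing about how the predictor in question is actually built. The point is only that the running state `S`, `z` has a fixed size, so per-token cost does not grow with context length.

```python
import math


def phi(x):
    # Feature map elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise,
    # so every feature stays strictly positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(qs, ks, vs):
    """Causal linear attention over a sequence, one token at a time.

    qs, ks: lists of length-d_k vectors; vs: list of length-d_v vectors.
    Returns one length-d_v output vector per token.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        # Fold the current token into the fixed-size state.
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        # Read out: phi(q)^T S, normalized by phi(q)^T z.
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs
```

	Each output is a weighted average of the values seen so far, with weights `phi(q) . phi(k)`, which is the same thing the quadratic form computes but without storing every past key and value.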