	by zozbot234 69 days ago
	Shouldn't FlashAttention address the quadratic increase in memory footprint with respect to fine-tuning/training? I'm also pretty sure that it does not apply to pure inference, due to how KV caching works.
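To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a single layer, batch size 1, and 2-byte (fp16/bf16) elements; the function names and the example sizes are made up for illustration and are not taken from any library. It compares the full score matrix that naive attention materializes (the part FlashAttention avoids writing out to memory) against a KV cache and a single decode step.

```python
def naive_attention_scores_bytes(seq_len, num_heads, bytes_per_elem=2):
    # Naive attention materializes a full (seq_len x seq_len) score matrix per head.
    return num_heads * seq_len * seq_len * bytes_per_elem


def kv_cache_bytes(seq_len, num_heads, head_dim, bytes_per_elem=2):
    # The cache stores one key and one value vector per position per head.
    return 2 * num_heads * seq_len * head_dim * bytes_per_elem


def decode_step_scores_bytes(seq_len, num_heads, bytes_per_elem=2):
    # With a KV cache, each decode step scores one new query against seq_len
    # cached keys: a single row per head, not a full matrix.
    return num_heads * seq_len * bytes_per_elem


if __name__ == "__main__":
    heads, head_dim = 32, 128  # hypothetical model shape
    for n in (1024, 2048, 4096):
        print(
            n,
            naive_attention_scores_bytes(n, heads),
            kv_cache_bytes(n, heads, head_dim),
            decode_step_scores_bytes(n, heads),
        )
```

Doubling the sequence length quadruples the first figure but only doubles the other two, which is the distinction between the training-time score matrix and cached inference that the comment is pointing at.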