Hacker News


	by throwdbaaway 77 days ago
	Very nice token generation (TG) improvement from the Flash Attention KQ fusion. Has this already been done in ik_llama.cpp? If not, it would be a welcome addition for hybrid CPU/GPU inference.