
	by kioku 135 days ago
	> Our key insight is to offload critical softmax primitives to idle tensor units, maximizing hardware utilization and throughput.
	>
	> … speedups of 1.05–1.17× across diverse attention configurations on Ampere and Hopper GPUs …