	by ashvardanian 745 days ago
	Have you seen anyone productively using TMA on Nvidia or async instructions on AMD? I’m currently looking at a 60% throughput degradation for 2D inputs on H100: https://github.com/ashvardanian/scaling-democracy/blob/a8092...