Hacker News
by ashvardanian 698 days ago
Have you seen anyone productively using TMA (Tensor Memory Accelerator) on NVIDIA or the async instructions on AMD? I’m currently seeing a 60% throughput degradation for 2D inputs on an H100: https://github.com/ashvardanian/scaling-democracy/blob/a8092...