Hacker News
by ashvardanian 698 days ago
Have you seen anyone productively using TMA on Nvidia or async instructions on AMD? I’m currently seeing a 60% throughput degradation for 2D inputs on H100:
https://github.com/ashvardanian/scaling-democracy/blob/a8092...