Hacker News
by kioku 135 days ago
> Our key insight is to offload critical softmax primitives to idle tensor units, maximizing hardware utilization and throughput.
> … speedups of 1.05–1.17× across diverse attention configurations on Ampere and Hopper GPUs …