Hacker News
by DamonsJ 659 days ago
"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 SM, on an H100, this would register 100% utilization, but the SM efficiency would be 1 / 132 = 0.7%."

Does this situation really register 100% utilization? BTW, SM occupancy is also a metric you need to care about if you are concerned about kernel efficiency.

2 comments

Yup, you'll see 100% utilization over a time period as long as a kernel is considered active during it, which includes having just a single thread executing [1]. SM occupancy is great but can be a little difficult to interpret, since you're not simply trying to maximize it, unlike SM efficiency.

[1]: https://pytorch.org/blog/pytorch-profiler-1.9-released/#gpu-...
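
To make the difference concrete, here is a rough toy model of the two metrics (a sketch, not how any real profiler or driver computes them). It assumes we have one sample per second of how many SMs were busy; the function names and the sampling scheme are made up for illustration. The 132 SM count is the H100 figure from the quote above.

```python
from typing import Sequence

H100_SM_COUNT = 132  # SM count quoted in the thread for an H100


def gpu_utilization(active_sms_per_sample: Sequence[int]) -> float:
    """Fraction of samples in which at least one SM was busy.

    Hypothetical model: a sample counts as "active" if any kernel
    was running, no matter how few SMs it used.
    """
    if not active_sms_per_sample:
        return 0.0
    busy = sum(1 for n in active_sms_per_sample if n > 0)
    return busy / len(active_sms_per_sample)


def sm_efficiency(active_sms_per_sample: Sequence[int],
                  total_sms: int = H100_SM_COUNT) -> float:
    """Average fraction of SMs that were busy across all samples."""
    if not active_sms_per_sample:
        return 0.0
    return sum(active_sms_per_sample) / (len(active_sms_per_sample) * total_sms)


# A kernel that runs for 10 seconds on a single SM, sampled once per second.
samples = [1] * 10
print(f"utilization:   {gpu_utilization(samples):.1%}")  # 100.0%
print(f"SM efficiency: {sm_efficiency(samples):.2%}")    # 0.76%
```

Under these assumptions, 1 / 132 comes out to about 0.76%, which the quoted figure gives as 0.7%.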

That's why I look mostly at the H100 temperatures. Gives a better utilization metric