|
|
|
|
|
by SnowflakeOnIce
659 days ago
|
|
> you can get 100% GPU utilization by just reading/writing to memory while doing 0 computations Indeed! Utilization is a proxy for what you actually want (which is good use of available hardware). 100% GPU utilization doesn't actually indicate this. On the other hand, if you aren't getting 100% GPU utilization, you aren't making good use of the hardware. |
|
It is a correct metric when your block device has a single physical spinning disk that can only accept one request at a time (dispatch queue depth=1). But the moment you deal with SSDs (capable of highly concurrent NAND IO), SAN storage block devices striped over many physical disks or even a single spinning disk that can internally queue and reorder IOs for more efficient seeking, just hitting 100%util at the host block device level doesn't mean that you've hit some IOPS ceiling.
So, looks like the GPU "SM efficiency" analysis is somewhat like logging in to the storage array itself and checking how busy each physical disk (or at least each disk controller) inside that storage array is.