by knlb
236 days ago
Thanks for the post, this is pretty cool! I feel like I've seen CUPTI have fairly high overhead depending on the CUDA version, but I'm not very confident. Did you happen to benchmark different workloads with CUPTI on and off?

If you're taking feature requests: a way to subscribe to CUDA context creation, and get tracebacks for it, would be very useful. I've definitely been surprised by finding processes on the wrong GPU, and being able to easily figure out where they came from would be great. I did a hack using LD_PRELOAD to subscribe to and publish the event, but never really followed through on getting the Python stack trace.
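The missing Python half of that hack could look roughly like the sketch below. It assumes some native hook (for example the LD_PRELOAD shim) is able to call back into the interpreter when a context is created; that bridge is not shown. The names `on_context_created`, `context_origins`, and `unexpected_contexts` are hypothetical and do not belong to CUPTI or any CUDA library.

```python
import traceback

# Hypothetical registry: device ordinal -> list of formatted Python stacks.
context_origins = {}

def on_context_created(device):
    """Hypothetical callback that a native hook would invoke on context creation."""
    # format_stack() ends with this function's own frame; drop it so the
    # recorded trace ends at the code that triggered the context creation.
    frames = traceback.format_stack()[:-1]
    context_origins.setdefault(device, []).append("".join(frames))

def unexpected_contexts(expected_device):
    """Return the recorded stacks for every device other than the expected one."""
    return {
        device: traces
        for device, traces in context_origins.items()
        if device != expected_device
    }
```

Printing the result of `unexpected_contexts(0)` in a process that should only touch GPU 0 would then show which Python call path created a context on another device.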