by artemisart
603 days ago
Important clarification: the async part is absolutely not Python-specific. It comes from CUDA, where it exists for performance, and you have to use CUDA events in C++ too if you want to time it properly. The ONNX runtimes I know of are synchronous: they run whole models at once rather than each operation individually, so there is no need for async and the timings should be correct.
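To illustrate why unsynchronized timing misleads, here is a rough stdlib-only sketch. It is an analogy, not CUDA: a thread pool stands in for the device queue, and `fake_kernel`, `time_without_sync`, and `time_with_sync` are hypothetical names made up for this example. On real CUDA the waiting step would be done with events (`cudaEventRecord` / `cudaEventSynchronize` / `cudaEventElapsedTime`) or, in PyTorch, `torch.cuda.Event` and `torch.cuda.synchronize()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    """Hypothetical stand-in for device work: sleeps instead of computing."""
    time.sleep(duration_s)
    return duration_s


def time_without_sync(executor, duration_s):
    """Stop the clock right after the launch; this measures only submit overhead."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)  # returns immediately
    elapsed = time.perf_counter() - start
    future.result()  # drain the queue so later measurements are not affected
    return elapsed


def time_with_sync(executor, duration_s):
    """Wait for the work to finish before stopping the clock."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)
    future.result()  # analogous to synchronizing on the device
    return time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as executor:
        naive = time_without_sync(executor, 0.2)
        synced = time_with_sync(executor, 0.2)
    print(f"without sync: {naive * 1000:.2f} ms")
    print(f"with sync:    {synced * 1000:.2f} ms")
```

The unsynchronized number reflects only how long it took to enqueue the work, which is the same trap as wrapping an asynchronous kernel launch in a wall-clock timer.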
I'm less concerned about the CPU baseline and more concerned about the NPU timing, especially given the other issues.