| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by artemisart 603 days ago
	Important precision: the async part is absolutely not python specific, but comes from CUDA, indeed for performance, and you will have to use cuda events too in C++ to properly time it. For ONNX the runtimes I know of are synchronous as we don't do each operation individually but whole models at once, there is no need for async, the timings should be correct.

1 comments

godelski 603 days ago

Yes, it isn't python, it is... hardware. Not even CUDA specific. It is about memory moving around and optimization (remember, even the CPUs do speculative execution). I say a little more in the larger comment.

I'm less concerned about the CPU baseline and more concerned about the NPU timing. Especially given the other issues

link