by SimplyUnknown 2059 days ago
One of the advantages of using the CPU rather than GPU for inference (especially with batch size 1) is that it doesn't need data transfer from host to device, which is a notoriously slow, asynchronous process. This could also explain the difference in total run time, if measured correctly.
1 comment

Especially since WebGL doesn't have mapped buffers[0]. There's no way to do asynchronous texture (aka data) uploads. At best, you can read back asynchronously but even that's not guaranteed by the spec[1]. Async data transfer gives much higher throughput for sending data and retrieving results.
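
For anyone wanting to see what the "read back asynchronously" half looks like, here is a rough sketch of the usual WebGL 2 pattern: issue `readPixels` into a `PIXEL_PACK_BUFFER`, create a fence with `gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0)`, then poll the fence once per frame and only call `getBufferSubData` after it has signaled. `tryReadback`, `FenceGL` and `ReadbackState` are names made up for this example; the `gl` methods and constants are the standard WebGL 2 ones. As noted above, the spec still does not promise that `getBufferSubData` is non-blocking, so treat this as best effort.

```typescript
// Minimal structural subset of WebGL2RenderingContext used below, declared
// locally (an assumption of this sketch) so it type-checks without DOM typings.
interface FenceGL {
  readonly ALREADY_SIGNALED: number;
  readonly CONDITION_SATISFIED: number;
  readonly WAIT_FAILED: number;
  readonly PIXEL_PACK_BUFFER: number;
  clientWaitSync(sync: unknown, flags: number, timeout: number): number;
  deleteSync(sync: unknown): void;
  bindBuffer(target: number, buffer: unknown): void;
  getBufferSubData(target: number, srcByteOffset: number, dst: ArrayBufferView): void;
}

type ReadbackState = "pending" | "done" | "failed";

// Hypothetical helper: call once per frame after the readPixels call into
// `packBuffer` has been issued and `sync` has been created with gl.fenceSync.
function tryReadback(
  gl: FenceGL,
  sync: unknown,
  packBuffer: unknown,
  dst: ArrayBufferView,
): ReadbackState {
  // A timeout of 0 turns clientWaitSync into a non-blocking poll.
  const status = gl.clientWaitSync(sync, 0, 0);
  if (status === gl.WAIT_FAILED) {
    gl.deleteSync(sync);
    return "failed";
  }
  if (status !== gl.ALREADY_SIGNALED && status !== gl.CONDITION_SATISFIED) {
    return "pending"; // GPU work not finished yet; poll again next frame
  }
  // Fence has signaled: the copy into the pack buffer is complete, so the
  // read below has nothing left to wait for on a cooperative implementation.
  gl.deleteSync(sync);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, packBuffer);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dst);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  return "done";
}
```

Nothing comparable exists for the upload direction, which is the point: you can hide readback latency behind a fence, but texture uploads stay synchronous.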

This is especially painful on mobile, where GPU and CPU memory are the same physical RAM and the "map buffer" operation corresponds to an actual instruction to the memory controller rather than synchronizing memory across PCIe lanes.

[0]: https://www.khronos.org/registry/webgl/specs/latest/2.0/#5.1...

[1]: https://www.khronos.org/registry/webgl/specs/latest/2.0/#3.7... - Note the "non-normative" block describing the potential to bypass the specified blocking behavior for getBufferSubData.