|
The short answer is no, but the long answer is that this is a very complex tradeoff space. Going forward, we may see more of these types of tasks moving to GPU, but for the moment it is generally not a good choice. The GPU is incredible at raw throughput, and this particular problem can actually implemented fairly straightforwardly (it's a stream compaction, which in turn can be expressed in terms of prefix sum). However, where the GPU absolutely falls down is when you want to interleave CPU and GPU computations. To give round numbers, the roundtrip latency is on the order of 100µs, and even aside from that, the memcpy back and forth between host and device memory might actually be slower than just solving the problem on the CPU. So you only win when the strings are very large, again using round numbers about a megabyte. Things change if you are able to pipeline a lot of useful computation on the GPU. This is an area of active research (including my own). Aaron Hsu has been doing groundbreaking work implementing an entire compiler on the GPU, and there's more recent work[1], implemented in Futhark, that suggests that that this approach is promising. I have a paper in the pipeline that includes an extraordinarily high performance (~12G elements/s) GPU implementation of the parentheses matching problem, which is the heart of parsing. If anyone would like to review a draft and provide comments, please add a comment to the GitHub issue[2] I'm using to track this. It's due very soon and I'm on a tight timeline to get all the measurements done, so actionable suggestions on how to improve the text would be most welcome. [1]: https://theses.liacs.nl/pdf/2020-2021-VoetterRobin.pdf [2]: https://github.com/raphlinus/raphlinus.github.io/issues/66#i... |
I can't help but notice that, at least in my experience on Windows, this is the same order of magnitude as for inter-process communication on the local machine. Tangent: That latency was my nemesis as a Windows screen reader developer; the platform accessibility APIs weren't well designed to take it into account. Windows 11 finally has a good solution for this problem (yes, I helped implement that while I was at Microsoft).