by lostdog
2078 days ago
Yes (though the details are private). The major deep learning libraries are Python wrappers around C/C++ (which then calls into CUDA). If you call the C++ layers directly, you control the memory operations applied to your data. The biggest wins come from reducing the number of copies, reducing the number of transfers between CPU and GPU memory, and speeding up operations by moving them from the CPU to the GPU (or vice versa). This is basically what the article does, but if you want to squeeze out all the performance, the Python layer is still an abstraction that gets in the way of directly choosing what happens to the memory.
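To make the "reducing the number of copies" point concrete, here's a tiny CPU-only sketch in plain Python. It is not the private setup mentioned above and it doesn't touch a GPU; the bytearray is just a stand-in for a tensor buffer, and the function names are made up for illustration. The idea is the same, though: slicing the buffer directly copies the data each time, while going through a memoryview hands out windows onto the same memory.

```python
def sum_chunks_copying(buf, chunk):
    """Sum a buffer chunk by chunk; each bytearray slice allocates a new copy."""
    total = 0
    for start in range(0, len(buf), chunk):
        piece = buf[start:start + chunk]  # new bytearray, bytes are copied
        total += sum(piece)
    return total


def sum_chunks_zero_copy(buf, chunk):
    """Same result, but each slice is a view onto the original memory."""
    view = memoryview(buf)
    total = 0
    for start in range(0, len(view), chunk):
        piece = view[start:start + chunk]  # no data copied, just a new view
        total += sum(piece)
    return total


# Hypothetical stand-in for a batch of input data.
data = bytearray(range(256)) * 4

# Both approaches compute the same answer.
assert sum_chunks_copying(data, 100) == sum_chunks_zero_copy(data, 100)

# The difference is aliasing: a view sees later writes, a copied slice does not.
copied = data[0:4]
shared = memoryview(data)[0:4]
data[0] = 7
print(copied[0], shared[0])
```

On a real pipeline the equivalent decisions are things like which buffers get reused and when data crosses between CPU and GPU memory, which is exactly what the Python layer tends to hide from you.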
All because nobody has really provided usable, off-the-shelf deployment libraries. That Bazel stuff if you want to use the C++ API? Big nope. Way too cumbersome. You're trying to move from Python to C++ and they want you to install ... Java? WTF?
Also, some of the best neural net research out there has you run "./run_inference.sh", or some abomination of a Jupyter notebook, instead of giving you an installable, deployable library. To their credit, good neural net engineers aren't expected to be good software engineers, but I'm just pointing out that there's a big gap between good neural nets and deployable neural nets.