| CUDA is an important part of the story. I think the industry is moving to 'MLIR' solution (Yes, there is a Google project called exactly that, but I am referring to the general idea here), where the network is defined and trained in one place, then the weights are exported, delegated to optimized runtime to be executed. If such trend furthers down, then there will be very little reason to replace Python as the glue layer here. Instead it will become that everything is training in Python -> exported to shared format -> executed in optimized runtime, kind of flow. Rust's opportunity could be to replace C++ in this case. But do mind that this is also a competitive business, where the computation is pushed further down to the hardware implementations, like TPUv1 and T4 chips etc. |
Also there are a few weird things with the Python workflow, like dealing with Python2 vs Python3, and generally having to run everything in the context of a virtual environment, and the fact that it doesn't seem to really have a single "right way" to deal with dependency management. It's a bit like the story with imports in Javascript: those problems were never really solved, and the popular solutions are just sort of bolted on to the language.
It's amazing how many great tools there are available in Python, but it does sometimes seem like it's an under-powered tool which has been hacked to serve bigger problems than it was intended for.