That's certainly very nice, but I'm sure you can appreciate the complexity involved here. I would need to compile this myself and have CUDA properly configured on Linux, or go without CUDA support on Windows. So even your hand-picked example is not that different from the process I just described, which is standard for ML projects today, even for players like Meta or Nvidia.
https://bellard.org/nncp/