|
> I dont understand this - arent almost all ML NN models built in pytorch, and arent these compiled / jit'd into a lower level format - and can we not have various backends/drivers for that, such as CUDA / ROCM / vnni ? PyTorch already does. But if you're saying "NN" and "pytorch" that already means you're outside of the audience for CUDA I'm talking about in the article. My own stuff was usually Bayesian Hierarchical Models, which at least at the time made pytorch completely useless (that was nearly a decade ago though—maybe that specific use case improved). If you've tried to write actually new (or different enough) NNs or entirely different models, pytorch is too high-level, and sometimes even TF is too. Even aside from that, if you're a maintainer of BLAS or some specific library for sparse MM with very specific distributions that are optimized for it... Anyway, those are the key cases, but even aside from that, if you've ever tried even with some higher-level libraries to do non-vanilla stuff, nothing works as well as it should. You get random, inscrutable errors that certainly do exist on NVIDIA GPUs/stuff-based-on-CUDA-under-the-hood, but way way fewer. For newer, custom stuff, getting things like numerical overflows or other completely breaking problems on alternative backends, but don't happen / work just fine on CPU or CUDA backend is not really that uncommon. Or the CUDA backend is just ridiculously faster. If you're doing something annoying, new, and complicated enough, there's no point in taking the aggravation. The people who write the stuff that is used in PyTorch or other libraries definitely write CUDA code (in C++ etc). And then the people who use PyTorch just build on top of that. I deliberately tried to keep it accessible and have non-technical (or just non-software) audiences also be able to get an intuition for why CUDA has such strong lock-in. Otherwise, the pushback I've often gotten "just re-write it" or "it's just software" which if it were so simple, people wouldn't need to be yelling so much at AMD across so many comments. Basically, people who can't fathom why software technical debt can ever be a thing. Or, if it is, China has infinite money and time anyway. A high-level analysis should say that Huawei, AMD, and Intel all should easily invest enough to make this all work and compete with CUDA to push their hardware platforms. The reality is decentralized decision-making from users also makes it more of an expensive, uncertain bet that people will adopt. A bunch of the lower-level, underlying libraries that things are built on AND the researchers who do bleeding-edge research still have a huge amount of experience in and stuff built on CUDA. |