by floatngupstream 1189 days ago
There is enormous investment beyond the training side, too. Once you have your model, you still need to run it. This is where Triton, TensorRT, and handcrafted CUDA kernels (as plugins) come in. There is no equivalent on ROCm for this (MIGraphX is not close).

Models are retrained periodically (every few months, weeks, or even days), and new architectures and implementations appear all the time. If a better algorithm appears, practitioners will adopt a new platform (e.g. Transformers for NLP models), so many systems can already plug in new tools. GPUs are very expensive, so there is also a strong incentive to make this small effort.
Yes, but that only makes a frictionless inference runtime even more important, and nothing comparable exists for AMD.