> "Basically nobody writes CUDA," wrote Keller in a follow-up post. "If you do write CUDA, it is probably not fast. […] There is a good reason there is Triton, Tensor RT, Neon, and Mojo."

> Even Nvidia itself has tools that do not exclusively rely on CUDA. For example, Triton Inference Server is an open-source tool by Nvidia that simplifies deploying AI models at scale, supporting frameworks like TensorFlow, PyTorch, and ONNX. Triton also provides features like model versioning, multi-model serving, and concurrent model execution to optimize the utilization of GPU and CPU resources.

> Nvidia's TensorRT is a high-performance deep learning inference optimizer and runtime library that accelerates deep learning inference on Nvidia GPUs. [...]

Keller was speaking of OpenAI's Triton (https://openai.com/research/triton), a Python-like language that is compiled to code for Nvidia GPUs, but Tom's Hardware mixed this up with Nvidia's Triton Inference Server, a higher level tool that's really not a replacement for CUDA and not directly related to the Triton language. Easy to confuse these if you are a writer in a hurry.