As someone who has used Nvidia Triton Inference Server for years, I find it really interesting to see people publicly disclosing their use of TensorRT-LLM (almost certainly in conjunction with Triton).
Up until TensorRT-LLM, Triton had been kind of an in-group secret among high-scale inference providers. Now you can readily find announcements, press releases, etc. about Triton (TensorRT-LLM) usage from the likes of Mistral, Phind, Cloudflare, Amazon, and others.