|
|
|
|
|
by kkielhofner
847 days ago
|
|
As someone who has utilized Nvidia Triton Inference Server for years it's really interesting to see people publicly disclosing use of TensorRT-LLM (almost certainly in conjunction with Triton). Up until TensorRT-LLM Triton had been kind of an in-group secret amongst high scale inference providers. Now you can readily find announcements, press releases, etc of Triton (TensorRT-LLM) usage from the likes of Mistral, Phind, Cloudflare, Amazon, etc. |
|
I still see post of people running ollama on H100s or whatever, and that's just because its so easy to set up.