by kkielhofner 847 days ago
As someone who has used Nvidia Triton Inference Server for years, it's really interesting to see people publicly disclosing use of TensorRT-LLM (almost certainly in conjunction with Triton).

Up until TensorRT-LLM, Triton had been kind of an in-group secret among high-scale inference providers. Now you can readily find announcements, press releases, etc. of Triton (TensorRT-LLM) usage from the likes of Mistral, Phind, Cloudflare, Amazon, etc.

1 comments

Being accessible is huge.

I still see posts from people running ollama on H100s or whatever, and that's just because it's so easy to set up.