by philipkiely
321 days ago
TRT-LLM has its challenges from a DX perspective, and yeah, for multimodal we still use vLLM pretty often. But for the kind of traffic we are trying to serve (high volume and latency sensitive), it consistently wins head-to-head in our benchmarking, and we have invested a ton of dev work in the tooling around it.