by joaquincabezas 945 days ago
Thanks a lot for the material, Varun. Neat presentation, with exhaustive computations that make it easy to follow. Question on the serving part: vLLM, DeepSpeed, or TensorRT-LLM? Thanks!

Thanks!

vLLM for quick setup, TRT-LLM for best performance. Both are available on https://baseten.co/.