by dlewis1788 461 days ago
Just curious what your issues with Triton were. We've done OK using it to serve LLMs w/ a classifier head via the HF Transformers pipeline & Flash Attention 2, as well as text generation models with the vLLM back-end.

Triton is not that bad; TensorRT will give you nightmares.
100% - probably why vLLM is now the default back-end in Dynamo.