| HN Mirror



	by dlewis1788 461 days ago
	Just curious what your issues with Triton were. We've done OK using it to serve LLMs w/ a classifier head via the HF Transformers pipeline & Flash Attention 2, as well as serving text generation models with the vLLM back-end.

1 comments

Triton is not that bad; TensorRT will give you nightmares.

100% - probably why vLLM is now the default back-end in Dynamo.