by dlewis1788
461 days ago
Just curious what your issues with Triton were. We've done OK using it to serve LLMs with a classifier head via the HF Transformers pipeline and Flash Attention 2, as well as serving text generation models with the vLLM back-end.