by cygn 1069 days ago
You can just use Triton, which is basically TF Serving for TensorFlow, PyTorch, ONNX and more.
1 comments

Can you explain that?

My understanding of Triton is that it's more of an alternative to CUDA: you write kernels directly in Python, at a slightly higher level, and it does a lot of optimizations automatically. So basically: Python -> Triton-IR -> LLVM-IR -> PTX.

https://openai.com/research/triton
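
For reference, here is a minimal sketch of what "writing it directly in Python" looks like in OpenAI Triton, following the vector-add pattern from the Triton tutorials. The names `add_kernel`, `add`, `cdiv` and `HAVE_TRITON` are illustrative and not from this thread. Actually launching the kernel assumes `torch` and `triton` are installed and a CUDA GPU is available; the import guard only keeps the file loadable on machines without them.

```python
# Sketch only: an elementwise add kernel in OpenAI Triton.
# Assumption: running add() needs torch, triton and a CUDA GPU.
try:
    import torch
    import triton
    import triton.language as tl
    HAVE_TRITON = True
except ImportError:
    HAVE_TRITON = False


def cdiv(n, block):
    """Ceiling division: how many blocks of size `block` cover n elements."""
    return (n + block - 1) // block


if HAVE_TRITON:

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        # Mask out-of-range lanes in the last block.
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x, y, block_size=1024):
        # x and y are expected to be CUDA tensors of the same shape.
        out = torch.empty_like(x)
        n = out.numel()
        grid = (cdiv(n, block_size),)  # 1D launch grid
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=block_size)
        return out
```

On a machine with a GPU, something like `add(torch.ones(1000, device="cuda"), torch.ones(1000, device="cuda"))` would trigger the JIT compilation on first call. Note there is no explicit thread indexing or shared-memory management here, which is the "slightly higher level" part.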

It's confusing: there's OpenAI Triton (what you're thinking of) and Nvidia Triton server (a different thing).
The original comment is referring to Nvidia Triton Inference Server.