Hacker News
by albertzeyer 1069 days ago
Can you explain that?

My understanding of Triton is more that it is an alternative to CUDA, but you write it directly in Python, at a slightly higher level, and it does a lot of optimizations automatically. So basically: Python -> Triton-IR -> LLVM-IR -> PTX.

https://openai.com/research/triton

1 comment

It's confusing: there's OpenAI Triton (what you're thinking of) and the Nvidia Triton server (a different thing).
The original comment is referring to the Nvidia Triton Inference Server.