Hacker News
by pthaine 1573 days ago
Key takeaways: (1) ONNXRuntime is the best inference package for Transformer networks; (2) Nvidia Triton, together with ONNXRuntime, is the best solution for GPU inference; (3) Optimization matters. It’s quite easy to unlock a >10X performance gain in 2022.