by pthaine
1573 days ago
Key takeaways: (1) ONNXRuntime is the best inference package for Transformer networks; (2) Nvidia Triton, together with ONNXRuntime, is the best solution for GPU inference; (3) Optimization matters. It’s quite easy to unlock a >10X performance gain in 2022.