


	by pthaine 1620 days ago
	Key takeaways: (1) ONNXRuntime is the best inference package for Transformer networks; (2) Nvidia Triton, together with ONNXRuntime, is the best solution for GPU inference; (3) Optimization matters. It’s quite easy to unlock a >10X performance gain in 2022.