Ask HN: What are you using to serve ML models in low latency?

	2 points by avin_regmi 2671 days ago
https://panini.ai/ is the easiest and fastest way to serve ML/DL models at low latency, and it deploys a model to Kubernetes in a few minutes. It also handles load balancing, caching, and batching of user inputs. What are you using to serve ML models at low latency?

1 comments

malux85 2670 days ago

Low latency for us means we can't spend 100+ ms on a round trip to an external server / hosted solution.

If your unique selling point is low latency, you should at least show some numbers / benchmarks on your homepage.

And finally, there's no way we or our clients would allow our models to be uploaded to an external provider; it would have to be on-prem.

avin_regmi 2662 days ago

How big an issue is low latency for you? What happens if it's more than 100 ms? Also, we do offer our software for deployment in your Kubernetes cluster via Helm.
