Ask HN: What are you using to serve ML models at low latency?
2 points by avin_regmi 2671 days ago
https://panini.ai/ is the easiest and fastest way to serve ML/DL models at low latency, and it deploys a model to Kubernetes in a few minutes. It also handles load balancing, caching, and batching of user inputs. What are you guys using to serve ML models at low latency?
1 comments

Low latency for us means we can't spend 100+ ms on a round trip to an external server / hosted solution.

If your unique selling point is low latency, you should at least show some numbers / benchmarks on your homepage.

And finally, there's no way we or our clients would allow our models to be uploaded to an external provider; it would have to be on-prem.

How big of an issue is low latency for you? What happens if it's more than 100 ms? Also, we do offer our software for deployment in your own Kubernetes cluster via Helm.