Ask HN: What are you using to serve ML models at low latency?
2 points by avin_regmi 2671 days ago
https://panini.ai/ is the easiest and fastest way to serve ML/DL models at low latency, and it deploys a model to Kubernetes in a few minutes. It also handles load balancing, caching, and batching of user inputs. What are you guys using to serve ML models at low latency?
If your unique selling point is low latency, you should at least show some numbers / benchmarks on your homepage.
And finally, there's no way we or our clients would allow our models to be uploaded to an external provider; it would have to be on-prem.