Hacker News new | ask | show | jobs
by nicoinstrument 37 days ago
Thanks for the suggestion! Have you found that Ray Serve’s built-in autoscaling plays nicely with custom SLO-based concurrency limits, or do you usually let Ray handle the load balancing entirely?"
1 comments

To be honest, I don't know because I have not hit many of those limits due to what I would call "moderate" scale. So far, I have just provisioned enough pods to handle the traffic as-is without using KubeRay. So k8s is handling the load balancing adequately at the moment, but Ray serve is not cluster-aware, only pod aware, for now.