Ask HN: Next steps for scaling a scikit-learn Flask ML API

1 point by frist45 3021 days ago

We currently have an internal API that's core to our business. The models are stored as .pkl files, loaded with scikit-learn's joblib, and served via Flask on Gunicorn with gevent workers. We've tried Tornado as a worker class and CherryPy as a replacement for Gunicorn; neither produced a significant performance gain.
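For readers less familiar with this stack, the serving setup described above might be configured roughly like the `gunicorn.conf.py` below. This is a hypothetical sketch: the listen address, port, worker count and timeout are assumptions, not details from the post. One relevant property of gevent workers is that they switch between requests only when a request blocks on I/O, so a CPU-bound `predict()` call keeps its worker busy for the entire call.

```python
# gunicorn.conf.py (hypothetical sketch of the setup described above;
# values other than worker_class are assumptions)
import multiprocessing

bind = "0.0.0.0:8000"                  # assumed listen address and port
worker_class = "gevent"                # gevent workers, as described in the post
workers = multiprocessing.cpu_count()  # assumed: one worker process per CPU core
timeout = 30                           # seconds before an unresponsive worker is killed and restarted
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and Flask application name.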

We're hosting it in a Kubernetes cluster with very large nodes (140GB of RAM each). Each container uses ~5GB of RAM, and given the ~750ms response time, each node we add ($1.5k) only buys us about 30 req/sec. Each request appears to be CPU-bound, which makes it difficult to scale out.
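The capacity numbers above can be checked with a quick back-of-the-envelope calculation. This sketch assumes memory is what limits how many containers fit on a node, and that each container effectively serves one CPU-bound request at a time; the `concurrency` parameter is a hypothetical knob for relaxing that assumption. Under those assumptions the upper bound lands in the same ballpark as the ~30 req/sec observed.

```python
import math

NODE_RAM_GB = 140           # node size from the post
CONTAINER_RAM_GB = 5        # approximate per-container footprint from the post
LATENCY_S = 0.75            # approximate response time from the post
NODE_COST_USD = 1500        # cost per added node from the post
OBSERVED_RPS_PER_NODE = 30  # observed throughput gain per node from the post


def containers_per_node(node_ram_gb: float, container_ram_gb: float) -> int:
    # Assumption: RAM, not CPU, caps how many containers fit on one node.
    return math.floor(node_ram_gb / container_ram_gb)


def max_rps_per_node(containers: int, latency_s: float, concurrency: int = 1) -> float:
    # Assumption: each container makes progress on `concurrency` CPU-bound
    # requests at a time, so its throughput is concurrency / latency.
    return containers * concurrency / latency_s


n = containers_per_node(NODE_RAM_GB, CONTAINER_RAM_GB)
upper = max_rps_per_node(n, LATENCY_S)
print(f"containers per node: {n}")  # 28
print(f"upper bound at 1 request per container: {upper:.1f} req/s")  # 37.3
print(f"cost per req/s at observed rate: ${NODE_COST_USD / OBSERVED_RPS_PER_NODE:.0f}")  # $50
```

At the observed rate, every additional request per second of capacity costs about $50 in node spend, which is why the setup feels cost-prohibitive.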

This is cost-prohibitive, and it feels like we need to move toward other tools or approaches.

As the person managing the infrastructure, I'm less familiar with the current ecosystem of larger-scale tooling. Ideally, the next iteration would keep the HTTP transport layer so that the rest of the system needs minimal changes.

What would be a logical next step for us to scale the existing scikit-learn/Flask API?