Ask HN: Next steps for scaling a scikit-learn Flask ML API
1 point
by frist45
3021 days ago
We currently have an internal API that's core to our business. The models are loaded as .pkl files with scikit-learn joblib and served via Flask w/ Gunicorn using Gevent. We've tried Tornado as a worker class and CherryPy as a replacement for Gunicorn; neither produced a significant performance benefit.

We're hosting it in a Kubernetes cluster with really large nodes (140GB). Each container uses ~5GB of RAM, and given the response time (~750ms), we only gain about 30 req/sec for each node we add ($1.5k). Each request appears to be CPU-bound, which makes it difficult to scale out widely. This is cost-prohibitive, and it feels like we need to move towards other tools/approaches.

As the person who's managing the infrastructure, I'm less familiar with the current ecosystem of larger-scale tooling. Ideally, the next iteration would keep the HTTP transport layer to allow for minimal changes to the rest of the system. What would be a logical next step for us to scale the existing scikit-learn/Flask API?
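For anyone who wants to sanity-check the numbers above, here is a rough back-of-the-envelope sketch. It assumes each container handles one request at a time (consistent with the requests being CPU-bound) and that memory is the only thing limiting how many containers fit on a node; the post doesn't give a core count, so CPU limits are ignored. The function names are made up for illustration.

```python
# Back-of-the-envelope capacity estimate using the figures from the post.
# Assumptions (not from the post): one in-flight request per container,
# and RAM is the only constraint on containers per node.

NODE_RAM_GB = 140            # node size
CONTAINER_RAM_GB = 5         # approximate RAM per container
RESPONSE_TIME_S = 0.75       # approximate response time per request
NODE_COST_USD = 1500         # cost per added node (billing period not stated)
OBSERVED_RPS_PER_NODE = 30   # throughput gained per added node


def containers_per_node(node_ram_gb, container_ram_gb):
    """How many containers fit on a node if only RAM is considered."""
    return int(node_ram_gb // container_ram_gb)


def theoretical_rps(containers, response_time_s):
    """Upper bound on req/sec if each container serves requests back to back."""
    return containers / response_time_s


def cost_per_rps(node_cost_usd, rps):
    """Cost of each additional req/sec of capacity."""
    return node_cost_usd / rps


containers = containers_per_node(NODE_RAM_GB, CONTAINER_RAM_GB)
ceiling = theoretical_rps(containers, RESPONSE_TIME_S)

print(f"containers per node (RAM only): {containers}")
print(f"theoretical ceiling: {ceiling:.1f} req/sec")
print(f"cost per req/sec at observed rate: "
      f"${cost_per_rps(NODE_COST_USD, OBSERVED_RPS_PER_NODE):.2f}")
```

Under those assumptions the ceiling comes out a little above the ~30 req/sec we actually see, which suggests the nodes are already close to saturated. That would mean the bigger win is in cutting per-request CPU time or per-container memory rather than in swapping the web server.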