by superkitty
2413 days ago
We use self-deployable configuration so that data scientists control their models' destiny. Models are written in Python (a mix of PyTorch, NLP, and TensorFlow).
The models serve about 35 predictions/second on average.
The API server is written in Python. The API server container writes incoming requests to a distributed queue cluster.
The models pick up samples from the queue in batches.
This lets us experiment with different model flavors: the routing is set at deployment time and then stored in the cache.
We use AWS-managed cache, queuing, and container orchestration platforms.
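A rough sketch of the queue-and-routing flow described above, using only the Python standard library. An in-process queue.Queue stands in for the distributed queue and a plain dict stands in for the cache; the names ROUTES, enqueue, drain_batch, group_by_model, and the model names are all hypothetical and only illustrate the idea, not our actual code.

```python
import queue
from collections import defaultdict

# Hypothetical routing table. In the setup described above it would be
# written at deployment time and read from the managed cache.
ROUTES = {"default": "model_a", "experiment": "model_b"}


def enqueue(q, sample, route_key="default"):
    """API-server side: tag the request with a route key and queue it."""
    q.put({"route": route_key, "sample": sample})


def drain_batch(q, max_batch=8, timeout=0.05):
    """Model side: collect up to max_batch requests.

    Stops early if no new request arrives within `timeout` seconds.
    """
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return batch


def group_by_model(batch, routes=ROUTES):
    """Split a batch by target model; unknown route keys fall back to default."""
    grouped = defaultdict(list)
    for request in batch:
        model_name = routes.get(request["route"], routes["default"])
        grouped[model_name].append(request["sample"])
    return dict(grouped)
```

Changing which model flavor receives traffic then only means updating the routing entry, without touching the API server or the consumers.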
Next:
1) The training and production pipelines are currently two separate pipelines, which we want to combine, possibly using MLflow, Airflow, or Kubeflow.
Deployment to production is currently done through Jenkins.
2) Active retraining and auto-deployment to production.
3) Tie the version of the model in production to the model that was trained. Right now there is no way for us to trace the production version back.
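One possible way to approach item 3, sketched with the standard library only: derive a version string from a hash of the model artifact, record it with the training run, and attach the same string to every prediction. This is an assumption about how it could work, not something we have built; artifact_version, training_record, tag_prediction, find_run, and the run id are hypothetical names.

```python
import hashlib


def artifact_version(artifact_bytes):
    """Derive a short, deterministic version string from the model artifact."""
    return hashlib.sha256(artifact_bytes).hexdigest()[:12]


def training_record(run_id, artifact_bytes, params):
    """Training side: store the version next to the run that produced it."""
    return {
        "run_id": run_id,
        "model_version": artifact_version(artifact_bytes),
        "params": params,
    }


def tag_prediction(prediction, artifact_bytes):
    """Serving side: stamp each prediction with the version of the loaded model."""
    return {
        "prediction": prediction,
        "model_version": artifact_version(artifact_bytes),
    }


def find_run(records, model_version):
    """Trace a production version back to its training run, or None if unknown."""
    for record in records:
        if record["model_version"] == model_version:
            return record
    return None
```

Because the version comes from the artifact bytes themselves, training and serving agree on it without sharing any state; a model registry such as the one in MLflow could play the same role.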