Ask HN: What's Your CI/CD Workflow for Your Machine Learning Projects?
160 points by swyea 3024 days ago
We are working on an ML project (Computer Vision specifically) and we are in the process of productionizing our models. What kind of tools do you use and what's your workflow for integrating, testing and deploying your models? Do you have any suggestions or tips about what to do and what to avoid?

First, I would highly recommend wrapping your ML models in some kind of microservice. If the ML is in Python then, depending on your production requirements, a fairly simple Flask or Sanic web server should be sufficient. This is great because you can leave all your feature transformation code as is, in Python.
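A minimal sketch of such a wrapper, assuming Flask and a model object with an sklearn-style `predict` method. The `/predict` route, the `features` payload key, `transform`, and `DummyModel` are illustrative names I made up for the example, not part of any real service:

```python
class DummyModel:
    """Hypothetical stand-in for a trained model with an sklearn-style predict()."""

    def predict(self, rows):
        # Toy rule: the "prediction" is the sum of each feature row.
        return [sum(row) for row in rows]


def transform(features):
    """Placeholder for the existing Python feature transformation code."""
    return [float(x) for x in features]


def predict_payload(model, payload):
    """Validate the decoded JSON payload, transform it, and predict."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    features = payload.get("features")
    if not isinstance(features, list) or not features:
        raise ValueError("payload must contain a non-empty 'features' list")
    row = transform(features)
    return {"prediction": model.predict([row])[0]}


def create_app(model):
    # Flask is imported here so the functions above stay usable without it.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # silent=True yields None on a bad body, which predict_payload rejects.
        payload = request.get_json(silent=True)
        try:
            return jsonify(predict_payload(model, payload))
        except ValueError as exc:
            return jsonify({"error": str(exc)}), 400

    return app
```

Calling `create_app(DummyModel()).run(port=8000)` starts Flask's development server. Keeping `predict_payload` free of any web framework code means the same function can be unit tested directly or moved behind Sanic later.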
If your production environment has very low latency requirements, you have your work cut out for you. You'll most likely have to rewrite all your transformation code in a faster language like Go or Java. You might also need to reimplement the inference code to get the speed you need. This adds considerable time and a lot of surface area for insidious bugs: the model will still make predictions, but they will be wrong, or only very slightly wrong.
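One way to catch that kind of drift is a parity test that replays the same inputs through both implementations and compares the outputs within a tolerance. A sketch, where `reference_transform` stands in for the original Python code and `ported_transform` for the rewritten version (both are hypothetical; in practice the ported side would call the Go/Java code or read its recorded output):

```python
import math


def reference_transform(x):
    """Stand-in for the original Python feature transformation."""
    return math.log1p(x) / 10.0


def ported_transform(x):
    """Stand-in for the output of the rewritten (e.g. Go or Java) version."""
    return math.log(1.0 + x) / 10.0


def find_mismatches(inputs, reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Return (input, expected, actual) for every input where outputs diverge."""
    mismatches = []
    for x in inputs:
        expected, actual = reference(x), candidate(x)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((x, expected, actual))
    return mismatches
```

Running this over a sample of real production inputs, rather than a handful of hand-picked values, is what tends to surface the "very slightly wrong" cases.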
Because I'm working with large amounts of data and my source of truth is Parquet logs in S3, the pipelines start with Spark. We do as much data wrangling as possible in Spark to get the data down to a manageable size, then create our train/dev/test sets. These datasets get uploaded to S3.
Models are then trained on these datasets on EC2 instances using Pandas & sklearn. Once everything is fully automated, the Spark job will push a message onto an SQS queue with the S3 path of the fresh dataset. An EC2 instance will poll that queue, pull down the data, and train a new model.
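The polling side of that can be sketched as below. The `sqs` argument is expected to behave like a boto3 SQS client (`boto3.client("sqs")`); the JSON message shape with an `s3_path` field and the `train_fn` callback are assumptions made for the example:

```python
import json


def poll_once(sqs, queue_url, train_fn, wait_seconds=20):
    """Receive at most one message, train on its dataset, then delete it.

    Returns the S3 path that was handled, or None if the queue was empty.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,  # long polling; SQS allows up to 20s
    )
    messages = response.get("Messages", [])
    if not messages:
        return None

    message = messages[0]
    # Assumed message format: {"s3_path": "s3://bucket/key"}
    s3_path = json.loads(message["Body"])["s3_path"]
    train_fn(s3_path)
    # Delete only after training succeeds, so a crash lets SQS redeliver.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return s3_path
```

A worker would call `poll_once` in a loop. Because the message is deleted only after training finishes, the queue's visibility timeout needs to be longer than a training run, otherwise SQS will hand the same message to another worker mid-training.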
The final result of training in my case is a text or binary model file that goes back up to S3. Our prediction microservice polls an S3 bucket, pulls down any updated model files, and swaps out the running models.
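A rough sketch of the poll-and-swap step, assuming a single model file whose S3 ETag changes whenever a new version is uploaded. The `s3` argument is expected to behave like a boto3 S3 client (`boto3.client("s3")`); `ModelHolder` and the `loader` callback are hypothetical names for illustration:

```python
import threading


class ModelHolder:
    """Holds the live model and swaps it when the S3 object changes."""

    def __init__(self, s3, bucket, key, loader):
        self._s3 = s3
        self._bucket = bucket
        self._key = key
        self._loader = loader  # callable: bytes -> model object
        self._lock = threading.Lock()
        self._etag = None
        self._model = None

    def refresh(self):
        """Reload the model if its ETag changed. Returns True on a swap."""
        head = self._s3.head_object(Bucket=self._bucket, Key=self._key)
        if head["ETag"] == self._etag:
            return False
        response = self._s3.get_object(Bucket=self._bucket, Key=self._key)
        # Build the new model fully before touching the live reference.
        new_model = self._loader(response["Body"].read())
        with self._lock:
            # Record the ETag of the bytes actually downloaded.
            self._model, self._etag = new_model, response["ETag"]
        return True

    def get(self):
        """Return the currently live model (None before the first refresh)."""
        with self._lock:
            return self._model
```

A background thread can call `refresh()` on a timer while request handlers call `get()`. Loading the new model before taking the lock keeps the swap itself cheap, so in-flight requests are never blocked on deserialization.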
Tips: