by manca 1562 days ago
Are you just interested in the training part and managing the trained models, or would you actually like to productionize the models and serve them at scale?

A lot of end-to-end platforms are available nowadays that try to cover the entire lifecycle of a model, from data prep and ETL to training, serving, monitoring, and operations. However, I found none of them robust enough to cover all these cases perfectly, so I resorted to combining pieces from different vendors with my own stuff to make the entire platform suit my needs. This is still not perfect, though, and I think there's a lot of room for improvement in the space to enable really easy-to-use and scalable MLOps.

Still, some of the tools I found to be OK: TensorFlow TFX, Kubeflow (to some extent; ops are a nightmare), Feast, and MLflow. GCP Vertex AI and AWS SageMaker can get some work done, too.

2 comments

I'd say that for the foreseeable future I simply want to focus on training and running trained models. I don't plan to do anything at scale, like launching a business, so creating and training models is the part I want to, and probably should, focus on at first either way.

But I like your approach of stitching together various vendors so they fit your use case, I think it can be really flexible but also probably more expensive and slightly harder to manage... I think it can be worth the tradeoff though.

Thank you for the input!

You make a good point there. Personally, I’ve struggled quite a bit with taking one-off models to production. Would you mind elaborating on what you mean by none of the platforms being robust enough?