by m_ke 2769 days ago
I'm looking into switching over to MLflow or Polyaxon for experiment management and tracking. We currently use a custom-built Django app for experiment tracking and run experiments by hand on desktop workstations, but we're starting to move some of that over to GCP.

For people who have used either of the projects, what are your opinions and are there any hidden issues that you ran into?

Ideally we'd like to have a platform that makes it easy to schedule runs on the desktops or GCP depending on requirements and available resources. Seems like Kubernetes might be the best option for that, and it doesn't look like MLflow supports it out of the box yet.

4 comments

Polyaxon is really great in terms of functionality and UX. It's still pretty early stage, so there are some bugs, but overall I am very impressed by it. We have been using it for a few months now with a couple of ML researchers.
My main issues are that if you're using the serving functionality, the containers it builds take a long time to start because the environment/dependencies are loaded at runtime instead of being baked into the image. Also, it doesn't have the ability to use a db or remote file store to save experiment info, so you need to use EBS volumes or something for persistence.
While MLflow doesn't submit jobs to Kubernetes for you, it should be possible to integrate it with your favorite scheduler to do that. MLflow is designed to accept experiment results from wherever you are running your code, so you can just submit an "mlflow run ..." command to Kubernetes and have it report results to your tracking server.
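To make that concrete, here is a rough sketch of what "submit an mlflow run command to Kubernetes" could look like: a small Python script that builds a one-off Kubernetes Job manifest whose container runs `mlflow run` and points at a tracking server through the `MLFLOW_TRACKING_URI` environment variable. The helper name `mlflow_job_manifest`, the image, the project URI, and the tracking server URL are all made-up placeholders, and it assumes the image already has MLflow and the project's dependencies installed. Treat it as an illustration, not a tested deployment.

```python
import json


def mlflow_job_manifest(project_uri, tracking_uri, image, name="mlflow-run"):
    """Build a Kubernetes batch/v1 Job that runs `mlflow run` once.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry a failed experiment automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "mlflow-run",
                            "image": image,
                            "command": ["mlflow", "run", project_uri],
                            # MLflow reads the tracking server address
                            # from this environment variable.
                            "env": [
                                {
                                    "name": "MLFLOW_TRACKING_URI",
                                    "value": tracking_uri,
                                }
                            ],
                        }
                    ],
                }
            },
        },
    }


# Placeholder values, assumed for illustration only.
manifest = mlflow_job_manifest(
    project_uri="https://example.com/your-team/your-mlflow-project.git",
    tracking_uri="http://mlflow-tracking.example.internal:5000",
    image="registry.example.com/ml/mlflow-runner:latest",
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f job.json` would create the Job. The same manifest could be generated per experiment with different project URIs or parameters.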
We use MLflow for experiment tracking and as a model repository in our CI/CD flow. More details on our approach - https://stacktoheap.com/blog/2018/11/19/mlflow-model-reposit...