by mateiz
2767 days ago
Don't MLflow Projects exactly meet this use case? A project lives in a Git repo, which can include both code and data, and specifies its software environment (currently Conda, but it will eventually also support Docker): https://www.mlflow.org/docs/latest/projects.html. You can then run it wherever you want to run code: CI system, Kubernetes, cloud, etc. The reason MLflow doesn't force people to use Projects is that many users like to develop ML in notebooks, but we definitely expect engineering teams to use it with Projects.
The project was a suite of neural network models that provided face & object detection results in a low-latency web interface, where customers manipulate photos and want automated metadata about the people or objects in them.
To optimize for performance, we needed to experiment frequently with compile-time details of the runtime environment (in our case a container) where the application would run in production.
So the axis of our experiments was not usually anything to do with neural network layers or data or parameters. It was different compiler optimization flags, different precision approximations and GPU settings that needed to be rolled into a huge number of different underlying runtime environments, and then for each distinct runtime environment the more mundane experiments would be carried out for layer topology, number of neurons, width of CNN filters, etc.
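To make the shape of that experiment space concrete, here is a rough sketch of the two nested loops. Every name and value in it (the flag strings, the precision labels, the model grid) is a hypothetical placeholder, not what we actually ran, and it has nothing to do with MLflow's API.

```python
from itertools import product

# Hypothetical compile-time axes; the values are illustrative only.
COMPILER_FLAGS = ["-O2", "-O3", "-O3 -ffast-math"]
PRECISIONS = ["fp32", "fp16"]
GPU_SETTINGS = ["default", "tuned"]

# Hypothetical "mundane" model axes, swept inside each runtime environment.
MODEL_GRID = {"layers": [4, 8], "filter_width": [3, 5]}


def runtime_environments():
    """Outer loop: one distinct container build per compile-time combination."""
    for flags, precision, gpu in product(COMPILER_FLAGS, PRECISIONS, GPU_SETTINGS):
        yield {"cflags": flags, "precision": precision, "gpu": gpu}


def model_experiments():
    """Inner loop: the usual topology and hyperparameter sweep."""
    keys = list(MODEL_GRID)
    for values in product(*(MODEL_GRID[k] for k in keys)):
        yield dict(zip(keys, values))


def all_runs():
    """Every model experiment is repeated for every runtime environment."""
    for env in runtime_environments():
        for model in model_experiments():
            yield {"env": env, "model": model}
```

The point is just that the environment build is itself an experiment variable, so the run count is the product of both grids, and each outer iteration means a different image, not a different parameter passed to the same image.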
We found that unless you basically build your own entire “meta” version of MLflow that wraps around MLflow, it falls apart at use cases where custom compile-time details of the runtime are themselves aspects of the experiment. Not to mention that the Projects format violates good practices, like the 12 Factor guidance on injecting settings from the environment, which again leads to wasted effort on special-case deployment handling for MLflow jobs.
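For anyone unfamiliar with the 12 Factor point, this is roughly what "inject settings from the environment" looks like in a plain job. It is a minimal sketch: the variable names and defaults are hypothetical and not tied to MLflow or to our actual setup.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSettings:
    learning_rate: float
    batch_size: int
    precision: str


def load_settings(environ=None):
    """Read job settings from environment variables, with defaults.

    The variable names (LEARNING_RATE, BATCH_SIZE, PRECISION) are
    hypothetical examples chosen for this sketch.
    """
    env = os.environ if environ is None else environ
    return JobSettings(
        learning_rate=float(env.get("LEARNING_RATE", "0.001")),
        batch_size=int(env.get("BATCH_SIZE", "32")),
        precision=env.get("PRECISION", "fp32"),
    )
```

A job written this way gets deployed like any other container: whatever scheduler you already use sets the variables, and the job needs no experiment-specific packaging file to learn its configuration.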
Whatever deploys and measures your tasks should not also impose any type of special-case packaging structure, which is a big reason why MLflow conceptually fails. Any attempt to make anything at all like a DSL packaging layer for experiments that causes it to diverge from “regular deployment of any old job” is immediately a failed idea. The only thing it’s good for is creating unwitting vendor lock-in once you’re highly dependent on this bespoke, weird packaging template for Projects that makes your ML jobs weirdly (and needlessly) different from other deployment tasks.