by mlthoughts2018 2768 days ago
This is still perfectly synonymous with regular build tools, like running a rebuild in Jenkins or ‘build with parameters.’ The point is to treat builds and runs of an experiment setup exactly the same, with the same tooling, monitoring, data capturing, etc., as any other deployed program. There is nothing special about a one-off job that trains a model or computes an experimental result compared with jobs that perform an experiment on database tuning or test load on a web service or any other type of deployed job. You have monitoring and probing of key stats and health of the experiment, you can reproduce the exact run or the same run with modified parameters, and the run produces output artifacts or writes data. It’s all perfectly the same.
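To make the idea concrete, here is a minimal sketch, assuming a hypothetical `JobRun` record (not any real tool's API), of a run description that carries parameters, key metrics, and output artifacts. Nothing in it is specific to ML, so a training job and a database tuning job are stored and rerun the same way.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one run of any deployed job (ML or not)."""
    job_name: str
    params: Dict[str, Any]
    metrics: Dict[str, float] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)


def rerun_with(run: JobRun, **overrides: Any) -> JobRun:
    """Return a new run spec: same job, same params except the overrides.

    Metrics and artifacts are cleared because the new run has not executed yet.
    A fresh params dict is built, so the original run is left untouched.
    """
    new_params = {**run.params, **overrides}
    return replace(run, params=new_params, metrics={}, artifacts=[])


# Two very different jobs, one record type (illustrative values only).
train = JobRun("train-model", {"lr": 0.01, "epochs": 10}, {"val_loss": 0.42}, ["model.pt"])
tune = JobRun("tune-db", {"work_mem_mb": 64}, {"p99_ms": 120.0}, ["report.csv"])

exact_repro = rerun_with(train)            # same run, same parameters
modified = rerun_with(train, lr=0.001)     # same run, one parameter changed
```

Reproducing a run is just `rerun_with(run)` with no overrides; the job type never enters into it.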

Basically if someone shows me a supposed ML experiment tracking system, the first question is, “If I replace the phrase ‘ML experiment’ with ‘generic computing task’, does the tool still handle everything exactly the same?”

If not, it’s a failed idea, because you’re trying to break model training or tuning jobs out of the regular deployment model and you’re not using consistent tooling to manage deployment of experiment runs and all other types of “jobs” that you can “run.”


Sure, you can reuse tools to achieve similar results. As with everything else, the devil is in the details. Does your monitoring system save results forever, or does it only let you report 90 days back? Can you compare two runs in a meaningful way, i.e., not just logs but also interactively plotting and exploring your results? Do you need to spend hours instrumenting your code? Can you sort Jenkins jobs by a parameter or metric? What about reporting new results to an existing experiment? There are many more examples. But in any case, if you can reuse your CI/CD system for ML experiment management, you should do that. Another question worth considering: if this is a "failed idea", why would engineering-led tech companies build these systems? Obviously, they tried reusing their current tooling.
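For illustration, the queries listed above (sort by a metric, compare two runs, attach a result to an existing run) can be sketched over a plain in-memory list. This is a minimal sketch assuming a hypothetical run shape of `{"id", "params", "metrics"}`; a real system would also need persistence, retention, and plotting, which is where the details discussed here come in.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical run shape: {"id": str, "params": {...}, "metrics": {...}}
Run = Dict[str, Any]


def sort_runs(runs: List[Run], metric: str, descending: bool = False) -> List[Run]:
    """Sort runs by one metric; runs that never reported it go last."""
    have = [r for r in runs if metric in r["metrics"]]
    missing = [r for r in runs if metric not in r["metrics"]]
    return sorted(have, key=lambda r: r["metrics"][metric], reverse=descending) + missing


def diff_runs(a: Run, b: Run) -> Dict[str, Tuple[Any, Any]]:
    """Return every param/metric whose value differs, as (value_in_a, value_in_b)."""
    out: Dict[str, Tuple[Any, Any]] = {}
    for section in ("params", "metrics"):
        for key in sorted(set(a[section]) | set(b[section])):
            va, vb = a[section].get(key), b[section].get(key)
            if va != vb:
                out[f"{section}.{key}"] = (va, vb)
    return out


def log_metric(run: Run, name: str, value: float) -> None:
    """Attach a new result to an existing run after it has finished."""
    run["metrics"][name] = value


runs = [
    {"id": "a", "params": {"lr": 0.1}, "metrics": {"loss": 0.5}},
    {"id": "b", "params": {"lr": 0.01}, "metrics": {"loss": 0.3}},
    {"id": "c", "params": {"lr": 0.001}, "metrics": {}},
]
```

Note that none of these functions care whether `loss` is a model loss or, say, a p99 latency from a load test.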

The tools we've been building for the past fifty years were designed for software engineering. Machine learning workflows are different in many ways and, as such, require new tools and approaches. That's our perspective, at least.

Literally all the example cases you mention are also needed when comparing results for database tuning, load balancing, A/B testing, etc. None of those asks would differentiate ML projects from any other type of general project. So unless you plan to shoehorn non-ML projects into an upstart system purportedly built for ML projects, you're just wasting resources (usually egregiously) by using a different tool. Even just thinking ML problems are somehow different is usually already a sign that you're investing in ML in a way that is very unlikely to map to project success.