by LSTMeow 2116 days ago
Disclosure/plug: Evangelist for Allegro AI here, but I'm only going to allude to our FREE open-source platform + devops solution, Allegro Trains: https://github.com/allegroai/trains


1100% agree with you about the unnecessary time spent on configuration and maintenance.

As a research-oriented professional, you need something that will seamlessly integrate with _your own_ flow.

We are in the ML stone age; the playbook is not really written yet. Currently, CI/CD + agile is (necessary?) overhead that costs us precious time-to-product.

Here is my manifesto:

1. Anything related to "production" should be taken care of by DevOps peeps, yes, even if it is "MLOps". Monitoring, standardization, etc. should not be your responsibility. If it is somehow on you, then it should be part of the same experimentation platform you are using. Extra tools? Extra people.

2. Likewise, anything related to data-engineering, preparation etc. should be compartmentalized and have separate version control (it is not as complicated as doing it the DVC way, BTW). If you do have to do these tasks - you guessed it - it should be part of the same experimentation platform you are using.

3. Research MLOps (ResOps?): Did I say experimentation platform? Any team member should be able to work as she wishes - notebooks, scripts, whatnot. And if you forget to commit something before you run? You want to know about all the changes. Sharing? Comparing? - must have. Reproducible experimentation? Need to be able to automatically track environment variables, installed packages, etc. Most importantly - need to be able to offload to the cluster in the same running environment with a button click. I am not going to spend hours deciphering logs to find out that the wrong version of a package was installed in our container. I am not going to spend days sorting through containers to find "the one that works".

4. Lastly - IT work ("devops") on cluster management: monitoring your GPU usage per task, scheduling experiments, early stopping with a button click, on-prem managed platform - WHY IS THIS OUR JOB? - well, it isn't. But if it is, it should be integrated with your platform, and day-to-day operations should be "automagical". Cluster config should be done once, by professionals (even outsourced help).
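
To make point 3 a bit more concrete, here is a rough sketch of what "automatically track environment variables, installed packages, and uncommitted changes" could look like with nothing but the Python standard library. This is NOT how Trains does it; the function names (`snapshot_run`, `_git`) and the default env-var prefixes are made up for illustration, and it assumes `git` is on the PATH (if it isn't, the git fields just come back as None).

```python
import json
import os
import platform
import subprocess
import sys
from importlib import metadata


def _git(*args):
    """Run a git command; return its stdout, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def snapshot_run(env_prefixes=("CUDA_", "PYTHON")):
    """Collect what you would need later to explain (or rerun) an experiment.

    env_prefixes is a hypothetical allow-list (must be a tuple of strings):
    only matching variables are recorded, so secrets in the environment are
    not dumped wholesale.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
        "env": {k: v for k, v in os.environ.items() if k.startswith(env_prefixes)},
        "git_commit": _git("rev-parse", "HEAD"),
        # Uncommitted changes: the part you forgot about before hitting "run".
        "git_diff": _git("diff", "HEAD"),
    }


if __name__ == "__main__":
    # Store this next to the experiment's outputs.
    print(json.dumps(snapshot_run(), indent=2))
```

Call it at the top of a training script and save the JSON with the results. The point is that nobody on the team should have to remember to do this by hand; an experimentation platform should record it for every run.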

If you feel me here, then know that you are REALLY not alone. We took to heart what our clients & friends told us, and we launched Allegro Trains as a solution for all of this. Magically simple, and FREE.

Sorry for the caps, I tend to get emotional about this ;) Hit me up on Twitter @LSTMeow

1 comment

This is super cool!