
One FAANGUAMLetc engineer told me they SSH into machines, submit jobs with Slurm, and track experiments by telling their manager which parameters were best the day before. This was very strange given that their company has a machine learning platform, so either this engineer did not use it, or they did not use it much.

We were talking about our machine learning platform and taking it for a spin. We do have long-running notebook scheduling[0], but we wanted to be able to watch a notebook's output from multiple devices while it was running, and for the run to survive closed tabs or network disconnections, rather than only getting the results once it's done. We also wanted to do that right from the notebook's interface instead of SSH'ing into a machine, which was tedious and which some of our users aren't comfortable doing.
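To make that requirement concrete, here's a rough sketch of one way to get the behaviour. This is not how iko.ai implements it; `OutputLog` and its methods are names I made up for illustration. The idea is that the job writes to a server-side, append-only log, and each client keeps its own read offset, so any device can attach late or reconnect and pick up where it left off.

```python
import threading


class OutputLog:
    """Hypothetical append-only, server-side record of a running job's output."""

    def __init__(self):
        self._lines = []
        self._lock = threading.Lock()

    def append(self, line):
        # Called by the job runner; does not depend on any client being connected.
        with self._lock:
            self._lines.append(line)

    def read_from(self, offset):
        # Return everything written since `offset`, plus the offset to resume from.
        with self._lock:
            new_lines = self._lines[offset:]
            return new_lines, offset + len(new_lines)


log = OutputLog()
log.append("epoch 1: loss=0.9")
log.append("epoch 2: loss=0.7")

# A laptop tab attaches and reads everything so far.
laptop_lines, laptop_offset = log.read_from(0)

# The tab closes; the job keeps writing regardless.
log.append("epoch 3: loss=0.5")

# A phone attaches later and replays the full history.
phone_lines, phone_offset = log.read_from(0)

# The laptop reconnects and receives only what it missed.
missed, laptop_offset = log.read_from(laptop_offset)
```

Since the log lives with the job and not with the browser tab, closing a tab loses nothing; the client only has to remember (or be told) its last offset.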

- [0]: https://iko.ai/docs/notebook/#long-running-notebooks