| HN Mirror

Yeah, last time I had to do this was about a year ago and I used parquet and arrow on S3-compatible object stores and put a bunch of metadata in postgres and the whole thing. At that time we used Prefect for orchestration which was fine but IMHO not worth what it cost, I've also used flyte seriously and dabbled with other things, nothing that I can get really excited about recommending, it's all sort of fine but kinda meh. I used to work for a megacorp with extremely serious tooling around this and everything I've tried in open source makes me miss that.

On the front end I've always had reasonable outcomes with `wandb` for tracking runs once you kind get it all set up nicely, but it's a long tail of configuration and writing a bunch of glue code.

In this situation I'm dealing with a pretty medium amount of data and very modest model training needs (closer to `sklearn` than some mega-CUDA thing) and it feels like I should be able to give someone the company card and just get one of those things with 7 programming languages at the top of the monospace text box for "here's how to log a row", we do Smart Things and now you have this awesome web dashboard and you can give your quants this `curl foo | sh` snippet and their VSCode Jupyter will be awesome.