Hacker News new | ask | show | jobs
by Fiahil 1178 days ago
Yet, everyone misses reproducibility and data versioning :)

So, talking about monitoring, training, and recording model drift is only a single side of the domain.

2 comments

> Yet, everyone misses reproducibility and data versioning :)

Delta Lake/Apache Iceberg solves that.

Absolutely not.

A single vendor/tech does not "solve" anything when the task at hand implies you need to entirely re-design data pipelines, ML modelling and benchmarking.

LMAO no, no it doesn't and has major migration consequences for existing data warehouses.

Reproducibility is more than just upstream data versioning.

Not to mention dependency management! Since a lot of ML code is in Python this ends up being a very tricky thing to handle at scale (especially if you need to update dependencies, etc.)
I didn’t care much for Docker until I started working in the Python ecosystem.