| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Fiahil 1178 days ago
	Yet, everyone misses reproducibility and data versioning :) So, talking about monitoring, training, and recording model drift is only a single side of the domain.

2 comments

KptMarchewa 1178 days ago

> Yet, everyone misses reproducibility and data versioning :)

Delta Lake/Apache Iceberg solves that.

link

Fiahil 1178 days ago

Absolutely not.

A single vendor/tech does not "solve" anything when the task at hand implies you need to entirely re-design data pipelines, ML modelling and benchmarking.

link

alfalfasprout 1178 days ago

LMAO no, no it doesn't and has major migration consequences for existing data warehouses.

Reproducibility is more than just upstream data versioning.

link

alfalfasprout 1178 days ago

Not to mention dependency management! Since a lot of ML code is in Python this ends up being a very tricky thing to handle at scale (especially if you need to update dependencies, etc.)

link

pharmakom 1177 days ago

I didn’t care much for Docker until I started working in the Python ecosystem.

link