by earlhathaway 2683 days ago

My understanding is that it's a combination of:

1 - git-based management (not storage) of the data files used in ML experiments;

2 - lightweight pipelines integrated with git, so that outputs and intermediate artifacts can be reproduced;

3 - integration of git with experimentation.
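Item 1 ("management, not storage") can be sketched as follows, assuming a pointer-file approach: the data file is copied into a content-addressed cache keyed by its MD5 hash, and only a tiny pointer file recording that hash is meant to be committed to git. The function names (`track`, `checkout`), the `.ptr` suffix, and the local cache directory standing in for shared remote storage are all hypothetical, not any particular tool's API.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files are never fully loaded."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data: Path, cache: Path) -> Path:
    """Copy `data` into a content-addressed cache and write a small pointer file.

    Hypothetical scheme: only the pointer (e.g. `train.csv.ptr`) would be
    committed to git; the data itself lives in `cache`, a stand-in for
    shared remote storage.
    """
    digest = file_md5(data)
    cache.mkdir(parents=True, exist_ok=True)
    shutil.copy2(data, cache / digest)
    pointer = data.with_name(data.name + ".ptr")
    pointer.write_text(json.dumps({"path": data.name, "md5": digest}))
    return pointer


def checkout(pointer: Path, cache: Path) -> Path:
    """Restore the data file that a committed pointer refers to."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache / meta["md5"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("x,y\n1,2\n")
        ptr = track(data, root / ".cache")
        data.unlink()  # simulate a fresh clone: only the pointer file is present
        restored = checkout(ptr, root / ".cache")
        print(restored.read_text() == "x,y\n1,2\n")  # True
```

Because the pointer only changes when the file's content changes, git history records exactly which version of the data each commit used without git ever storing the data itself.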

If you've worked on teams building ML products, this is something you've at least half-built internally, so that you can share outputs across the team with tracked lineage showing how to reproduce them. Plus the pipeline management that goes with it.
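The lineage and pipeline-management parts can be sketched, under similar assumptions, as a stage runner that hashes its inputs, skips the work when neither the inputs nor the output have changed, and otherwise reruns the step and writes a JSON lineage record. `run_stage`, the lock-file layout, and the `featurize`/`double` example step are hypothetical illustrations; in practice such records would be committed to git alongside the code that produced them.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def file_md5(path: Path) -> str:
    # Whole-file read keeps the sketch short; fine for small example files.
    return hashlib.md5(path.read_bytes()).hexdigest()


def run_stage(
    name: str,
    deps: Sequence[Path],
    out: Path,
    fn: Callable[[Sequence[Path], Path], None],
    lock_dir: Path,
) -> bool:
    """Run `fn(deps, out)` unless the recorded inputs and output are unchanged.

    Writes a lineage record (stage, step function, input hashes, output hash)
    to `lock_dir/<name>.json`. Returns True if the stage actually ran.
    """
    lock = lock_dir / f"{name}.json"
    dep_hashes = {str(d): file_md5(d) for d in deps}
    if lock.exists() and out.exists():
        prev = json.loads(lock.read_text())
        if prev["deps"] == dep_hashes and prev["out_md5"] == file_md5(out):
            return False  # nothing changed: the existing output is still valid
    fn(deps, out)
    lock_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "stage": name,
        "step": fn.__name__,
        "deps": dep_hashes,
        "out": str(out),
        "out_md5": file_md5(out),
    }
    lock.write_text(json.dumps(record, indent=2))
    return True


if __name__ == "__main__":

    def double(deps: Sequence[Path], out: Path) -> None:
        # Hypothetical feature step: double every integer in the input file.
        nums = [int(x) for x in deps[0].read_text().split()]
        out.write_text(" ".join(str(2 * n) for n in nums))

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw, feats = root / "raw.txt", root / "features.txt"
        raw.write_text("1 2 3")
        print(run_stage("featurize", [raw], feats, double, root / "lineage"))  # True: first run
        print(run_stage("featurize", [raw], feats, double, root / "lineage"))  # False: unchanged
        raw.write_text("4 5")
        print(run_stage("featurize", [raw], feats, double, root / "lineage"))  # True: input changed
```

Keying the skip decision on content hashes rather than file timestamps means it stays valid across machines and fresh clones, where modification times are not preserved.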