| In some fields it's more important than others. In life sciences research supporting synthetic control arms, the FDA cares more about the lineage and manipulation of the data than about the data science models used to predict X/Y/Z, i.e., what the data was originally, what it ended up as prior to ingestion into AI/ML, why it was changed, what steps were involved, etc. There aren't a ton of good out-of-the-box solutions for data lineage and it's driving me nuts. We have Apache NiFi, which promises data lineage out of the box and _appears_ to deliver, though I've never implemented it. We have Pachyderm, which has some support here, but I don't know much about it. Beyond that it appears to be roll-your-own. I kind of wish there was an accepted best practice for data lineage, but it's, surprisingly, the wild west. And it's completely, 100% required for industry use. |
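
For what it's worth, the roll-your-own route can start very small. Below is a minimal Python sketch, not an accepted practice and not how NiFi or Pachyderm do it, that records the questions listed above for each transformation: what the data was (input hash), what it became (output hash), why it changed, and which step did it. Every name here (`fingerprint`, `LineageStep`, `LineageLog`, the sample rows) is hypothetical and only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


def fingerprint(rows: list) -> str:
    """Return a SHA-256 digest of the rows serialized as canonical JSON."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageStep:
    """One recorded transformation (hypothetical schema)."""
    step: str          # what was done
    reason: str        # why it was done
    input_hash: str    # what the data was before the step
    output_hash: str   # what the data was after the step
    timestamp: str     # when the step ran (UTC, ISO 8601)


@dataclass
class LineageLog:
    """Append-only list of steps applied to a dataset."""
    steps: List[LineageStep] = field(default_factory=list)

    def apply(self, rows: list, transform: Callable[[list], list],
              step: str, reason: str) -> list:
        """Run `transform` on `rows` and record before/after fingerprints."""
        before = fingerprint(rows)
        result = transform(rows)
        self.steps.append(LineageStep(
            step=step,
            reason=reason,
            input_hash=before,
            output_hash=fingerprint(result),
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
        return result

    def to_json(self) -> str:
        """Serialize the log so it can be stored next to the dataset."""
        return json.dumps([asdict(s) for s in self.steps], indent=2)


# Made-up example data: drop rows with a missing age before modeling.
raw = [{"patient": "a1", "age": 54}, {"patient": "a2", "age": None}]
log = LineageLog()
cleaned = log.apply(
    raw,
    lambda rows: [r for r in rows if r["age"] is not None],
    step="drop_missing_age",
    reason="downstream model cannot handle missing age",
)
```

Calling `log.to_json()` gives a record you can keep alongside the dataset. Chaining steps so each `input_hash` matches the previous `output_hash` is what lets you show the data was not touched between recorded steps. A real setup would also need to pin code versions and store the log somewhere tamper-evident, which this sketch does not attempt.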