| It’s shocking if you’ve worked professionally in statistics and have not heard about data provenance. A few publications from the ~2011-2015 period: http://ceur-ws.org/Vol-1558/paper37.pdf https://ieeexplore.ieee.org/document/5739644 https://link.springer.com/chapter/10.1007/978-3-642-53974-9_... Add to that a variety of additional links dating back a bit further (note the emphasis in this case on research data and tracking the state of an experiment). https://nnlm.gov/data/thesaurus/data-provenance Data provenance is not a database / data warehouse term. It is uniquely and specifically a basic “101” concept of statistical science and ML / data science, where the custody and tracking of data are specifically tied to iterations of experiments, prototypes and research, for the sake of reproducibility. If I were interviewing an experienced statistical researcher and they didn’t at least have a working knowledge of the core concepts, that would be a huge red flag. |
Another poster mentioned vendor brochures and trade shows, which is in line with my expectations about which community the term stems from. It also explains why I've never heard of it, because I try to keep away from such environments these days.
Everywhere I've been, the things I take to make up "provenance" have generally been referred to under the simple label of "data quality", with separate subset definitions and measures such as timeliness, source, authority, format, history, suitability, verification, etc.
Of course, that's assuming people even worry about such things. Let's be frank: anyone who's worked with data science knows these concerns get shorter shrift in practice than they deserve. I'm probably among a minority of people in the real world who actually take them seriously, and I find myself on a constant crusade to remind people that just because a data point exists in a data set doesn't mean it's useful, appropriate, truthful, or unbiased.
The term "data quality" is a bit problematic, because I can see it being used by people who think provenance doesn't have anything to do with quality, and by people from a variety of fields. But it is also far more popular according to historical search trends, and in my last three jobs provenance would have fallen under the data quality framework.