HN Mirror



	by mlthoughts2018 2001 days ago
	It’s very, very widely used jargon. I’d put “data provenance” on par with “overfitting” or “GPU model training” in terms of the high, ubiquitous place it occupies in mainstream machine learning.

1 comments

disgruntledphd2 2001 days ago

Sorry, I have to disagree here. It's a term of art in some of the literature, but it's definitely not that widespread, certainly not in consumer tech data science, where I work.


mlthoughts2018 2001 days ago

I’ve worked professionally in quant finance, image processing, defense research, and several mid-to-large ecommerce and payment processor companies.

In all of them, data provenance has been a first-class consideration of machine learning and data platform teams, like a day-to-day concern, baked into architecture review guidelines and production checklists and whatnot for every ML project.

In many of these companies we had teams of 20-40 ML scientists, all of whom knew about data provenance as a first class consideration in their work, had experience with it from their past jobs and academic programs, and considered it on equal footing with any aspect of data curation, model selection, model training and model serving.


disgruntledphd2 1999 days ago

I mean, I shouldn't be surprised. Given our previous interactions, I feel like you are the anti-me, in that our experiences of similar things are so wildly divergent.

Shrug, such is life I guess. That being said, I care deeply about this stuff (but didn't have a word), so perhaps it will be easier to convince people to pay attention to the data with said word.
