Hacker News
by jackfoxy 2623 days ago
My company does machine learning checked by physical models. Our single biggest problem (and management is finally waking up to it) is curating the incoming data. And this is in a mature industry (oil & gas).

My biggest surprise has been how unaware everyone is of their own data quality.

The only explanation I've been able to come up with is that when it's all human-processed, Joe 2nd-link-in-the-chain just deals with all the inconsistencies as best he can to get his job done, and never reports the issues up the chain.

Without those data inconsistencies to fix up every week, Joe would probably be out of a job.
But Joe typically hates dealing with the inconsistencies, and he can tell you exactly how they could be fixed.

It generally seems like (a) the suggested fixes are impractical to implement (overly detrimental effect on the counterparty), (b) Joe isn't empowered organizationally to suggest fixes that will actually be implemented, or (c) Joe doesn't have access to the IT tools to implement the fixes himself.

In my experience the answer is usually (d) all of the above.
A little bit of (a), but mostly (b).