by dre85 2623 days ago
I always find it kind of silly when AI is just thrown into a field on the assumption that it will cope with messy, subjective and unstructured data (handwritten medical notes, for example) as is. To me it makes much more sense to clean up and structure the data from the start instead. Maybe come up with a data acquisition compromise that is both UX-friendly and gives rise to more structure and consistency.

My company does machine learning checked by physical models. Our single biggest problem (and management is finally waking up to it) is curating the incoming data. And this is in a mature industry (oil & gas).
My biggest surprise has been how unaware everyone is of the quality of their own data.

The only explanation I've been able to come up with is that when it's all human processed, Joe 2nd-link-in-the-chain just deals with all the inconsistencies as best he can to get his job done, and never reports issues up.

Without those data inconsistencies to fix up every week, Joe would probably be out of a job.
But Joe typically hates dealing with the inconsistencies, and he can tell you exactly how they could be fixed.

It generally seems like (a) the suggested fixes are impractical to implement (they are too detrimental to the counterparty), (b) Joe isn't empowered organizationally to suggest fixes that will actually be implemented, or (c) Joe doesn't have access to the IT tools to implement fixes himself.

In my experience the answer is usually (d) all of the above.
A little bit of (a), but mostly (b).