by lacker 2171 days ago
The tough thing is that a common failure mode of many modern AI solutions is output that looks superficially correct but doesn't actually map to the real world. When you want a table of data, there's a high risk that the table looks correct but isn't accurate. The problem here is keeping sloppy data out of your table, which is tough for a statistical AI.

So yeah, I would expect this to be a long ways away.

2 comments

For more interesting problems than "which country has the highest GDP?", it's about more than just sloppy data. If you want to include any covariates, how do you know which ones to include? You could try to include everything predictive, but then you'll use the client margin column to predict client revenue or something. Or you'll control for a column causally downstream, biasing your estimates, like estimating revenue differences and controlling for page views in an experiment that affects page views. There's so much that we just don't include in our databases that's crucial to using them, and it's not just about sloppiness.
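To make the "causally downstream" point concrete, here's a rough simulation sketch. Everything in it is made up for illustration: the function names (`simulate`, `unadjusted_effect`, `adjusted_effect`), the effect sizes, and the assumption that the experiment moves revenue only through page views. The true effect on revenue is 10 by construction (2 extra page views times 5 revenue per view). The unadjusted comparison recovers it, while "controlling for" page views pushes the estimate toward zero.

```python
import random


def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate(n=20000, seed=0):
    """Hypothetical experiment: treatment -> page views -> revenue."""
    rng = random.Random(seed)
    treat, views, revenue = [], [], []
    for _ in range(n):
        t = rng.randint(0, 1)                 # random assignment
        v = 10 + 2 * t + rng.gauss(0, 1)      # treatment adds 2 page views
        r = 5 * v + rng.gauss(0, 1)           # each view is worth 5 revenue
        treat.append(t)
        views.append(v)
        revenue.append(r)
    return treat, views, revenue


def unadjusted_effect(treat, revenue):
    """OLS slope of revenue on treatment (difference in group means)."""
    return cov(treat, revenue) / cov(treat, treat)


def adjusted_effect(treat, views, revenue):
    """Treatment coefficient from OLS of revenue on treatment AND views.

    Closed form for a two-regressor OLS with intercept.
    """
    s_tt = cov(treat, treat)
    s_vv = cov(views, views)
    s_tv = cov(treat, views)
    s_tr = cov(treat, revenue)
    s_vr = cov(views, revenue)
    return (s_tr * s_vv - s_vr * s_tv) / (s_tt * s_vv - s_tv ** 2)


if __name__ == "__main__":
    t, v, r = simulate()
    # Should land near the true effect of 10.
    print("unadjusted:", unadjusted_effect(t, r))
    # Should land near 0: page views absorb the whole effect.
    print("controlling for page views:", adjusted_effect(t, v, r))
```

Nothing in the table itself tells you page views sit downstream of the treatment. That knowledge lives outside the database, which is the whole problem.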
tbf that problem is also quite tough for data scientists; the model doesn't need to be flawless, just better