by sfink 1900 days ago
Personally, I think the main problem with ML is simpler: it works well for interpolation, and is crap for extrapolation.

If the outputs you want are well within the bounds of your training data set, ML can do wonders. If they aren't, it'll tell you that in 20 years everyone will be having -0.2 children and all the other species on the planet will start having to birth human babies just so they can be thrown into the smoking pit of bad statistical analysis.
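A minimal sketch of that failure mode, assuming a made-up "children per household" curve that decays toward zero but never goes negative. The data, the decay rate, and the names `true_rate` and `predict` are all hypothetical, and a straight-line fit stands in for whatever model you would actually train:

```python
import numpy as np

# Synthetic, made-up data (an assumption for illustration only):
# a rate that decays smoothly toward zero and is always positive.
def true_rate(t):
    return 3.0 * np.exp(-0.05 * t)

t_train = np.arange(0, 21)      # only years 0..20 are observed
y_train = true_rate(t_train)

# Least-squares straight line through the observed window.
slope, intercept = np.polyfit(t_train, y_train, deg=1)

def predict(t):
    return slope * t + intercept

# Interpolation: worst-case error inside the observed window.
t_inside = np.linspace(0, 20, 201)
max_err_inside = np.max(np.abs(predict(t_inside) - true_rate(t_inside)))

# Extrapolation: 20 years past the last observation.
t_future = 40
pred_future = predict(t_future)
true_future = true_rate(t_future)

print(f"max error inside the window: {max_err_inside:.3f}")
print(f"year {t_future}: predicted {pred_future:.2f}, actual {true_future:.2f}")
```

Inside the window the line tracks the curve reasonably well. Twenty years out it predicts a negative number of children, while the underlying curve is still positive.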

3 comments

I agree, but that's equivalent to my original claim.

Being bad at extrapolation is a consequence of assuming your training data describes the full distribution of the phenomenon, and being wrong.

Outside of simple time series, I'm not aware of any good way to extrapolate.

One way to extrapolate is to use a mechanistic or semi-mechanistic model. The recent advances in neural differential equations are a really cool example of this.
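A rough sketch of the mechanistic idea, and not a neural differential equation: it assumes the process follows dy/dt = -k * y, so only the two parameters k and y0 are fitted. The synthetic data and the names `K_TRUE`, `Y0_TRUE`, and `predict` are hypothetical, chosen just for illustration:

```python
import numpy as np

# Synthetic, noise-free observations generated from dy/dt = -k * y
# (the constants are assumptions for this example).
K_TRUE, Y0_TRUE = 0.05, 3.0
t_train = np.arange(0, 21)      # only t = 0..20 is observed
y_train = Y0_TRUE * np.exp(-K_TRUE * t_train)

# Mechanistic assumption: y(t) = y0 * exp(-k * t), so log(y) is linear in t.
# Fit that line, then map the coefficients back to k and y0.
neg_k, log_y0 = np.polyfit(t_train, np.log(y_train), deg=1)
k_hat, y0_hat = -neg_k, np.exp(log_y0)

def predict(t):
    return y0_hat * np.exp(-k_hat * t)

# Extrapolate well past the observed window.
t_future = 40
pred_future = predict(t_future)
true_future = Y0_TRUE * np.exp(-K_TRUE * t_future)

print(f"fitted k = {k_hat:.4f}, fitted y0 = {y0_hat:.4f}")
print(f"t = {t_future}: predicted {pred_future:.4f}, actual {true_future:.4f}")
```

The extrapolation holds up here only because the assumed mechanism matches the process that generated the data. If the assumed equation is wrong, this fails outside the training range too.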
> If they aren't, it'll tell you that in 20 years everyone will be having -0.2 children and all the other species on the planet will start having to birth human babies just so they can be thrown into the smoking pit of bad statistical analysis.

https://xkcd.com/605/