| HN Mirror


by blah9874 3301 days ago

In my opinion, the real problem in that case was not the overfitting, but that they extrapolated from that data. They didn't have anything above Magnitude 8. (https://ml.berkeley.edu/blog/assets/tutorials/4/earthquake-f...)

You should never, ever extrapolate. No matter what your model is, it can't be trusted outside the range of the data it was fit on.

On a side note, it could be that there is a breakpoint at Magnitude 7.25, where the slope of the line really changes, and a segmented linear regression is appropriate (https://en.wikipedia.org/wiki/Segmented_regression). But we would need more data to be sure, anyway.
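For illustration, here is a minimal sketch of what a segmented fit with a fixed breakpoint could look like. The data is synthetic and made up for the example (it is not the earthquake data from the linked post), the breakpoint of 7.25 is simply taken as given rather than estimated, and the helper names `fit_segmented` and `predict_segmented` are hypothetical.

```python
import numpy as np

def fit_segmented(x, y, breakpoint):
    """Least-squares fit of y = b0 + b1*x + b2*max(0, x - breakpoint).

    The hinge term lets the slope change at the breakpoint while keeping
    the fitted line continuous there.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hinge = np.maximum(0.0, x - breakpoint)
    design = np.column_stack([np.ones_like(x), x, hinge])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_segmented(coef, x, breakpoint):
    """Evaluate the fitted two-segment line at x."""
    x = np.asarray(x, dtype=float)
    return coef[0] + coef[1] * x + coef[2] * np.maximum(0.0, x - breakpoint)

# Synthetic data (an assumption for this sketch): slope -1.0 below the
# breakpoint and -2.5 above it, plus a little Gaussian noise.
rng = np.random.default_rng(0)
mags = np.linspace(5.0, 8.0, 31)
true_log_freq = 6.0 - 1.0 * mags - 1.5 * np.maximum(0.0, mags - 7.25)
log_freq = true_log_freq + rng.normal(0.0, 0.05, mags.size)

coef = fit_segmented(mags, log_freq, breakpoint=7.25)
slope_below = coef[1]            # slope for magnitudes under 7.25
slope_above = coef[1] + coef[2]  # slope for magnitudes over 7.25
print(slope_below, slope_above)
```

With only a handful of points above the breakpoint, the upper slope is estimated much less precisely than the lower one, which is the "we would need more data" problem in miniature.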

2 comments

nshepperd 3301 days ago

Not extrapolating isn't really an option in cases like this. You have to give some prediction for earthquakes of magnitude 9. Ultimately you must make a decision on whether to design for such an event.

But a sensible thing to do would be to draw many samples from the posterior distribution, instead of just using the maximum likelihood estimate. That way the prediction accurately represents the uncertainty resulting from not having any data above magnitude 8 as well as, perhaps, your background knowledge that earthquakes of magnitude 15 never happen.
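A rough sketch of the idea, under strong simplifying assumptions: a straight-line model, a flat prior on the coefficients, and a known noise standard deviation, in which case the posterior over (intercept, slope) is Gaussian. The data is synthetic and invented for the example, and `posterior_samples` and `predictive_interval` are hypothetical helper names.

```python
import numpy as np

def posterior_samples(x, y, noise_sd, n_samples, rng):
    """Draw (intercept, slope) pairs from the posterior of a linear model.

    Assumes a flat prior and known Gaussian noise, so the posterior is
    Normal(beta_hat, noise_sd**2 * (X^T X)^-1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(design.T @ design)
    mean = xtx_inv @ design.T @ y
    cov = noise_sd**2 * xtx_inv
    return rng.multivariate_normal(mean, cov, size=n_samples)

def predictive_interval(draws, magnitude):
    """Central 95% interval of the fitted line's value at one magnitude."""
    preds = draws[:, 0] + draws[:, 1] * magnitude
    return np.percentile(preds, [2.5, 97.5])

# Synthetic data (an assumption for this sketch): nothing above magnitude 8.
rng = np.random.default_rng(0)
mags = np.linspace(5.0, 8.0, 31)
log_freq = 6.0 - 1.0 * mags + rng.normal(0.0, 0.1, mags.size)

draws = posterior_samples(mags, log_freq, noise_sd=0.1, n_samples=5000, rng=rng)
inside = predictive_interval(draws, 6.5)   # middle of the observed range
outside = predictive_interval(draws, 9.0)  # beyond any observed magnitude
print(inside, outside)
```

The interval at magnitude 9 comes out noticeably wider than the one at 6.5, so the extrapolated prediction carries its own warning label. Note that this only captures uncertainty in the line's parameters; it says nothing about whether a straight line is the right model above magnitude 8, and the background knowledge about impossible magnitudes would have to enter through an informative prior that this sketch leaves out.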

sjg007 3301 days ago

In retrospect, they should have calculated both intercepts and taken the more pessimistic one. It's surprising they did not. However, this could have been a decision based on cost. It's still weird that it wasn't explicitly called out. Maybe it was.