by jasoncrawford 1519 days ago
Yes, but if your data is actually exponential, the linear segments are not going to be better approximations than an exponential curve. That's what's going on here.
7 comments

I'm not sure that's true in general, nor even frequently. In fact, I'd say it's provably false in general.

The big issue is that you get MANY more curve-fitting parameters to play with if you use a piece-wise linear model vs. an exponential model. (You get to choose HOW MANY breaks to make, what the slope is for each section, and WHERE to make the breaks.)

So... Let's say you created some synthetic data using an underlying exponential plus a normally distributed random number. Obviously, the BEST predictive model is an exponential one. However, for any arbitrary number of observations, I guarantee you there's trivially at least one piece-wise linear model that will have less error than the exponential one. Consider the one that is simply a straight line between EVERY pair of adjacent points. Obviously that has zero error on the observed data, while the exponential model does not. Yet, it has very little predictive power compared to the exponential model.
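For what it's worth, here is a minimal Python sketch of that thought experiment. The growth rate (0.3), the noise level, the number of points, and the seed are all arbitrary assumptions, and the "exponential model" is idealized as the true underlying curve rather than a fitted one.

```python
import bisect
import math
import random

def make_data(xs, rng, noise_sd=1.0):
    # Underlying exponential plus normally distributed noise (rate 0.3 is an assumption).
    return [math.exp(0.3 * x) + rng.gauss(0.0, noise_sd) for x in xs]

def connect_the_dots(xs, ys, x):
    # Piece-wise linear model: a straight line between every pair of adjacent points.
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def exponential(x):
    # Idealized exponential model: the true curve, not a fitted one.
    return math.exp(0.3 * x)

def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

rng = random.Random(0)
n = 2000
train_x = [10.0 * i / (n - 1) for i in range(n)]
train_y = make_data(train_x, rng)

# Fresh observations taken halfway between the original ones.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = make_data(test_x, rng)

train_err_dots = mse([connect_the_dots(train_x, train_y, x) for x in train_x], train_y)
train_err_exp = mse([exponential(x) for x in train_x], train_y)
test_err_dots = mse([connect_the_dots(train_x, train_y, x) for x in test_x], test_y)
test_err_exp = mse([exponential(x) for x in test_x], test_y)

print("in-sample:     dots", train_err_dots, " exp", train_err_exp)
print("out-of-sample: dots", test_err_dots, " exp", test_err_exp)
```

On the observed points the connect-the-dots model wins by construction, since it passes through all of them. On the fresh points it should lose, because each prediction carries the noise of the two neighbouring observations in addition to the noise in the new point itself.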

Now, that's not what was done here... but there are actually quite a few parameters in the form of where to make the breaks and how many to make. Doesn't seem like a fair comparison.

Good point.

The paper does cross-validate the models, and I am told that cross-validation properly penalizes overfitting with too many parameters… but I don't understand the statistics well enough here.
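A rough illustration of why cross-validation helps, as a sketch and not the paper's actual procedure: with leave-one-out cross-validation, each point is predicted by a model that never saw it, so passing exactly through the data no longer earns anything. The synthetic data, the known growth rate of 0.3, and the one-parameter amplitude fit are all assumptions made to keep the refit closed-form.

```python
import math
import random

rng = random.Random(1)
n = 2000
xs = [10.0 * i / (n - 1) for i in range(n)]
basis = [math.exp(0.3 * x) for x in xs]            # assumed-known exponential shape
ys = [b + rng.gauss(0.0, 1.0) for b in basis]      # exponential plus normal noise

s_yb = sum(y * b for y, b in zip(ys, basis))
s_bb = sum(b * b for b in basis)

def loo_exponential(i):
    # Refit the amplitude a in y = a * exp(0.3 x) without point i, then predict point i.
    a = (s_yb - ys[i] * basis[i]) / (s_bb - basis[i] ** 2)
    return a * basis[i]

def loo_connect_the_dots(i):
    # With point i held out, the model there is the line between its two neighbours.
    t = (xs[i] - xs[i - 1]) / (xs[i + 1] - xs[i - 1])
    return ys[i - 1] + t * (ys[i + 1] - ys[i - 1])

interior = range(1, n - 1)  # endpoints skipped: they have only one neighbour
cv_exp = sum((loo_exponential(i) - ys[i]) ** 2 for i in interior) / len(interior)
cv_dots = sum((loo_connect_the_dots(i) - ys[i]) ** 2 for i in interior) / len(interior)

print("leave-one-out error: exp", cv_exp, " connect-the-dots", cv_dots)
```

The connect-the-dots model has zero error when scored on its own training points, but once every point is scored while held out, its error should come out higher than that of the one-parameter exponential.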

For any sampled data you're guaranteed a 100% fit by making the model piecewise constant with N fragments, where N = number of data points. It says nothing about the function you sampled; it's just a way to cheat by overfitting.
The dangerous bit is that an exponential curve will also be a fairly good fit for a logistic function that's not yet fully observed.
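A quick way to see this, sketched in Python with made-up parameters (capacity 1, rate 1, midpoint 0): well before the midpoint, a logistic curve and a pure exponential with the same rate are nearly indistinguishable, and the relative gap between them is itself exp(rate * (t - midpoint)).

```python
import math

def logistic(t, cap=1.0, rate=1.0, midpoint=0.0):
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

def early_exponential(t, cap=1.0, rate=1.0, midpoint=0.0):
    # The exponential that the logistic approaches far below its midpoint.
    return cap * math.exp(rate * (t - midpoint))

def relative_gap(t):
    # How far the exponential overshoots the logistic, as a fraction of the logistic.
    return (early_exponential(t) - logistic(t)) / logistic(t)

for t in (-5.0, -2.0, 0.0):
    print(t, relative_gap(t))
```

Five rate-units before the midpoint the two curves differ by well under 1%, which is easily hidden by noise; at the midpoint the exponential is already double the logistic.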
Every apparent exponential in the real universe must actually be a logistic or some other bounded curve.
Otherwise, we would have about a hundred trillion people infected with Covid by now.
Except for the size of the Hubble volume /s

Cosmic inflation should guarantee an exponentially growing observable universe.

Yes, and the one thing you expect on a logistic function is arguments about whether it's linear or exponential.

But with the amount of noise in economic data, I don't think this is evidence of anything.

True, fair point.

Yeah, like the article mentions, they are basically making an analogy to the idea of “punctuated equilibrium” from evolutionary biology. Here’s a good exploration of how punctuated equilibrium works, versus the alternative, which is called gradualism.

https://gvpress.com/journals/IJBSBT/vol3_no4/3.pdf

NB: "Note that both of these charts are on a log scale."

Appears to apply to the two preceding linear-scale charts.

With noise and enough segments, the linear functions can certainly fit better.
If you zoom in far enough every curve looks linear.
> Henri Poincaré famously described them as "monsters" and called Weierstrass' work "an outrage against common sense", while Charles Hermite wrote that they were a "lamentable scourge".

I wonder if there is a really long compound German word for "an achievement whose greatness is best measured by the degree to which it disgusts experts in the field."

Heh. Reminds me of how gamers manage to find exploits to cheese speed runs while developers react in dismay.
Not a bad description of the history of analysis. Turns out function spaces are absolutely full of gross things that don't quite fit nicely into your theory.
Wow that’s super cool, didn’t know about this
Thank you :)