by mathgenius
3665 days ago
It's ridiculously easy to overfit these models, and there are so many ways to shoot yourself in the foot as a result. For example: "I split the data set into 5 random segments, trained a model on 4 of the 5 segments, and then tested it on the 5th." Such data is serially correlated (it's not good old iid), so a random split has already poisoned the test set with information from the training set. The hard part is not "feature engineering" or "ensemble methods"; the hard part is controlling the entropy that you feed these things, because they are voracious monsters and will absolutely eat all of it.
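To make the leakage point concrete, here is a minimal sketch that counts how many test points sit right next to a training point in time, for a shuffled split versus a contiguous hold-out block. The function names and the `gap` parameter are made up for illustration, and "adjacent in time" is only a crude stand-in for serial correlation.

```python
import random


def random_kfold_test_indices(n_samples, n_folds, seed=0):
    """Shuffle all indices and take the first fold as the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // n_folds
    return sorted(idx[:fold_size])


def blocked_test_indices(n_samples, n_folds):
    """Hold out the final contiguous block as the test set."""
    fold_size = n_samples // n_folds
    return list(range(n_samples - fold_size, n_samples))


def leaked_fraction(test_idx, n_samples, gap=1):
    """Fraction of test points with a training point within `gap` time steps."""
    test = set(test_idx)
    leaked = 0
    for t in test_idx:
        neighbours = range(max(0, t - gap), min(n_samples, t + gap + 1))
        # Any neighbour that is not in the test set is a training point.
        if any(j not in test for j in neighbours):
            leaked += 1
    return leaked / len(test_idx)


if __name__ == "__main__":
    n, k = 1000, 5
    print("random :", leaked_fraction(random_kfold_test_indices(n, k), n))
    print("blocked:", leaked_fraction(blocked_test_indices(n, k), n))
```

With a contiguous block only the boundary point touches the training set; with a shuffled split nearly every test point does. Dropping a few points at the boundary (an embargo) would remove even that one.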
|
Kind of. If it were that simple, making money off an autoregressive model would be trivial -> everyone would do it -> serial correlation would disappear.
I agree with your observation that figuring out what to feed the beast is one of the bigger challenges, though. Case in point: train a mean reversion model on the last seven years of S&P data to buy dips, and train a momentum model to buy higher highs. That equity curve would look very encouraging. Do it on a fifteen-year basis, and not so much. Now the question becomes: how long a lookback do you use when training your models? Chopping up data at random will wash out useful correlations. Subsetting into periods leads to poorly generalized models. Not fun.
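One common way to at least make the lookback question explicit is walk-forward evaluation: train on the most recent `lookback` points, test on the next `horizon` points, slide forward, repeat. The sketch below only generates the index windows; `walk_forward_windows` and its parameters are hypothetical names, and it says nothing about which lookback is the right one.

```python
def walk_forward_windows(n_samples, lookback, horizon, gap=0):
    """Yield (train_range, test_range) pairs in time order.

    Each training window is the `lookback` points ending `gap` steps
    before the test block; each test block is `horizon` points long.
    """
    start = lookback + gap  # first index of the first test block
    while start + horizon <= n_samples:
        train = range(start - gap - lookback, start - gap)
        test = range(start, start + horizon)
        yield train, test
        start += horizon  # slide forward by one test block


if __name__ == "__main__":
    for train, test in walk_forward_windows(10, lookback=4, horizon=2):
        print(list(train), "->", list(test))
```

Running the same model over several `lookback` values and comparing the out-of-sample blocks keeps the time ordering intact, though picking the best lookback this way is itself another parameter you can overfit.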