by robotwealth 3656 days ago
Hello

I'm Kris, the guy who wrote the article that started this thread. Thanks to all who have read my article and taken the time to comment. Given why I started my blog, that means a lot. I'm an engineer who became interested in quantitative finance and machine learning a few years ago. I taught myself to code and to apply my maths and stats knowledge to finance, with no formal training whatsoever. This meant that for a long time I was conducting research and developing trading systems in a vacuum; I had no one to bounce ideas off or learn from. So I started writing about what I was doing in the hope of getting some feedback. Thank you all for providing some. The insights were immensely valuable and I learned a lot.

I thought it would be useful to respond to some of the comments.

mathgenius brought up the extremely valid point that regular k-fold cross-validation doesn't make sense in a time series context, since the data is autocorrelated rather than i.i.d. I no longer use this approach for time series data, instead favoring Rob Hyndman's time series cross-validation approach, also known as forward chaining: train on data up to some point, test on the period immediately after it, then roll forward and repeat. I believe this approach is the best representation of a real trading environment. The issue becomes deciding how large the rolling window of training data should be. Older data may be obsolete, but excluding too much history can leave too few training instances.
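
To make the forward chaining idea concrete, here is a minimal sketch in plain Python. It is my own illustration rather than Hyndman's implementation; the function name `forward_chaining_splits` and the `max_train_size` parameter are hypothetical. Leaving `max_train_size` as `None` gives an expanding training window, while setting it caps the window so that older data drops out.

```python
def forward_chaining_splits(n_samples, n_folds, max_train_size=None):
    """Yield (train_indices, test_indices) pairs in time order.

    Every test block comes strictly after its training block, so the
    model is never evaluated on data that precedes what it was fit on.
    """
    test_size = n_samples // (n_folds + 1)
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")

    for k in range(1, n_folds + 1):
        test_start = k * test_size
        # The last fold absorbs any leftover samples at the end.
        test_end = test_start + test_size if k < n_folds else n_samples

        train_start = 0
        if max_train_size is not None:
            # Rolling window: discard the oldest observations.
            train_start = max(0, test_start - max_train_size)

        train = list(range(train_start, test_start))
        test = list(range(test_start, test_end))
        yield train, test


if __name__ == "__main__":
    for train, test in forward_chaining_splits(10, 4):
        print("train:", train, "test:", test)
```

The trade-off described above lives in `max_train_size`: a small value keeps only recent data but leaves few training instances, and a large value (or `None`) does the opposite.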

dpweb raises a good point too: just because your model performed well on past data, even out-of-sample data, there is no guarantee that the future will be sufficiently like the past, so your model may well become useless at some point (possibly very quickly). This is a valid point, but no reason to abandon the markets. It does, however, require that any algorithm's live performance be objectively monitored so that its deviation from expected performance can be statistically quantified. Once a pre-determined confidence level in the model's obsolescence is reached, it should be removed from the portfolio.
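
One simple way to quantify that deviation, as a rough sketch only: a one-sided z-test of the live mean return against the mean the backtest led you to expect. The function names and the 5% threshold are my own assumptions, and the test treats returns as independent and roughly normal, which real returns are not, so take it as an illustration of the idea and not a production monitor.

```python
import math
from statistics import mean, stdev


def underperformance_p_value(live_returns, expected_mean):
    """Probability of a live mean this low or lower if the strategy
    still earned `expected_mean` per period.

    Simplifying assumption: returns are independent and approximately
    normal, so the sample mean is compared against a standard normal.
    """
    n = len(live_returns)
    if n < 2:
        raise ValueError("need at least two live returns")

    live_mean = mean(live_returns)
    std_err = stdev(live_returns) / math.sqrt(n)
    if std_err == 0:
        # Degenerate case: every live return is identical.
        return 0.0 if live_mean < expected_mean else 1.0

    z = (live_mean - expected_mean) / std_err
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def should_retire(live_returns, expected_mean, alpha=0.05):
    """Flag the strategy once underperformance is significant at `alpha`."""
    return underperformance_p_value(live_returns, expected_mean) < alpha


if __name__ == "__main__":
    live = [-0.010, -0.020, 0.000, -0.015, -0.005] * 4
    print(should_retire(live, expected_mean=0.001))
```

The important part is that `alpha` is fixed before going live, so the decision to pull a strategy is not made in the heat of a drawdown.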

mcbrown's comment about publication bias is a good one too. To make matters worse, I've personally developed hundreds of trading systems that I haven't published, and other bloggers and publishers have most likely done the same. This form of selection bias is very likely rampant, and is especially applicable to models 'discovered' using machine learning techniques that may not be rooted in traditional economic or financial principles. The moral: absent some form of robust accounting for selection bias, view all of these types of systems with a healthy dose of skepticism, and treat the published performance as a theoretical upper limit on what could be achieved in practice.
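
A toy simulation shows why the published number is best read as an upper limit. This is a sketch under stated assumptions, not a model of any real system: each 'strategy' is just zero-mean Gaussian noise with 1% daily volatility, and `best_of_n_sharpe` is a hypothetical helper name. None of these strategies has any edge, yet the best of them can look respectable.

```python
import random
from statistics import mean, stdev


def best_of_n_sharpe(n_strategies, n_days, seed=0):
    """Simulate strategies with no edge and return (best, average)
    annualized Sharpe ratio across them.

    Assumption: daily returns are independent draws from a normal
    distribution with mean 0 and standard deviation 1%.
    """
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_strategies):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        # Annualize using 252 trading days per year.
        sharpes.append(mean(daily) / stdev(daily) * 252 ** 0.5)
    return max(sharpes), mean(sharpes)


if __name__ == "__main__":
    best, average = best_of_n_sharpe(n_strategies=200, n_days=252)
    print(f"best: {best:.2f}  average: {average:.2f}")
```

If only the best run gets written up, the reader sees the first number and never the second, which is exactly the bias described above.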

hendzen's point about partnering with a fund or proprietary trading company, rather than running your reliable, alpha-generating strategy yourself, is also a valid one. I have happily found this out for myself recently.

Also, lordnacho is spot on regarding his take on the utility of data mining in finance.

Thanks again for all the comments!