Hacker News
by seeknotfind 716 days ago
I guess it's time to tell the story online?

When I graduated college, I spent 3 months as a programmer with my econ friend trying to build exactly this. I started off creating a system to paper trade stocks retroactively. You go back in time and pretend it's January 1st, 1982, have an algorithm look at the stocks as of that day, then move it forward a day at a time, letting it trade through the past 40 years to see how it does.
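
A minimal sketch of that replay loop, not the system we actually built. The names `paper_trade` and `momentum` and the all-in/all-out position sizing are assumptions for illustration. The one property that matters is that the decision function only ever sees prices up to the current day.

```python
from typing import Callable, Sequence


def paper_trade(prices: Sequence[float],
                decide: Callable[[Sequence[float]], bool],
                cash: float = 1000.0) -> float:
    """Replay prices one day at a time and return the final equity.

    decide(history) gets closes up to and including "today" and returns
    True to be fully invested, False to sit in cash. Trades fill at
    today's close (a simplifying assumption: no costs, no slippage).
    """
    shares = 0.0
    for t in range(len(prices) - 1):
        history = prices[: t + 1]          # nothing from the future
        want_long = decide(history)
        price = prices[t]
        if want_long and shares == 0.0:
            shares, cash = cash / price, 0.0
        elif not want_long and shares > 0.0:
            cash, shares = shares * price, 0.0
    # Mark any open position at the last close.
    return cash + shares * prices[-1]


def momentum(history: Sequence[float]) -> bool:
    """Toy rule (hypothetical): be long after an up day."""
    return len(history) >= 2 and history[-1] > history[-2]


final_equity = paper_trade([10, 11, 12, 11, 10], momentum)
```

Swapping in a different `decide` is how you would compare models on the same history.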

We tried linear models, SVMs, neural networks, RNNs, ensembles, genetic algorithms, anything with stock data, news sentiment data, classic quant structures, and everything in-between. Basically, 3 solid months of coding before I started working.

Anyway, I found out a lot of stuff the hard way, because I didn't have an econ degree.

First off, if you try enough methods, you end up p-hacking or hill climbing the past anyway, and it's no good.
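
A small simulation of why this happens, as a sketch under stated assumptions: the "returns" are pure Gaussian noise, and each "strategy" is just a random long/short position per day, so by construction none of them has an edge. `best_of_many` is a made-up name.

```python
import random


def best_of_many(n_strategies: int = 200, n_days: int = 250, seed: int = 0):
    """Pick the best of many no-edge strategies in-sample, then check it
    out-of-sample. Returns (in_sample_pnl, out_of_sample_pnl)."""
    rng = random.Random(seed)
    # Pure noise: there is nothing real to discover here.
    train = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    test = [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_strategies):
        # Random +1/-1 positions stand in for "a method we tried".
        pos_train = [rng.choice((-1, 1)) for _ in range(n_days)]
        pos_test = [rng.choice((-1, 1)) for _ in range(n_days)]
        pnl_in = sum(p * r for p, r in zip(pos_train, train))
        pnl_out = sum(p * r for p, r in zip(pos_test, test))
        if pnl_in > best_in:
            best_in, best_out = pnl_in, pnl_out
    return best_in, best_out


in_sample, out_of_sample = best_of_many()
```

The winner's in-sample number looks like skill only because it was selected from many tries; its out-of-sample number is just another draw from noise.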

Second off, clean historical data is hard to get. It may or may not be adjusted for splits or other things, so you may inadvertently supply information from the future when playing back from the past. It's hard to get this right.
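
To make the split problem concrete, here is a sketch of back-adjusting raw closes. The function name and the `{index: ratio}` input format are assumptions, not any data vendor's format. Unadjusted data shows a fake -50% day on a 2-for-1 split. Adjusted data fixes that, but the adjusted level of an old price depends on splits that happened later, so feeding adjusted price levels (rather than returns) into a model can leak the future.

```python
from typing import Dict, List, Sequence


def back_adjust(closes: Sequence[float], splits: Dict[int, float]) -> List[float]:
    """Divide every close before a split by that split's ratio.

    splits maps the index where a ratio-for-1 split takes effect to the
    ratio, e.g. {2: 2.0} for a 2-for-1 split starting at index 2.
    """
    adjusted = list(closes)
    for idx, ratio in splits.items():
        for i in range(idx):
            adjusted[i] /= ratio
    return adjusted


raw = [100.0, 102.0, 51.0, 52.0]            # 2-for-1 split at index 2
adjusted = back_adjust(raw, {2: 2.0})

raw_return = raw[2] / raw[1] - 1.0           # looks like a crash
adj_return = adjusted[2] / adjusted[1] - 1.0  # the real move
```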

Third off, many of the models we used were almost always competitive in the 80s (even a linear regression), but in the aughts or 2010s, they stopped being competitive. We thought computer-based trading was becoming more competitive in hedge funds.

Fourth, simple models tended to work better. So for instance we may have trained the model on data from 70s-80s, then starting in the 80s, we did online (continuous) training as we moved the model forward in time. There's just not enough data. You can train on all historical stocks or all stocks or related data streams in the industry up to that point, but I think we probably didn't have enough data and the market is competitive.
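
A sketch of what "online training as we moved the model forward" can look like with the simplest possible model. This is one-feature linear regression updated by plain SGD, which is an assumption for illustration and not necessarily what we ran. The point is the ordering: predict first, then learn from the outcome.

```python
from typing import List, Sequence, Tuple


def online_linear(xs: Sequence[float], ys: Sequence[float],
                  lr: float = 0.1) -> Tuple[List[float], Tuple[float, float]]:
    """Walk forward through (x, y) pairs.

    Each prediction uses weights fitted only on earlier pairs, then the
    weights take one SGD step on the squared error of the new pair.
    Returns (predictions, (w, b)).
    """
    w, b = 0.0, 0.0
    preds = []
    for x, y in zip(xs, ys):
        pred = w * x + b
        preds.append(pred)        # made before y is known
        err = pred - y
        w -= lr * err * x         # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err             # gradient of 0.5 * err**2 w.r.t. b
    return preds, (w, b)
```

In a backtest, `x` could be yesterday's return and `y` today's (hypothetical feature choice). With only a few parameters, a model like this has a chance of settling down on the limited data available.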

Fifth, I wish I had read A Random Walk Down Wall Street earlier, or all of Taleb's stuff. These are books that have a deep mistrust of quants.

Sixth, I think to be competitive, you need to have money in the game, many heuristics, and industry experience. Big firms have this and equipment, but it's hard to get in as an individual.

Seventh, I put several hundred hours into this project and learned a bunch about machine learning and economics. In every way I loved the experience, and I'd encourage you to try it. Probably I'm a n00b here, but I hope some of my notes can help you.

4 comments

I have this idea that we only have one universe of historical financial data, and it is only 500 years long, which is ridiculously small. So backtesting and drawing conclusions is highly overrated.

Another thing, as you said, is that it's hard to get quality data. For example, most databases don't include price history for bankrupt companies (or miss quite a bunch), which makes some quantitative strategies, like focusing on low PE and PB, completely bogus. Which is sad because most books will actually tell you to do that, without ever talking about how many of those backtests lack companies with -100% return in their virtual portfolios. Those tend to be low PE companies that the market considers risky, and it was right, but because they disappeared, the strategies outperform because they ignore so many losers.
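
A toy illustration of the effect, with made-up numbers (the tickers and returns below are hypothetical, not real data). The same low-PE basket is scored once on the survivors only and once on the full universe including the names that went to zero.

```python
def mean_return(returns):
    """Equal-weighted average return of a basket."""
    return sum(returns) / len(returns)


# Hypothetical low-PE basket: ticker -> (total return, still in the database?)
basket = {
    "A": (0.30, True),
    "B": (0.10, True),
    "C": (-1.00, False),   # bankrupt, missing from the database
    "D": (0.20, True),
    "E": (-1.00, False),   # bankrupt, missing from the database
}

survivors_only = mean_return([r for r, listed in basket.values() if listed])
full_universe = mean_return([r for r, _ in basket.values()])
```

With these numbers the survivors-only backtest shows roughly +20%, while the full universe is roughly -28%.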

To develop a trading EDGE, you're looking for a market inefficiency. The model is not dependent on perfect data or even accurate data. The model is even tested with random prices... by using a FILTER to see if it still holds.
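
One possible reading of that test, sketched under assumptions: jitter the prices with random noise, rerun the rule, and count how often it still makes money. The toy rule, the noise level, and the names `rule_pnl` and `survives_noise` are all hypothetical; the comment above does not say how the filter is actually built.

```python
import random
from typing import Sequence


def rule_pnl(prices: Sequence[float]) -> float:
    """Toy rule (an assumption): hold one share for a day after any up day."""
    pnl = 0.0
    for t in range(1, len(prices) - 1):
        if prices[t] > prices[t - 1]:
            pnl += prices[t + 1] - prices[t]
    return pnl


def survives_noise(prices: Sequence[float], noise: float = 0.01,
                   trials: int = 200, seed: int = 0) -> float:
    """Fraction of noisy replays in which the rule is still profitable.

    Each replay multiplies every price by a random factor in
    [1 - noise, 1 + noise], simulating inaccurate data.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        noisy = [p * (1.0 + rng.uniform(-noise, noise)) for p in prices]
        if rule_pnl(noisy) > 0:
            wins += 1
    return wins / trials
```

A rule whose profit disappears under small perturbations was probably fitted to quirks of the data rather than to an inefficiency.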

Then, you're going to paper trade it. Then live trade it. Historical data can only give you a directional indicator... is your 'thesis' of market inefficiency directionally accurate?

I don't think we have 500 years of data. Anything before 1926 is pretty sketchy.
Indeed, I pulled 500 out of my hat because that's roughly stock market history. But that kind of reinforces my point: if we only have 100 years, that's very little, especially considering how fast things change. In the early 20th century, very few industries were publicly traded, even amongst those that existed at the time. And even beyond the industries, other things have changed a lot, like regulation, taxes, accounting rules, management style... surely the market takes all of those into account, one way or another.
Yeah, the thing that makes trading and investing different from most other disciplines is that the distributions are completely non-stationary and are changing all the time. There are some "stylized facts" (that's the term to search for) so use those to at least ground your model but you won't make any money from that.
Yeah if you want good data, start collecting it now. I believe anyway the magic is outside the numbers. They are only a shadow on a cave's wall.
I agree, there is a qualitative statement to be made. AI can help, especially with summarizing text like news and earnings calls, but there is quite a bit of human work to be done beyond just running some software.
Kudos for learning a lot in a short time. Many take longer to recapitulate these ideas, I certainly did. Alas, this is the standard buy-side quant curriculum. IDK where SOTA is these days but you have to have a different edge because, as you highlighted, it's so easy to reproduce the basics.
As a fun aside, I find it quite fascinating when you see a particular signal go dead overnight.

I recall one particular overnight reversion strategy in my local market which in the 90s and early 2000s had like a 45 degree straight line equity curve and then overnight in 2011 it just went flat. Someone clearly turned on a model that day and started trading it. It's a small market here so I know who it was but I still find it fascinating how clearly it showed up in the data.

I had a simple trading structure that I summarized from a few sources. Curious what you might think of it.

1. Filter all underdog stocks.

2. Have a catalyst detector - e.g. OpenAI announces new model. Link that to NVIDIA, MSFT, etc.

3. Among these stocks, when you see a marubozu break through the previous resistance point, buy the stock. Then sell it near peak. (Need a peak detection algorithm)

The trick is having all three work fine. But it's easier to debug and test when one doesn't work if you break it into parts. You also don't need one really good methodology, you just need a few decent ones.
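
A rough sketch of step 3 on its own, so it can be tested separately from the filter and the catalyst detector. A bullish marubozu is taken here as a candle whose body covers almost the whole high-low range; the 0.95 threshold, the 20-candle lookback, and using the highest prior high as "resistance" are all assumptions.

```python
from typing import NamedTuple, Sequence


class Candle(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def is_bullish_marubozu(c: Candle, min_body: float = 0.95) -> bool:
    """Up candle with (almost) no wicks: body / range >= min_body."""
    full_range = c.high - c.low
    if full_range <= 0 or c.close <= c.open:
        return False
    return (c.close - c.open) / full_range >= min_body


def breakout_signal(candles: Sequence[Candle], lookback: int = 20) -> bool:
    """True if the last candle is a bullish marubozu that closes above
    the highest high of the previous `lookback` candles."""
    if len(candles) < 2:
        return False
    *prior, last = candles
    resistance = max(c.high for c in prior[-lookback:])
    return is_bullish_marubozu(last) and last.close > resistance
```

Each stage returning a plain yes/no is what makes it easy to see which of the three parts is failing.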

Or you could find similar patterns that can be broken down into small parts and do those too.

> Need a peak detection algorithm

I think that's all you need. ;)

In reality, trading is hard because it requires 2 points where you have to get the timing right.

First you have to buy the position at a time when it is favorable. Then you have to exit the position when you get profit. Sounds easy, but the hard part is that this profit also has to cover any past losses, where you failed at timing the entry into the position.
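
The "profit has to cover past losses" part is just expectancy arithmetic. A sketch, with hypothetical function names and ignoring fees and position sizing:

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average profit per trade; avg_loss is a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# If winners average twice the size of losers, about one trade in
# three has to work just to break even.
needed = breakeven_win_rate(avg_win=2.0, avg_loss=1.0)
```

Exiting winners too early shrinks `avg_win`, which pushes the required win rate up.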

It's easy to say, when the stock goes up to 800% and then falls 50% to 400% of your original position, that you should have sold when it was at its "peak". But along the way to 800% you had so many chances to sell at 500%, 600%, 700%, etc., and along the way the stock had fluctuations with many peaks.

If you sell too early, you can't get enough profit to cover past and future losses. If you sell too late, same story. So you have to nail the exit position also, and that is where most models that rely on past data fail. People just walk through the parameters until the entry and exit positions on their test data line up to make a profit, but then can't replicate it when going "live".

Another way to look at investing is that every day you are in the market is logically almost the same as selling and choosing to reinvest that day. So if you hold AAPL for 5 years, that is about 1000 days where the algo is choosing to invest (i.e. keep invested) in AAPL. It's pretty tough to have that many decision points because even 0.1% noise would cause you to sell.
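
A quick back-of-the-envelope for this, reading "0.1% noise" as a 0.1% chance per day of a spurious sell signal and treating days as independent. Both are my assumptions, not something stated above.

```python
def prob_still_holding(p_false_sell: float, n_days: int) -> float:
    """Chance of getting through n_days without a single false sell,
    assuming independent days with the same false-sell probability."""
    return (1.0 - p_false_sell) ** n_days


# About 1000 daily "keep holding" decisions at 0.1% noise each.
p_hold = prob_still_holding(0.001, 1000)
```

That works out to roughly a 37% chance of still holding at the end, so under these assumptions the algo gets shaken out more often than not.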

What we did was compare it to the previous peak, and then 1.1x that or so. That works well for trading breakthrough patterns.
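
Roughly, as a sketch (the function name is made up, and reading "1.1x that" as a sell target at 1.1 times the previous peak is my interpretation of the rule):

```python
from typing import Optional, Sequence


def exit_index(prices: Sequence[float], entry: int, previous_peak: float,
               multiple: float = 1.1) -> Optional[int]:
    """First index at or after `entry` where the price reaches
    multiple * previous_peak, or None if it never gets there."""
    target = multiple * previous_peak
    for i in range(entry, len(prices)):
        if prices[i] >= target:
            return i
    return None


# Previous peak at 100, so the target is about 110.
when = exit_index([100.0, 105.0, 108.0, 111.0, 109.0], entry=1,
                  previous_peak=100.0)
```

The `None` case is the one that needs a separate rule, since a breakout that fails never hits the target.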

If you're going long term, then it's quite different because there may never really be a peak. There might be corrections or something every now and then, but the whole idea behind long term is that these don't matter.

I would actually tie it to the other two - you'd have to detect when a stock is overpriced, and you'd have to detect catalysts.

Still tough. Meta sank on their rebranding not so much because of the metaverse but because FB had been dead for a while. But the announcement was the catalyst. If Meta had been underpriced, it would have been a positive catalyst and people would have applauded the metaverse. Which would probably have triggered catalysts on RBLX, MSFT, and other metaverse players.

NVDA had been underpriced for a while, and the multiple catalysts have made it shoot up, though it's possible another one in the future could make it crash.

So IMO it's not just one point, the other detectors also help to filter the noise.

Did you ever put anything “into production” and actually start using it? And if so, are you still using it?
No. I trade manually based on fundamentals, diversification, and limited downside options. I've lost interest and respect for quantitative strategies while I've worked for and run my own businesses. So consider me a failure here and this is only a story of how not to build a lightbulb.