So how should we evaluate the quality of a paper on trading AI? I mean, the authors might not have access to real data, but their ideas might still be good.
There are some ML problems where it is fundamentally impossible to use historical data to make accurate forward-looking predictions, because the data is not IID. These fields require you to very carefully capture data on sub-optimal choices. In the case of trading, this means making explicitly bad trading decisions some portion of the time, and teams that have done this at any scale are unlikely to share the data.
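As a rough illustration of why those deliberately sub-optimal trades matter: the sketch below assumes an epsilon-greedy logging policy over a hypothetical three-action set, and uses inverse propensity scoring (IPS), one standard off-policy estimator, to evaluate a different policy from the logs. IPS can only credit a new policy for actions the logging policy actually took with nonzero probability, so with `epsilon = 0` the "bad" actions have propensity zero and can never be evaluated. The names (`ACTIONS`, `epsilon_greedy`, `ips_estimate`) and the log format are assumptions for illustration only.

```python
import random
from typing import Callable, Hashable, List, Tuple

# Hypothetical action set, for illustration only.
ACTIONS = ["buy", "hold", "sell"]


def epsilon_greedy(preferred: str, epsilon: float, rng: random.Random) -> Tuple[str, float]:
    """Take the preferred action, except with probability epsilon pick uniformly at random.

    Returns (action, propensity), where propensity is the probability the
    logging policy assigned to the chosen action. It must be logged with the trade.
    """
    if rng.random() < epsilon:
        action = rng.choice(ACTIONS)
    else:
        action = preferred
    k = len(ACTIONS)
    if action == preferred:
        propensity = (1 - epsilon) + epsilon / k
    else:
        propensity = epsilon / k
    return action, propensity


# Each log record: (context, action_taken, propensity, observed_reward).
LogRecord = Tuple[Hashable, str, float, float]


def ips_estimate(logs: List[LogRecord], new_policy: Callable[[Hashable], str]) -> float:
    """Inverse propensity scoring estimate of a deterministic policy's mean reward.

    Only records where the new policy agrees with the logged action contribute,
    reweighted by 1 / propensity to undo the logging policy's bias.
    """
    total = 0.0
    for context, action, propensity, reward in logs:
        if new_policy(context) == action:
            total += reward / propensity
    return total / len(logs)
```

For example, `ips_estimate([(0, "buy", 0.5, 2.0), (1, "sell", 0.25, 1.0)], lambda c: "buy" if c == 0 else "sell")` returns 4.0: each matching record is scaled up by how rarely the logging policy chose it.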
In the case of trading, any paper not tackling these issues head on is not likely to be useful.
I don't get it - if you have accurate historical data, how is this different from having access to current real-time data? Why can't you pretend you live 20 years in the past and use the data you have as if it were real-time?
Data distribution shift. The market changes over time, so your current data does not come from the same distribution as older data. That limits how much data you can use for training and testing, and you need to be very careful not to overfit. That's especially true for daily or hourly data: there isn't much of it to begin with, and you won't have much left if you only look at a few weeks or months. Market data also has a low signal-to-noise ratio, so you need a good chunk of data to learn from.
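One common way to respect the time ordering under distribution shift is walk-forward (rolling-window) evaluation. This is a minimal sketch, assuming a fixed-size training window that rolls forward so older regimes drop out; the function name and window sizes are hypothetical, not something prescribed in the thread.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n: int, train_size: int, test_size: int, step: Optional[int] = None
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) for a time-ordered series of length n.

    Each test window comes strictly after its training window, and the training
    window rolls forward at a fixed size, so older (possibly stale) data drops out.
    """
    if step is None:
        step = test_size
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step
```

It also shows how thin the data gets: with roughly three months of daily bars (60 trading days), a 40-day training window and 5-day test window give only 4 out-of-sample windows, i.e. 20 test days in total.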
As you go to shorter time scales you get more usable data, but then you also need to deal with other issues such as latencies/jitter, market impact, complex order types, order book queues, etc. It becomes a different game.
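To make the order book queue point concrete, here is a minimal sketch assuming strict price-time (FIFO) priority: a resting limit order only fills once the volume queued ahead of it has traded. It assumes you know the queue size ahead of you when you join and ignores cancellations ahead of you (which in reality would move you up), so it errs on the conservative side. The function name and inputs are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def passive_fill_time(
    queue_ahead: float,
    order_qty: float,
    trades: Iterable[Tuple[float, float]],
) -> Optional[float]:
    """Return the time at which a resting limit order is completely filled, or None.

    queue_ahead: volume already resting at our price level when we joined.
    order_qty:   size of our order.
    trades:      time-ordered (timestamp, volume traded at our price level).
    Assumes strict FIFO and ignores cancellations by orders ahead of us.
    """
    remaining_ahead = queue_ahead
    remaining_ours = order_qty
    for ts, volume in trades:
        # Traded volume first consumes the orders ahead of us in the queue.
        consumed_ahead = min(volume, remaining_ahead)
        remaining_ahead -= consumed_ahead
        leftover = volume - consumed_ahead
        # Whatever is left over trades against our order.
        remaining_ours -= min(leftover, remaining_ours)
        if remaining_ours <= 0:
            return ts
    return None
```

With 500 shares ahead of a 100-share order and trades of 300, 250 and 100 shares at that price, the order only completes on the third trade, even though a naive backtest that treats "price touched my level" as a fill would have filled it on the first.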
For one, your trading in that universe would change the future in ways you can't account for. Your activity influences the decisions of other HFTs in real time, whereas backtesting on a static history implicitly claims you can trade without perturbing the market.
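Backtests often approximate the "perturbing the market" effect with a cost model instead of simulating other participants. The sketch below assumes the empirical square-root impact law, where the price moves by roughly `y * sigma * sqrt(Q / V)` (`sigma` the daily volatility, `Q` the order size, `V` the daily volume), with `y` a hypothetical calibration constant set to 1. It is a crude average-cost model and does not capture how other HFTs actually react in real time.

```python
import math


def sqrt_impact_fraction(
    qty: float, daily_volume: float, daily_volatility: float, y: float = 1.0
) -> float:
    """Fractional price impact under the square-root law: y * sigma * sqrt(Q / V)."""
    return y * daily_volatility * math.sqrt(qty / daily_volume)


def impacted_fill_price(
    side: str,
    mid_price: float,
    qty: float,
    daily_volume: float,
    daily_volatility: float,
    y: float = 1.0,
) -> float:
    """Fill price after impact: buyers pay above the mid, sellers receive below it."""
    impact = sqrt_impact_fraction(qty, daily_volume, daily_volatility, y)
    if side == "buy":
        return mid_price * (1 + impact)
    if side == "sell":
        return mid_price * (1 - impact)
    raise ValueError(f"unknown side: {side!r}")
```

Under these assumptions, buying 10,000 shares of a stock that trades 1,000,000 shares a day with 2% daily volatility at a mid of 100 fills around 100.2 rather than 100, and quadrupling the order size doubles the per-share impact.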
Fundamentally, the issue is that in real time you may not be able to make the trade that your algorithm chose.
You could get close if you had the actual book prices at any given time, but even then, you might lose to someone who is 1 millisecond faster.
So no, backtesting can't fully simulate reality.
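The fill problem can be sketched with a toy top-of-book replay. The assumptions are loud ones: we only have best-ask snapshots (timestamp in milliseconds, price), the order is a limit buy priced at the ask we saw, and it reaches the market a fixed `latency_ms` later; all names are hypothetical. Setting `latency_ms = 0` reproduces what a naive backtest implicitly assumes.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Time-ordered top-of-book snapshots: (timestamp_ms, best_ask).
Quotes = List[Tuple[float, float]]


def ask_at(quotes: Quotes, t: float) -> float:
    """Best ask in effect at time t (the latest snapshot at or before t)."""
    times = [ts for ts, _ in quotes]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no quote at or before t")
    return quotes[i][1]


def limit_buy_fill(quotes: Quotes, decision_ms: float, latency_ms: float) -> Optional[float]:
    """Send a limit buy at the ask seen at decision time; it arrives latency_ms
    later and only fills if the ask has not moved above the limit.

    Returns the fill price, or None if the quote was gone (e.g. a faster
    participant took it). Ignores displayed size, queue position and partial fills.
    """
    limit = ask_at(quotes, decision_ms)
    arrival_ask = ask_at(quotes, decision_ms + latency_ms)
    return arrival_ask if arrival_ask <= limit else None
```

With quotes `[(0.0, 100.00), (5.0, 100.05)]`, a decision at 4 ms fills at 100.00 with 0.5 ms of latency but not with 2 ms, while the zero-latency backtest records the fill either way.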
Interactive Brokers offers a simulated account where you can practice "live" trading, although it's still not the same, since there's no real money involved. But if I see a paper tested on an IB simulated account, I'll be very interested, though it'll probably already be too late.
Good point, but I was mainly thinking about making a single good decision to multiply your investment, not HFT. Like identifying that it was a good idea to invest in Tesla stock 7 years ago.