I don't get it - if you have accurate historical data, how is this different from having access to current real-time data? Why can't you pretend you live 20 years in the past and use the data you have as if it were real-time?
Data distribution shift. The market changes over time, and today's data does not come from the same distribution as older data. That limits the amount of data you can use for training and testing, so you need to be very careful not to overfit. That's especially true for daily or hourly data: there isn't much of it in the first place, and you won't have much left if you only look at the last few weeks or months. Market data also has a low signal-to-noise ratio, so you need a good chunk of data to learn from.
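One common way to respect distribution shift when evaluating a strategy is walk-forward testing: fit only on a recent window, test on the window right after it, then roll both forward. The sketch below is a minimal illustration, not necessarily what the commenter had in mind; the function name `walk_forward_splits` and the window sizes (63 trading days of training, 21 of testing) are arbitrary assumptions. scikit-learn's `TimeSeriesSplit` provides a similar splitter.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n: int, train_size: int,
                        test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward in time.

    Each model only sees the most recent `train_size` observations and is
    evaluated on the `test_size` observations that immediately follow, so
    training data never comes from the future and stale regimes drop out.
    """
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


if __name__ == "__main__":
    # Hypothetical setup: ~2 years of daily bars, train on ~3 months,
    # test on the following month.
    for train, test in walk_forward_splits(n=504, train_size=63, test_size=21):
        print(f"train {train.start}-{train.stop - 1}, "
              f"test {test.start}-{test.stop - 1}")
```

Note how small each test window is: with daily bars, a month of out-of-sample data is only about 21 observations, which is part of why overfitting is so easy.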
As you go to shorter time scales you get more usable data, but then you also need to deal with other issues such as latencies/jitter, market impact, complex order types, order book queues, etc. It becomes a different game.
For one, your trading activity in that universe would change the future, which you can't account for. Your activity influences the decisions of other HFTs in real time, whereas with a static history you're implicitly claiming that you can trade without perturbing the market.
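A backtest can at least charge itself for its own price impact. One widely used empirical approximation is the square-root law, where the impact of an order of Q shares is roughly Y * sigma * sqrt(Q / V), with sigma the daily volatility and V the daily volume. The sketch below assumes Y = 1 and made-up inputs, and the function name is hypothetical. It only models your own impact, not how other participants react to you, which is the harder part of the point above.

```python
import math


def sqrt_impact_cost(order_shares: float, daily_volume: float,
                     daily_volatility: float, y: float = 1.0) -> float:
    """Estimated impact as a fraction of the price, using the square-root law:
    impact ~= y * sigma * sqrt(Q / V). The prefactor y is an assumption
    (empirical estimates are roughly of order 1)."""
    if order_shares < 0 or daily_volume <= 0:
        raise ValueError("order_shares must be >= 0 and daily_volume > 0")
    return y * daily_volatility * math.sqrt(order_shares / daily_volume)


if __name__ == "__main__":
    # Hypothetical order: 1% of daily volume in a stock with 2% daily volatility.
    impact = sqrt_impact_cost(order_shares=100_000, daily_volume=10_000_000,
                              daily_volatility=0.02)
    print(f"{impact * 1e4:.1f} bps")  # prints "20.0 bps"
    # A simulated buy would then fill at price * (1 + impact) instead of price.
```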
Fundamentally, the issue is that in real time you may not be able to make the trade that your algorithm chose.
You could get close if you had the actual book prices at any given time, but even then, you might lose to someone who is 1 millisecond faster.
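A toy illustration of the fill problem: a hypothetical `backtest_pnl` function holds each position for one step, but the order decided at step t only fills `latency` steps later. It ignores queue position, partial fills and fees, and the prices are invented for the example.

```python
from typing import Sequence


def backtest_pnl(prices: Sequence[float], signals: Sequence[int],
                 latency: int = 0) -> float:
    """PnL of holding position signals[t] (+1 long, -1 short, 0 flat) for one
    step, where the order decided at time t only fills at prices[t + latency].

    latency=0 is the naive backtest: you trade at exactly the price you saw.
    """
    pnl = 0.0
    for t, s in enumerate(signals):
        entry = t + latency   # when the order actually gets filled
        exit_ = entry + 1     # close the position one step after the fill
        if s == 0 or exit_ >= len(prices):
            continue          # no trade, or not enough data left to exit
        pnl += s * (prices[exit_] - prices[entry])
    return pnl


if __name__ == "__main__":
    prices = [100.0, 100.0, 101.0, 101.0]
    signals = [0, 1, 0, 0]  # the model says "buy" at t=1, just before the jump
    print(backtest_pnl(prices, signals, latency=0))  # prints 1.0
    print(backtest_pnl(prices, signals, latency=1))  # prints 0.0
```

With latency=0 the strategy captures the move; with one step of delay, someone faster has already taken it and the trade earns nothing.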
So no, backtesting can't fully simulate reality.
Interactive Brokers offers a simulated account where you can practice "live" trading, although it's still not the same, since there's no real money involved. If I ever see a paper whose strategy was tested on an IB simulated account, I'll be very interested, but by then it'll already be too late.
Good point, but I was mainly thinking about making a single good decision to multiply your investment, not HFT. Like identifying that it was a good idea to invest in Tesla stock 7 years ago.