We could probably keep going round and round saying "no, that's not how it works" and appealing to our technical expertise in the area in question, but instead let's argue it from first principles.
On the one hand we have an ultra-fast HFT that is just sitting there trying to do only latency arbitrage between two venues. On the other we have an ultra-fast HFT that is making markets on multiple venues. For the first HFT, the upside to their strategy is that they can hold very little inventory. Of course, they are not going to be able to get ahead of orders that are already resting at the correct price. That is, orders that were put in place seconds, minutes, days or weeks earlier are going to have time priority regardless of how fast the pure latency arb player is. They are also going to be wrong some percentage of the time, which means they are going to have inventory they need to unload, and all that entails.
Meanwhile, the market-making HFT has more inventory risk, but they also get some of the latency arb for free, as they are already at those positions. They get some of the latency arb the same way the pure arb player does (i.e. they are super fast as well), and when they are wrong or lose the race they have sophisticated inventory management processes in place that they amortize across all of their strategies, not just the latency arb ones.
It turns out that the second model is more profitable, and that is very important when you are investing in super-low-latency bespoke networks as part of your operational model.
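To make the time priority point concrete, here is a minimal sketch of a single price level filled first-in, first-out. The `PriceLevel` class and the owner names are hypothetical, invented for illustration, and real matching engines are far more involved; the only thing it assumes is plain price-time priority.

```python
from collections import deque


class PriceLevel:
    """Resting orders at one price, filled oldest first (hypothetical sketch)."""

    def __init__(self):
        self.queue = deque()  # (owner, quantity) pairs, oldest at the left

    def add(self, owner, qty):
        # A new order always joins the back of the queue, however fast its sender is.
        self.queue.append((owner, qty))

    def fill(self, qty):
        """Match an incoming order of size qty; return a list of (owner, filled_qty)."""
        fills = []
        while qty > 0 and self.queue:
            owner, resting = self.queue[0]
            traded = min(qty, resting)
            fills.append((owner, traded))
            qty -= traded
            if traded == resting:
                self.queue.popleft()
            else:
                self.queue[0] = (owner, resting - traded)
        return fills


level = PriceLevel()
level.add("market_maker", 300)  # was already resting before the price moved
level.add("latency_arb", 300)   # reacts quickly to the move, but still queues behind
fills = level.fill(400)
print(fills)
```

The market maker is filled in full before the latency arb player gets anything, which is the sense in which the market maker gets some of the arb "for free".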
Or I guess we could do it the other way. No, you are wrong. That is not possible on any exchange that I know of.
Luckily there is an easy way for you to prove your contention. The market data feed and order management specifications for all of the exchanges in the US are available as public information. Find a single one that allows that order of operations.
Wait what? I thought there were two feeds. One for the paid subscribers in near real-time and a purposefully delayed feed for the rest of us.
To prove what you are asking for, I would need full access to the real-time feed AND corresponding time-resolved data from multiple brokers where a buy order gets intercepted.
Where is the lie in what the GP wrote? A quick Google search supports what he wrote, e.g. the RBC story.
Depending on the exchange, there can be many feeds that are price-differentiated by feature set (including speed requirements). There are no non-paid feeds; if you are getting market data, it is being paid for by someone.
In particular, though, most exchanges follow a pattern where market data is broadcast (usually UDP) and order management is bidirectional unicast (usually TCP/IP). On the order management side, no one sees your order until after the exchange does. The exchange then executes the order (filling it if there is a match on the other side, or adding it to the order book). It then propagates the outcome of that order down the market data feed. Only then do other participants see it, regardless of how fast they are. There is no opportunity for the fast operator to get in front of the order as the GP describes.
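The order of operations above can be sketched as a toy model. Everything here (the `Exchange` class, its method names, the event tuples) is hypothetical and is not any real exchange's API; it only illustrates that the feed is written after the match, never before.

```python
class Exchange:
    """Toy single-venue exchange: match first, publish afterwards (hypothetical sketch)."""

    def __init__(self):
        self.asks = []         # resting sell orders as [price, qty], best price first
        self.bids = []         # resting buy orders as [price, qty]
        self.subscribers = []  # callbacks standing in for the broadcast market data feed

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _publish(self, event):
        # The only point at which other participants learn anything.
        for callback in self.subscribers:
            callback(event)

    def add_ask(self, price, qty):
        self.asks.append([price, qty])
        self.asks.sort(key=lambda order: order[0])  # stable sort keeps time priority
        self._publish(("add_ask", price, qty))

    def buy(self, limit, qty):
        # The incoming order arrives on the private order entry session;
        # nothing is published while it is being matched.
        trades = []
        while qty > 0 and self.asks and self.asks[0][0] <= limit:
            ask = self.asks[0]
            traded = min(qty, ask[1])
            trades.append((ask[0], traded))
            ask[1] -= traded
            qty -= traded
            if ask[1] == 0:
                self.asks.pop(0)
        # Only after matching is done does the outcome go out on the feed.
        for price, traded in trades:
            self._publish(("trade", price, traded))
        if qty > 0:
            self.bids.append([limit, qty])
            self._publish(("add_bid", limit, qty))
        return trades


seen_by_fast_trader = []
ex = Exchange()
ex.add_ask(1000, 100)                     # prices in cents to avoid float issues
ex.subscribe(seen_by_fast_trader.append)  # the "fast operator" listening to the feed
trades = ex.buy(1000, 100)
print(seen_by_fast_trader)
```

However fast the subscriber is, the first event it receives about the buy order is the trade itself, by which point there is nothing left to get in front of.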
You can verify this easily for any exchange you like (at least SEC regulated ones in the US) by reading the technical specifications of the electronic trading platform.
I don't know what Google search or RBC story you are talking about, but in Flash Boys, for instance, they take pains to imply that HFTs are front-running orders as the GP describes, but they never say it outright, because they know it can't happen. A good rebuttal to Flash Boys is Flash Boys: Not So Fast.
The different feeds are irrelevant in this case, because no exchange sends (to any feed) information about: a) incoming immediately-executable orders, or b) unexecuted portions of an order which will be routed to another destination. The first anyone will hear about an incoming order is when it executes, and the first they'll hear about the routed portion is when it executes somewhere else.