| > A year ago, I naively wrapped the API and certainly felt this pain. Most people, before being confronted with it, have no idea how big market data feeds really are: I certainly had no idea what I was getting into. There's a reason all these subscriptions are that pricey. Here's an example of the pricing for the OPRA feed from Databento you mentioned: https://databento.com/pricing#opra We're talking about feeds that sustain 25+ Gb/s and can spike to two or even three times that. And that's only for options market data. I mean: even people with 25 Gb/s fiber at home (which we can all agree ain't the most common, and that's an understatement) still can't dream of getting the entire feed. Having enough bandwidth, storing and analyzing that amount of data: everything becomes problematic at such scales. As for me, I'm reusing my brokers' feeds (as I already pay for them): it's not a panacea but I get the info I need (plus balances/orders/etc. tied to my accounts). |
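To put the quoted figure in storage terms, here is a rough back-of-the-envelope sketch. It assumes (my assumptions, not from the quote) that the rate is held constant for a 6.5-hour regular US trading session and that nothing is compressed, so treat the result as a crude upper bound rather than a measured volume. The function names are made up for illustration.

```python
# Back-of-the-envelope: data volume of a feed running at a fixed line rate.
# Assumptions (illustrative only): constant rate, no compression,
# 6.5-hour session, decimal units (1 Gb = 1e9 bits, 1 TB = 1e12 bytes).

BITS_PER_GIGABIT = 1_000_000_000


def bytes_per_second(gbit_per_s: float) -> float:
    """Convert a line rate in gigabits per second to bytes per second."""
    return gbit_per_s * BITS_PER_GIGABIT / 8


def session_terabytes(gbit_per_s: float, session_hours: float = 6.5) -> float:
    """Terabytes received if the rate is sustained for the whole session."""
    seconds = session_hours * 3600
    return bytes_per_second(gbit_per_s) * seconds / 1e12


if __name__ == "__main__":
    # 25 Gb/s baseline, plus the 2x and 3x spikes mentioned in the quote.
    for rate in (25, 50, 75):
        gb_per_s = bytes_per_second(rate) / 1e9
        print(f"{rate} Gb/s = {gb_per_s:.3f} GB/s, "
              f"{session_terabytes(rate):.1f} TB per session")
```

Under those assumptions, 25 Gb/s works out to 3.125 GB/s, or roughly 73 TB for a single session, which is why storage becomes a problem almost as fast as bandwidth does.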
I’m just noting, for interest, that firms are applying transformers and other networks at this streaming microstructure level, though trained specifically for feature extraction. HRT + Nvidia have some nice videos about it.
I will also note that it is insane how much better all the LLMs are at calling MCP tools after just a year, especially the local ones.
One of the reasons I like DuckDB is the scale flexibility. I started by grabbing data and playing with it on my laptop, then I jumped to a server with a high core count and a NAS.