| Longtime KDB user here
I think there may be some misunderstanding on your part and some poor engineering at your firm around the tech/data. Time-series data, particularly market data, is exactly the use case the product excels at. The wire format is compressed. KDB horizontally scales (even their competitors' comparison pages state this - https://www.influxdata.com/comparison/kdb-vs-tsdb/). A few things to consider that might help - you do not want a solution (in any language/tech) that involves pulling an entire day of market data off disk, across the wire and over to your process for analysis. KDB will not excel at this, nor will anything else. KDB shines when you learn to move your code to the data rather than your data to the code. What does "move the code to the data" mean in practice? You can use something like PyKX, which lets you run your Python and KDB code together, directly on top of the data in the same process. You should do as much of the filtering/aggregation/joins/etc. as possible on the KDB side before pulling the results back. You should also define, generate and use pre-aggregated data where it makes sense for your use case (second / minute / day bars). Backtesting in KDB is relatively trivial, as you have historical data organized by day and symbol. Any half-decent KDB dev should be able to cook up a backtester for you and add complexity as you need it. Nick Psaris has a couple of books covering more advanced topics that may be of use. |
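To make "move the code to the data" concrete, here is a minimal sketch. It is not KDB or PyKX code: Python's built-in `sqlite3` stands in for the database purely so the example runs anywhere, and the `trade` table, its columns and the tick values are all made up for illustration. It builds minute bars two ways, once by pulling every tick into the client and once by letting the database aggregate, and counts how many rows cross the boundary in each case.

```python
import sqlite3

# Hypothetical ticks: (symbol, seconds since open, price, size).
TICKS = [
    ("AAPL", 0, 100.0, 10),
    ("AAPL", 20, 101.0, 5),
    ("AAPL", 59, 99.5, 20),
    ("AAPL", 60, 100.5, 10),
    ("AAPL", 90, 102.0, 30),
    ("MSFT", 5, 50.0, 100),
]


def make_db():
    """Create an in-memory table that stands in for the tick store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trade (sym TEXT, ts INTEGER, price REAL, size INTEGER)"
    )
    conn.executemany("INSERT INTO trade VALUES (?, ?, ?, ?)", TICKS)
    return conn


def bars_client_side(conn, sym):
    """Data to the code: fetch every tick, then aggregate in the client."""
    rows = conn.execute(
        "SELECT ts, price, size FROM trade WHERE sym = ? ORDER BY ts", (sym,)
    ).fetchall()
    bars = {}
    for ts, price, size in rows:
        bar = bars.setdefault(
            ts // 60, {"high": price, "low": price, "volume": 0}
        )
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["volume"] += size
    result = [
        (m, b["high"], b["low"], b["volume"]) for m, b in sorted(bars.items())
    ]
    return result, len(rows)  # len(rows) = rows that crossed the boundary


def bars_server_side(conn, sym):
    """Code to the data: the database filters and aggregates, only bars return."""
    rows = conn.execute(
        "SELECT ts / 60 AS minute, MAX(price), MIN(price), SUM(size) "
        "FROM trade WHERE sym = ? GROUP BY minute ORDER BY minute",
        (sym,),
    ).fetchall()
    return rows, len(rows)


if __name__ == "__main__":
    conn = make_db()
    local, pulled = bars_client_side(conn, "AAPL")
    remote, returned = bars_server_side(conn, "AAPL")
    print("same bars:", local == remote)
    print("rows pulled client-side:", pulled)
    print("rows returned server-side:", returned)
```

Both paths produce the same bars, but the second one ships only one row per minute instead of one row per tick. On real market data that gap is several orders of magnitude, which is the whole point of doing the filter/aggregate step next to the data.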
Honest question - why? An entire day of market data for busy option series will be in the low hundreds of gigabytes with a proper wire format; with some compression it might be tens of gigabytes. Even with 10 Gbit/s networking (which is kinda slow - I believe you can get at least 40 Gbit/s for Amazon EC2<->EBS) the whole day of data will transfer in a few minutes, which means your bottleneck will be compute, not IO/network. And compute can be parallelized pretty easily.
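The back-of-the-envelope arithmetic behind "a few minutes" can be checked with a short sketch. The sizes below (200 GB uncompressed, 30 GB compressed) are illustrative picks for "low hundreds" and "tens" of gigabytes, not measured figures, and the calculation assumes the link is fully utilized with no protocol overhead.

```python
def transfer_seconds(size_gb: float, link_gbit_per_s: float) -> float:
    """Ideal time to move size_gb gigabytes over a link_gbit_per_s link.

    Assumes decimal units (1 GB = 8 Gbit) and a fully saturated link.
    """
    return size_gb * 8 / link_gbit_per_s


if __name__ == "__main__":
    # Assumed sizes: 200 GB uncompressed, 30 GB compressed (illustrative only).
    for size_gb in (200, 30):
        for link in (10, 40):
            secs = transfer_seconds(size_gb, link)
            print(f"{size_gb} GB over {link} Gbit/s: {secs:.0f} s ({secs / 60:.1f} min)")
```

Under these assumptions the worst case (200 GB over 10 Gbit/s) is 160 seconds, under three minutes. Real throughput will be lower once disk read speed, protocol overhead and deserialization are included.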