I use HDF5 for storage and analytics of tick data; in my experience, storing large sparse matrices in it is expensive in both storage space and speed.

I know that a lot of my peers in HFT have an illicit love for column stores, but a lot of work requires converting to 'wide' format, which can quickly turn a 10-million-row matrix with a few columns into one with a few thousand columns. (And thus stuff like KX, FastBit, etc. becomes sort of suboptimal.)
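To make the long-to-wide blowup concrete, here's a rough sketch in plain Python. The tick rows, symbol names, and the `to_wide` helper are all made up for illustration; real data would have thousands of symbols and millions of timestamps, but the shape of the problem is the same: every timestamp gets a cell for every symbol, and most of those cells are empty.

```python
from collections import defaultdict

# Hypothetical long-format ticks: (timestamp, symbol, price).
long_rows = [
    (1, "AAA", 10.0),
    (1, "BBB", 20.0),
    (2, "AAA", 10.5),
    (3, "CCC", 30.0),
]

def to_wide(rows):
    """Pivot (timestamp, symbol, value) rows into one row per timestamp."""
    symbols = sorted({sym for _, sym, _ in rows})
    by_time = defaultdict(dict)
    for ts, sym, value in rows:
        by_time[ts][sym] = value  # a later duplicate overwrites an earlier one
    # Every timestamp gets a cell for every symbol; missing ones become None.
    table = [[by_time[ts].get(sym) for sym in symbols] for ts in sorted(by_time)]
    return symbols, table

symbols, wide = to_wide(long_rows)
filled = sum(cell is not None for row in wide for cell in row)
total = len(wide) * len(symbols)
print(f"{filled} of {total} cells hold a value")
```

Four long rows become a 3x3 table with five empty cells, and the empty fraction only grows as you add symbols.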

The need for massive 'last-of' information for time series basically leads to abandoning Python/pandas/NumPy, using C primitives, and doing a lot more 'online' than you'd typically like. But really, a lot of this could happen behind the scenes with intelligent out-of-memory ops.
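By 'last-of' done online I mean something like the sketch below: stream the ticks in time order, keep only the most recent value per symbol, and emit a snapshot whenever the timestamp advances. This is a toy in plain Python rather than C, and the function name, the tick tuples, and the assumption that the input is already sorted by timestamp are mine for illustration.

```python
def last_of_snapshots(ticks, symbols):
    """Yield (timestamp, snapshot) once all ticks for a timestamp are applied.

    Assumes ticks are (timestamp, symbol, value) tuples sorted by timestamp.
    """
    last = {sym: None for sym in symbols}  # None until a symbol first ticks
    current_ts = None
    for ts, sym, value in ticks:
        if current_ts is not None and ts != current_ts:
            # Timestamp advanced: emit a copy of the state as of current_ts.
            yield current_ts, dict(last)
        current_ts = ts
        last[sym] = value
    if current_ts is not None:
        yield current_ts, dict(last)  # flush the final timestamp

# Hypothetical ticks for two symbols.
ticks = [
    (1, "AAA", 10.0),
    (1, "BBB", 20.0),
    (2, "AAA", 10.5),
    (3, "BBB", 19.5),
]
snapshots = list(last_of_snapshots(ticks, ["AAA", "BBB"]))
```

Memory stays proportional to the number of symbols rather than the number of ticks, which is the appeal of doing it online; the cost is that you have to hand-roll the loop instead of leaning on a dataframe.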

So...I'm pretty excited for innovation in data stores -- I look forward to seeing more!