by faltet 4852 days ago
HDF5 is a very nice format indeed, and in fact BLZ borrows a lot of good ideas from it. However, HDF5 has its own drawbacks: it cannot compress variable-length datasets, it lacks a query/computational kernel, and its resiliency during updates is flaky. Also, its approach to distributing data among nodes diverges from our goals.

Finally, you are right, speed is pretty important for us, and we think that our approach can make better use of current computer architectures.

1 comment

I use HDF5 for storage and analytics of tick data; in my experience, storing large sparse matrices is expensive in both storage space and speed.

I know that a lot of my peers in HFT have an illicit love for column stores, but a lot of work requires converting to 'wide' format, which can quickly turn a 10-million-row matrix with a few columns into one with a few thousand columns. (And thus stuff like KX, FastBit, etc. becomes sort of suboptimal.)
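To make the 'wide' conversion concrete, here is a minimal pure-Python sketch. The tick rows, the symbol names, and the `to_wide` helper are all hypothetical, made up for illustration; they are not taken from any of the stores mentioned above.

```python
from collections import defaultdict

# Hypothetical long-format ticks: (timestamp, symbol, price).
long_rows = [
    (1, "AAA", 10.0),
    (1, "BBB", 20.0),
    (2, "AAA", 10.1),
    (3, "CCC", 30.0),
]

def to_wide(rows):
    """Pivot (timestamp, symbol, value) rows into one row per timestamp."""
    symbols = sorted({sym for _, sym, _ in rows})
    by_ts = defaultdict(dict)
    for ts, sym, value in rows:
        by_ts[ts][sym] = value
    # A symbol with no tick at a timestamp becomes None: this is the sparsity.
    wide = [[ts] + [by_ts[ts].get(sym) for sym in symbols] for ts in sorted(by_ts)]
    return symbols, wide

symbols, wide = to_wide(long_rows)
cells = sum(len(row) - 1 for row in wide)
filled = sum(v is not None for row in wide for v in row[1:])
print(symbols)
print(wide)
print(f"{filled}/{cells} cells filled")
```

Even in this toy case fewer than half of the wide cells hold a value; with a few thousand symbols and irregular ticks the wide matrix would be mostly empty.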

The need for massive 'last-of' information for time series basically leads to abandoning python/pandas/numpy, using C primitives, and doing a lot more 'online' than you'd typically like. But really, a lot of this could happen behind the scenes with intelligent out-of-memory ops.
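As a rough illustration of what an 'online' last-of computation looks like, here is a minimal pure-Python sketch. It assumes ticks arrive sorted by timestamp; the `last_of` generator and the sample data are hypothetical, and a real implementation would likely live in C or in the data store itself.

```python
def last_of(rows, symbols):
    """Yield (timestamp, snapshot) pairs, where snapshot maps each symbol
    to its most recent value seen so far. Assumes rows are sorted by timestamp."""
    last = {sym: None for sym in symbols}
    current_ts = None
    for ts, sym, value in rows:
        # Emit the snapshot for the previous timestamp once a new one starts.
        if current_ts is not None and ts != current_ts:
            yield current_ts, dict(last)
        current_ts = ts
        last[sym] = value
    if current_ts is not None:
        yield current_ts, dict(last)

# Hypothetical ticks: (timestamp, symbol, price).
ticks = [(1, "AAA", 10.0), (1, "BBB", 20.0), (2, "AAA", 10.1), (3, "BBB", 19.9)]
snapshots = list(last_of(ticks, ["AAA", "BBB"]))
for ts, snap in snapshots:
    print(ts, snap)
```

Only one value per symbol is held in memory at a time, so the input can be streamed from disk rather than loaded whole.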

So...I'm pretty excited for innovation in data stores -- I look forward to seeing more!