Hacker News

by rzezeski 5515 days ago

I know nothing about Swift, so I'll keep my comments strictly related to Luwak.

Luwak performs really well as a large object store...up to a point. I've personally seen latencies under 4s when writing/reading a 700+MB CSV across 3 physical nodes (keep in mind that, for writes, this doesn't mean all the data was strongly consistent yet; it's all async).

Luwak has a really cool feature that also acts as a double-edged sword: it's a persistent data structure [1]. When you insert data, it's chunked into blocks which are then keyed by their hash, i.e. a Merkle tree [2]. If two (or more) blocks of data are the same, only one instance will ever be created, which acts as a general form of compression (deduplication).

The flip side is that you can't just willy-nilly delete a file: its blocks may be shared with other files, so you must perform garbage collection (via reference counting). Currently this is not implemented in the main line, but I have a branch with a prototype implementation that uses Map-Reduce under the covers [3]. It scaled for me up to 20+ GB of data, and then I started hitting timeouts. I had plans to take this further but went in a different direction for my purposes (which wouldn't relate to your problem at all, so I won't bother stating them).
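To make the dedup-vs-delete trade-off concrete, here is a minimal single-process sketch of a content-addressed block store. Everything in it is hypothetical and is not Luwak's actual API or storage format: the `BlockStore` class, the tiny 4-byte `BLOCK_SIZE`, and the choice of SHA-256 are assumptions made for illustration. Blocks are keyed by the hash of their contents, a Merkle root is computed over the block hashes, and deletion uses reference counting so a shared block is only freed when the last file referring to it goes away.

```python
import hashlib

# Hypothetical in-memory sketch; NOT Luwak's real API or on-disk format.
BLOCK_SIZE = 4  # assumption: tiny block size so the example is easy to follow


class BlockStore:
    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}    # hex digest -> block bytes (one copy per unique block)
        self.refcount = {}  # hex digest -> number of references from files
        self.files = {}     # file name -> (Merkle root, list of leaf digests)

    def _put_block(self, data: bytes) -> str:
        # Key each block by the hash of its contents, so identical
        # blocks collapse into a single stored copy (deduplication).
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data
            self.refcount[key] = 0
        self.refcount[key] += 1
        return key

    @staticmethod
    def _merkle_root(digests):
        # Hash pairs of child digests level by level until one root remains.
        level = list(digests) or [hashlib.sha256(b"").hexdigest()]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # duplicate the last node on odd levels
            level = [
                hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                for i in range(0, len(level), 2)
            ]
        return level[0]

    def put(self, name: str, data: bytes) -> str:
        if name in self.files:
            self.delete(name)  # overwriting drops the old version's references
        leaves = [
            self._put_block(data[i:i + self.block_size])
            for i in range(0, len(data), self.block_size)
        ]
        root = self._merkle_root(leaves)
        self.files[name] = (root, leaves)
        return root

    def get(self, name: str) -> bytes:
        _, leaves = self.files[name]
        return b"".join(self.blocks[key] for key in leaves)

    def delete(self, name: str) -> None:
        # Deleting a file only drops its references; a block is freed
        # once no remaining file refers to it (reference-counting GC).
        _, leaves = self.files.pop(name)
        for key in leaves:
            self.refcount[key] -= 1
            if self.refcount[key] == 0:
                del self.refcount[key]
                del self.blocks[key]


store = BlockStore()
store.put("a.csv", b"abcdabcdxyz!")  # 3 blocks, but only 2 are unique
store.put("b.csv", b"abcd")          # shares the "abcd" block with a.csv
store.delete("a.csv")                # frees "xyz!"; "abcd" survives for b.csv
```

Keeping reference counts in one in-process dict sidesteps the hard part described above: in a distributed store, the counts have to be computed across the whole cluster, which is why the prototype branch leaned on Map-Reduce and eventually ran into timeouts as data grew.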

[1]: http://en.wikipedia.org/wiki/Persistent_data_structure

[2]: http://en.wikipedia.org/wiki/Hash_tree

[3]: https://github.com/rzezeski/luwak/tree/delete2-1.0