Hacker News
by firemanx 5120 days ago
How big are the flat files that you're storing in HDFS? I've looked at it before for such a use, but for durability I want to write events in an isolated manner, which means lots and lots of small writes, either to single files or as a series of small files. I was under the impression that HDFS doesn't perform well in a use case like this (due to its block size), but I would LOVE it if I could use it like that!
2 comments

We're using HDFS with a periodic process that merges small files into larger ones. Given the block size, HDFS really does want larger files, but it can tolerate a decent number of small files. The bigger problem with this approach is providing a consistent view of the dataset, so that already-running programs don't have the world totally change out from under them.
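To make the merge step concrete, here is a rough sketch of the idea, not our actual code. It uses the local filesystem as a stand-in for HDFS, and the function name, directory layout, and ".merging-" temp prefix are all made up for illustration. The point it shows is publishing the merged file with a single rename, so a reader sees either nothing or the complete file, never a half-written one.

```python
import os
import tempfile
from pathlib import Path


def compact_small_files(src_dir: Path, dest_dir: Path, merged_name: str) -> Path:
    """Concatenate every file in src_dir into one file in dest_dir.

    Hypothetical helper: local files stand in for HDFS paths here.
    """
    # Sort by name so the merged output has a deterministic order.
    small_files = sorted(p for p in src_dir.iterdir() if p.is_file())
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Build the merged file under a temporary name in the destination
    # directory, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest_dir, prefix=".merging-")
    with os.fdopen(fd, "wb") as out:
        for path in small_files:
            out.write(path.read_bytes())
        out.flush()
        os.fsync(out.fileno())

    # Publish in one step: os.replace is an atomic rename on the same filesystem.
    merged_path = dest_dir / merged_name
    os.replace(tmp_name, merged_path)

    # Remove the inputs only after the merged file is visible.
    for path in small_files:
        path.unlink()
    return merged_path
```

This only covers half of the consistency problem. It assumes readers skip dot-prefixed temp files, and it deletes the small files straight away, which would still pull them out from under a job that had already listed them. A real setup would need to defer that cleanup or version the directory somehow.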
You're right, there can be something of a "small files" issue with HDFS. This is a good article for strategies to get round it: http://www.cloudera.com/blog/2009/02/the-small-files-problem...