by tcc619
5755 days ago
"file systems are much better at handling files" What about batch processing a large number of small files? Say, 10 million image files of 500KB each. A typical file system will need to seek to each small file individually. I wonder if GridFS stores small files together in blocks to allow efficient batch retrieval for processing.

The author of the blog post touts GridFS's chunking as a _feature_ of MongoDB, but it's more accurate to call it an artifact of MongoDB's 4MB document size limit -- you simply cannot store large files in MongoDB without breaking them up. Sure, splitting files into chunks lets you parallelize loading them, but that's about the only advantage.
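To make the chunking idea concrete, here is a minimal sketch in plain Python. It does not use MongoDB or any driver API; the 256KB chunk size and the `n`/`data` field names are assumptions chosen for illustration only.

```python
from typing import Iterator, List

CHUNK_SIZE = 256 * 1024  # assumed chunk size, for illustration only


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[dict]:
    """Yield one small record per chunk, each tagged with its position."""
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        yield {"n": n, "data": data[offset:offset + chunk_size]}


def reassemble(chunks: List[dict]) -> bytes:
    """Rebuild the original bytes; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda c: c["n"])
    return b"".join(c["data"] for c in ordered)
```

Because every chunk carries its own sequence number, chunks can be fetched independently (and in parallel) and put back in order afterwards, which is the one advantage mentioned above.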
Among the key-value NoSQL databases, Cassandra and Riak are much better at storing large chunks of data -- neither has a specific limit on the size of objects. I have used both successfully to store assets such as JPEGs, and they are extremely fast on both reads and writes.
However, neither is built for that purpose, and both will load an entire object into memory instead of streaming it, so if you have lots of concurrent queries you will simply run out of memory at some point -- 10 clients each loading a 10MB image at the same time will have the database peak at 100MB at that moment.
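The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope model, not a measurement of Cassandra or Riak; the function name and the streaming chunk size are hypothetical, and it only counts the bytes each in-flight request must hold at once.

```python
from typing import Optional

MB = 1024 * 1024


def peak_buffered_bytes(clients: int, object_size: int,
                        chunk_size: Optional[int] = None) -> int:
    """Rough peak buffer size when `clients` requests are in flight at once.

    With chunk_size=None each request holds the whole object in memory;
    otherwise each request holds at most one chunk at a time.
    """
    per_request = object_size if chunk_size is None else min(chunk_size, object_size)
    return clients * per_request


whole = peak_buffered_bytes(10, 10 * MB)                 # whole objects in memory
streamed = peak_buffered_bytes(10, 10 * MB, 256 * 1024)  # assumed 256KB chunks
print(whole // MB, "MB vs", streamed / MB, "MB")
```

Under these assumptions, loading whole objects costs 100MB at peak, while streaming in 256KB pieces would hold about 2.5MB for the same ten clients.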
In fact, Riak uses dangerously large amounts of memory even when just saving a number of large files. I don't know whether that's Erlang's garbage collector lagging behind or something else; I would be worried about swapping or running out of memory when running it in a production system.