| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by DrJokepu 4273 days ago
	That being said, if your relational database has 1GB records, you really need to reevaluate whether you are using the right tool for the job.

2 comments

UnoriginalGuy 4273 days ago

While I kind of agree, what exactly is the "right tool" for storing large blobs?

You can use the filesystem but then you have to manage data migration/sync by "hand" which is kind of a bitch for distributed systems.

Honestly storing large blobs with meta-data telling you about the blobs is extremely common. I'm yet to hear of how you're meant to do it correctly.

link

threeseed 4273 days ago

I had to solve this problem at my last job and we had vendors let us know how companies like Apple, Sony, Disney, EA etc have solved it. Basically there are two ways to do it. (1) Store it on some "filesystem". In quotes because how you do it can vary wildly e.g. S3, GlusterFS, Standard directories using DRBD for HA. (2) Take the blob, slice it into pieces, hash it and spread it across a sharded database.

Generally it seems to be that if you have lots of unique large files then use filesystem. But if you have files which are likely to have duplicates then use a database. So a file storage locker may use (1) but a service like iTunes Match would use (2). And IIRC Apple in fact does store at least uploaded music files in Cassandra.

link

SideburnsOfDoom 4273 days ago

We found that a GlusterFS filesystem is a good way to go for our case of 100s of terabytes of unique files with some failover. Amazon S3 would simply cost far more. The file metadata is in the RDBMS, obviously. But storing the binary data in the RDBMS wouldn't be a good idea, and would choke long before it got to our scale.

link

jongalloway2 4272 days ago

Usually storing in filesystem with a reference in database seems to work best. Much easier for database management and backup, and file management as well.

The OP doesn't appear to have hear of SQL Server FileStream or FileTables, which handle this abstraction for you: http://msdn.microsoft.com/en-us/library/ff929144.aspx

link

sergiosgc 4273 days ago

On pgsql, the blob type. It stores the object out of the row store, but still in the database.

link

x0x0 4273 days ago

commonly, s3 or zfs nas for storage and pg for metadata

link

threeseed 4273 days ago

Commonly ? No not even close. The most common solution is that people just dump it as a BLOB in whatever database they are currently using. Which based on numbers is likely to be MySQL.

link

DrJokepu 4273 days ago

That's basically what a client of mine does.

link

henk53 4273 days ago

link