by threeseed 4225 days ago
I had to solve this problem at my last job, and vendors told us how companies like Apple, Sony, Disney, and EA have solved it. There are basically two ways to do it. (1) Store it on some "filesystem". In quotes because the implementation can vary wildly, e.g. S3, GlusterFS, or standard directories using DRBD for HA. (2) Take the blob, slice it into pieces, hash the pieces, and spread them across a sharded database.

The general rule seems to be: if you have lots of unique large files, use a filesystem; if your files are likely to have duplicates, use a database. So a file storage locker might use (1), but a service like iTunes Match would use (2). And IIRC Apple does in fact store at least uploaded music files in Cassandra.
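
To make option (2) concrete, here is a rough sketch of the slice/hash/shard idea. It is only an illustration under assumptions, not how Apple or any vendor actually does it: the class name `ShardedChunkStore`, the fixed-size chunking, SHA-256 as the hash, and plain in-memory dicts standing in for database shards are all hypothetical choices.

```python
import hashlib


class ShardedChunkStore:
    """Toy content-addressed store; each dict stands in for one database shard."""

    def __init__(self, num_shards=4, chunk_size=4 * 1024 * 1024):
        # num_shards and chunk_size are illustrative defaults, not recommendations.
        self.shards = [{} for _ in range(num_shards)]
        self.chunk_size = chunk_size

    def _shard_for(self, digest):
        # Pick a shard from the leading hex digits of the chunk hash.
        return self.shards[int(digest[:8], 16) % len(self.shards)]

    def put(self, blob):
        """Slice the blob, hash each piece, store each piece once.

        Returns the ordered list of chunk hashes (the "manifest").
        """
        manifest = []
        for offset in range(0, len(blob), self.chunk_size):
            chunk = blob[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps the existing copy, so duplicates cost nothing extra.
            self._shard_for(digest).setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest):
        """Reassemble a blob from its manifest."""
        return b"".join(self._shard_for(d)[d] for d in manifest)

    def stored_chunks(self):
        """Number of distinct chunks held across all shards."""
        return sum(len(shard) for shard in self.shards)
```

Uploading the same bytes twice yields the same manifest and no new chunks, which is why this layout suits workloads with many duplicates. With mostly unique files the hashing and manifest bookkeeping buy nothing.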

1 comment

We found that GlusterFS is a good fit for our case: hundreds of terabytes of unique files with some failover. Amazon S3 would simply cost far more. The file metadata is in the RDBMS, obviously. But storing the binary data in the RDBMS wouldn't be a good idea; the database would choke long before it got to our scale.
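
For anyone wanting to picture the split, a minimal sketch follows. It assumes SQLite as a stand-in for the RDBMS and an ordinary local directory as a stand-in for the GlusterFS mount; the table layout and the function names (`open_catalog`, `store_file`, `load_file`) are made up for illustration and are not our actual schema.

```python
import hashlib
import os
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open the metadata catalog (SQLite here, standing in for the RDBMS)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def store_file(conn, root, name, data):
    """Write the bytes under root (the filesystem mount) and record metadata."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out into subdirectories so no single directory grows too large.
    path = os.path.join(root, digest[:2], digest)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    conn.execute(
        "INSERT OR REPLACE INTO files (name, path, size, sha256)"
        " VALUES (?, ?, ?, ?)",
        (name, path, len(data), digest),
    )
    conn.commit()
    return path


def load_file(conn, name):
    """Look up the path in the catalog, then read the bytes from the filesystem."""
    row = conn.execute(
        "SELECT path FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    with open(row[0], "rb") as f:
        return f.read()
```

The database only ever holds small rows (name, path, size, checksum), so queries stay cheap no matter how large the files are.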