| Things to make sure of when choosing your distributed storage: 1) are you _really_ sure you need it distributed, or can you shard it yourself? (hint: distributed anything sucks up at least one if not two innovation tokens; if you're spending other innovation tokens as well, you're going to have a very bad time) 2) do you need to modify blobs, or can you get away with read/modify/replace? (S3 doesn't support partial writes; a one-bit change requires the whole file to be rewritten) 3) what's your ratio of reads to writes? (do you need local caches, or local pools in GPFS parlance?) 4) how much are you going to change the metadata? (if there's POSIX somewhere, it'll be a lot) 5) are you going to try to write to the same object at the same time in two different locations? (how do you manage locking and concurrency?) 6) do you care about availability, consistency or speed? (pick one, maybe one and a half) 7) how are you going to recover from the distributed storage shitting itself all at the same time? 8) how are you going to control access? |
2) No modifications, just new files and the occasional deletion request.
3) Almost exactly 1 write and 1 read per file. This is backing storage for the source files, and they are cached in front.
4) Never
5) Files are written only by one other server, and there will be no parallel writes.
6) I pick consistency and, as the half, availability.
7) This happened something like 15 years ago with MogileFS and thus scared us away. (Hence the single-server ZFS setup).
8) Reads are public; writes are restricted to a single other service.
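
To make the constraints in 2), 5) and 8) concrete, here is a rough sketch of the semantics I'm after: blobs are created once, never modified, occasionally deleted, readable by anyone and writable by exactly one service. Everything in it is made up for illustration (`WriteOnceStore`, the shared `writer_token`, the in-memory dict standing in for the real backend); it is not any particular product's API.

```python
class WriteOnceStore:
    """Toy in-memory stand-in for a write-once blob store (hypothetical)."""

    def __init__(self, writer_token):
        # Assumption: the single writing service is identified by a shared token.
        self._writer_token = writer_token
        self._blobs = {}  # key -> immutable bytes

    def _check_writer(self, token):
        if token != self._writer_token:
            raise PermissionError("only the designated writer may change the store")

    def put(self, token, key, data):
        """Create a new blob; existing blobs are never overwritten."""
        self._check_writer(token)
        if key in self._blobs:
            raise FileExistsError(f"{key!r} already exists; blobs are never modified")
        self._blobs[key] = bytes(data)

    def delete(self, token, key):
        """Handle the occasional deletion request; deleting a missing key is a no-op."""
        self._check_writer(token)
        self._blobs.pop(key, None)

    def get(self, key):
        """Public read: no token required."""
        return self._blobs[key]


store = WriteOnceStore(writer_token="ingest-service")
store.put("ingest-service", "photos/1.jpg", b"\xff\xd8")
print(store.get("photos/1.jpg"))
```

With a single writer and no overwrites there is nothing to lock, which is why 5) mostly disappears for this workload.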