HN Mirror

by KaiserPro 866 days ago

Things to make sure of when choosing your distributed storage:

1) are you _really_ sure you need it distributed, or can you shard it yourself? (hint: distributed anything sucks up at least one, if not two, innovation tokens. If you're spending other innovation tokens as well, you're going to have a very bad time)

2) do you need to modify blobs, or can you get away with read/modify/replace? (S3 doesn't support partial writes; a one-bit change requires the whole file to be rewritten)

3) what's your ratio of reads to writes? (do you need local caches, or local pools in GPFS parlance?)

4) How much are you going to change the metadata? (if there's POSIX somewhere, it'll be a lot)

5) Are you going to try to write to the same object at the same time in two different locations? (how do you manage locking and concurrency?)

6) do you care about availability, consistency or speed? (pick one, maybe one and a half)

7) how are you going to recover from the distributed storage shitting itself all at the same time?

8) how are you going to control access?
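
To make point 2 concrete, here is a minimal sketch of read/modify/replace against a store that only accepts whole-object writes. `WholeObjectStore` is a toy in-memory stand-in invented for illustration, not the S3 API, and the key name is hypothetical. It only shows that changing one bit costs a full rewrite of the object.

```python
class WholeObjectStore:
    """Toy in-memory store that only accepts whole-object puts (assumption:
    this mimics the 'no partial writes' behaviour described above)."""

    def __init__(self):
        self._objects = {}
        self.bytes_written = 0  # running total of bytes uploaded

    def put(self, key: str, data: bytes) -> None:
        # Every put replaces the entire object.
        self._objects[key] = bytes(data)
        self.bytes_written += len(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def flip_bit(store: WholeObjectStore, key: str, bit_index: int) -> None:
    """Read the whole object, change one bit, write the whole object back."""
    data = bytearray(store.get(key))
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    store.put(key, bytes(data))


store = WholeObjectStore()
store.put("frame-0001.exr", bytes(1024))  # hypothetical 1 KiB object of zeros

before = store.bytes_written
flip_bit(store, "frame-0001.exr", 0)
rewritten = store.bytes_written - before

print(f"bytes rewritten for a one-bit change: {rewritten}")
```

If the blobs are large and edits are frequent, that rewrite cost is what decides whether read/modify/replace is tolerable.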

2 comments

flemhans 866 days ago

1) only if it removes a "janitor" token of nannying the servers. Right now I just have one big server with a big 160TB ZFS pool, but it's running out.

2) No modifications, just new files and the occasional deletion request.

3) Almost exactly 1 write and 1 read per file; this is backing storage for the source files, and they are cached in front.

4) Never

5) Files are written only by one other server, and there will be no parallel writes.

6) I pick consistency and, as the half, availability.

7) This happened something like 15 years ago with MogileFS and thus scared us away. (Hence the single-server ZFS setup).

8) Reads are public, writes restricted to one other service that may write.
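
For a write-once, single-writer workload like this, the "shard it yourself" option from point 1 could look roughly like the sketch below. It uses rendezvous (highest-random-weight) hashing as one possible placement scheme; the server names and the `pick_server` helper are hypothetical, and replication, rebalancing and failure handling are all left out.

```python
import hashlib


def pick_server(key: str, servers: list[str]) -> str:
    """Pick the server with the highest hash score for this key.

    Rendezvous hashing: every client computes the same answer without
    shared state, and adding a server only moves keys onto the new server.
    """
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)


servers = ["store-a", "store-b", "store-c"]  # hypothetical storage hosts
keys = [f"file-{i}" for i in range(1000)]

before = {k: pick_server(k, servers) for k in keys}
after = {k: pick_server(k, servers + ["store-d"]) for k in keys}

# Keys whose placement changed when the fourth server was added.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding store-d")
```

Because files are never modified, the only state to migrate when a server is added is the set of keys that now map to it.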

KaiserPro 866 days ago

GPFS is pretty sexy nowadays, although it's really expensive: https://www.ibm.com/products/storage-scale

SheddingPattern 866 days ago

Sounds like you are talking from experience. Are you a storage specialist? How did you learn so much about this?

KaiserPro 866 days ago

VFX engineer, I have suffered through:

_early_ Lustre (it's much better now)

GPFS

Gluster (fuck that)

clustered XFS (double fuck that)

Isilon

Nowadays, a single 2U server can realistically support 2x 100gig NICs at full bore. So the biggest barrier is density. You can probably get 1PB in a rack now, and linking a bunch of JBODs (well, NVMes) is probably easy to do now.
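
As a rough back-of-the-envelope check on those numbers, the sketch below works out what 2x 100gig at line rate means in bytes, and how long it would take to stream 1PB through one such server. It assumes decimal units and ignores protocol overhead, so real figures would be somewhat worse.

```python
NICS = 2
NIC_BITS_PER_S = 100 * 10**9   # one 100 Gbit/s NIC
PETABYTE = 10**15              # decimal PB, in bytes (assumption)

# Aggregate line rate in bytes per second.
line_rate_bytes = NICS * NIC_BITS_PER_S // 8

# Time to move 1 PB at that rate, ignoring all overhead.
seconds = PETABYTE / line_rate_bytes
hours = seconds / 3600

print(f"{line_rate_bytes / 1e9:.0f} GB/s, {hours:.1f} hours per PB")
```

So a single box at full bore is around 25 GB/s, and a full rack of 1PB takes roughly 11 hours to read end to end through it.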

nh2 866 days ago

"1PB in a rack"? You can apparently already buy 2.5PB in a single 4U server:

https://www.techradar.com/pro/seagate-has-launched-a-massive...

KaiserPro 865 days ago

Sorry, I should have added a caveat: 1PB _at decent performance_.

That Seagate array will be fine for streaming (so long as you spread the data properly), but as soon as you start mixing read/write loads on it, it'll start to chug. You can expect 70-150 IOPS out of each drive, and that's a 60-drive array (at a guess; you can get 72 drives in a 4U, but they are less maintainable, or at least used to be, things might have improved recently).
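
To put rough numbers on "chug", the sketch below multiplies out the figures above: 60 drives at 70-150 IOPS each. The 4 KiB request size is my assumption for a small random I/O, and caching, RAID overhead and queueing are ignored.

```python
DRIVES = 60                  # guessed drive count from the comment above
IOPS_PER_DRIVE = (70, 150)   # per-drive range from the comment above
REQUEST_BYTES = 4096         # assumption: 4 KiB random requests

# Aggregate random IOPS across the whole array.
low_iops, high_iops = (DRIVES * n for n in IOPS_PER_DRIVE)

# Resulting throughput if every request is a small random one.
low_mb_s = low_iops * REQUEST_BYTES / 1e6
high_mb_s = high_iops * REQUEST_BYTES / 1e6

print(f"{low_iops}-{high_iops} IOPS, "
      f"~{low_mb_s:.1f}-{high_mb_s:.1f} MB/s at 4 KiB")
```

That is a few thousand IOPS and tens of MB/s for the whole array under small random I/O, which is tiny next to what the same box can do when streaming.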

When I was using Lustre with Ultra SCSI (yes, that long ago) we had a good 10-20 racks to get to 100TB, and that could sustain 1 gigabyte a second.

nh2 865 days ago

Agreed, it depends on the use case. For some "more storage" is all that matters, for others you don't want to be bottlenecked on getting it into / out of the machine or through processing.
