
by superboum 1591 days ago

(Garage Contributor here) We reviewed many of the existing solutions and none of them had the feature set we wanted. Compared to SeaweedFS, the main difference we introduce with Garage is that our nodes are not specialized, which leads to the following benefits:

- Garage is easier to deploy and operate: you don't have to manage independent components like the filer, the volume manager, the master, etc. It also seems that a bucket must be pinned to a volume server on SeaweedFS. In Garage, all buckets are spread across the whole cluster, so you do not have to worry about a bucket filling up one of your volume servers.

- Garage works better in the presence of crashes: I would be very interested in a deep analysis of SeaweedFS's "automatic master failover". They use Raft, I suppose either by running a health check every second, which leads to data loss on a crash, or by sending a request for each transaction, which creates a huge bottleneck in their design.

- Better scalability: because there is no special node, there are no bottlenecks. I suppose that with SeaweedFS, all the requests have to pass through the master. We do not have such limitations.

In conclusion, we chose a radically different design with Garage. We plan to do a more in-depth comparison in the future, but even today, I can say that although we implement the same API, our radically different designs lead to radically different properties and trade-offs.

3 comments

ddorian43 1591 days ago

> independent components like the filer, the volume manager, the master, etc.

You can run volume/master/filer in a single server (single command).

> filer probably needs an external rdbms to handle the metadata

This is true. You can use an external DB, or build/embed some other DB inside it (think of a distributed KV store written in Go that you embed to host the metadata).

> It also seems that a bucket must be pinned to a volume server on SeaweedFS.

This is not true. A bucket uses its own volumes, but those volumes can be, and by default are, distributed across the whole cluster.

> They use Raft, I suppose either by running a health check every second, which leads to data loss on a crash, or by sending a request for each transaction, which creates a huge bottleneck.

Raft is for synchronous writes. A single write can be slow because you have to wait for an "ok" from the replicas, which is a good thing (compared to async replication in, say, Cassandra/DynamoDB). Keep in mind that S3 also moved to synced replication. The per-write latency is compensated for by running more writes in parallel.

> Better scalability: because there is no special node, there are no bottlenecks. I suppose that with SeaweedFS, all the requests have to pass through the master. We do not have such limitations.

Going to the master is only needed for writes, to get a unique ID. This can easily be fixed with a plugin that generates, say, Twitter Snowflake IDs, which are very efficient. For reads, you keep a cache of the volume-to-server mapping in your client, so you can read directly from the server that has the data, or you can query a random server and it will handle everything underneath.
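To illustrate the Snowflake idea, here is a minimal Python sketch of a generator that produces unique IDs locally, without asking a central node. It is not SeaweedFS code or its plugin API: the class name, the `clock` parameter, and the 41/10/12 bit layout are assumptions modelled on the commonly described Snowflake format.

```python
import threading
import time

# Assumed layout (Snowflake-style): milliseconds since a custom epoch,
# then 10 bits of node ID, then a 12-bit per-millisecond sequence.
EPOCH_MS = 1_288_834_974_657  # any fixed instant in the past works
NODE_BITS = 10
SEQ_BITS = 12
MAX_NODE = (1 << NODE_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical ID generator; each node must get a distinct node_id."""

    def __init__(self, node_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock  # injectable so the generator can be tested
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence number.
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (NODE_BITS + SEQ_BITS))
                | (self.node_id << SEQ_BITS)
                | self._seq
            )
```

Two generators with different `node_id` values never collide, because the node bits differ, and IDs from one generator increase over time. The cost is that node IDs must be assigned uniquely up front and the clocks must not run backwards.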

I'm pretty sure SeaweedFS has very good fundamentals, based on researching all the other open-source distributed object storage systems that exist.