by xchaotic 3223 days ago
Then there's the 0 model for sharding - don't shard, just replicate everything, everywhere, with eventual consistency and MVCC
3 comments

Author of the post here. I couldn't agree more: if you don't have to shard, then don't. Scaling out with replicas is an option up to a certain scale, but you do have to ensure there are no long-running queries and make sure your replication is up to date. Those are another set of things to manage, and at an intermediate stage very viable ones; at a certain scale it all changes a bit. All that said, if you're at a small data level I wouldn't encourage sharding for the sake of it. If you know you're going to have to scale beyond a single node, or you're approaching limits where you're frequently scaling up, it's good to know your options and plan ahead.
Nobody voluntarily shards for their own enjoyment (unless they like pain). They shard because they need to in order to scale.
Sharding and master to master replication are two different things.

The tradeoff of master to master replication is that it limits you to CRDTs, append-only logs, or manual merging of conflicts by the end user. Alternatively, if data loss is an acceptable tradeoff, you can also use a last-write-wins strategy.
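
To make the difference concrete, here is a rough Python sketch of the two merge styles. Everything in it (`LWWRegister`, `merge_gcounter`, the integer timestamps and node IDs) is made up for illustration and is not taken from any particular database. The last-write-wins register silently drops the losing write, while the grow-only counter (one of the simplest CRDTs) merges without losing increments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical last-write-wins register: one value plus write metadata."""
    value: str
    timestamp: int  # assumed to be a comparable clock value
    node_id: str    # tie-breaker when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # The write with the higher (timestamp, node_id) wins.
        # The other write is discarded, which is where data loss comes from.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


def merge_gcounter(a: dict, b: dict) -> dict:
    """Merge two grow-only counters stored as {node_id: count}.

    Taking the per-node maximum is commutative, associative and idempotent,
    so replicas converge regardless of merge order.
    """
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def gcounter_value(counter: dict) -> int:
    """Total value of a grow-only counter is the sum over all nodes."""
    return sum(counter.values())
```

With two concurrent writes to the register, only one survives the merge. With the counter, increments made on different nodes are all kept.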

Sharding is basically having one isolated "database" per X (user, location, etc.), but the tradeoff is that you can't have transactions across two databases.
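
A minimal sketch of that idea in Python, assuming the shard key is a user ID and each "database" is just an in-memory dict. `NUM_SHARDS`, `shard_for`, `put` and `get` are hypothetical names for illustration only.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, purely illustrative

# Each shard stands in for one isolated "database".
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not route consistently.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(user_id: str, key: str, value: str) -> None:
    # All data for one user lands on exactly one shard.
    shards[shard_for(user_id)].setdefault(user_id, {})[key] = value


def get(user_id: str, key: str):
    return shards[shard_for(user_id)].get(user_id, {}).get(key)
```

Anything touching a single user stays on one shard and can be atomic there. An operation spanning two users may hit two shards, which is exactly where the cross-database transaction problem shows up. Note also that plain modulo routing like this moves most keys when the shard count changes.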

Document databases usually do both. Each document is its own tiny database with atomic updates, which can then be distributed over the cluster, and they support multi-master replication for availability and automatic failover.

Or you can do multi master replication, where you write to all replicas synchronously (and deal with downtime of replicas by continuing if you manage to write to a majority of them).
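
Roughly, the majority rule looks like the Python sketch below. `Replica` and `quorum_write` are hypothetical, and a real system would also need to repair or roll back the replicas that did accept a write that failed to reach a majority.

```python
class Replica:
    """Hypothetical in-memory replica that can be marked as down."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"replica {self.name} is down")
        self.data[key] = value


def quorum_write(replicas, key: str, value: str) -> bool:
    """Try to write to every replica; succeed if a majority acknowledged."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, value)
            acks += 1
        except ConnectionError:
            # A down replica is skipped; it has to catch up later.
            continue
    majority = len(replicas) // 2 + 1
    return acks >= majority
```

With three replicas, the write still succeeds when one is down, and is reported as failed when two are down.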