by ozgune 4220 days ago
We'll update our FAQ with a detailed answer to this question.

In summary, the user specifies the shard count and replication factor when sharding a table. For example, if you have 4 nodes, you may pick 256 as the initial shard count. That way, you'll have ample room to grow as you add new nodes to your cluster.

When you pick 256 shards over 4 worker nodes, pg_shard will create them in a round-robin fashion. Shard #1 will go to nodes A and B, shard #2 will go to nodes B and C, and so forth. This has the advantage that when one of the worker nodes fails, the remaining 3 nodes evenly take the additional work. Also, when you add a new node to the cluster, you can gradually rebalance some of the shards by moving them to the new node.
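
To make the round-robin placement concrete, here is a minimal sketch. It assumes the first replica of shard i lands on node i mod N and each further replica on the next node in the list. The function names, zero-based shard numbering, and node labels are made up for illustration and are not pg_shard's API.

```python
from collections import Counter


def place_shards(shard_count, nodes, replication_factor):
    """Return {shard_id: [nodes holding a replica]} using round-robin placement."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed node count")
    placements = {}
    for shard_id in range(shard_count):
        first = shard_id % len(nodes)
        # Each extra replica goes to the next node in the list, wrapping around.
        placements[shard_id] = [
            nodes[(first + offset) % len(nodes)]
            for offset in range(replication_factor)
        ]
    return placements


def replicas_per_node(placements):
    """Count how many shard replicas each node holds."""
    return Counter(node for replicas in placements.values() for node in replicas)


def surviving_replicas(placements, failed_node):
    """Per surviving node, count the shards it shares with the failed node."""
    load = Counter()
    for replicas in placements.values():
        if failed_node in replicas:
            load.update(n for n in replicas if n != failed_node)
    return load


placements = place_shards(256, ["A", "B", "C", "D"], 2)
print(placements[0], placements[1])   # ['A', 'B'] ['B', 'C']
print(replicas_per_node(placements))  # 128 replicas on each of the 4 nodes
print(surviving_replicas(placements, "A"))
```

With 256 shards and a replication factor of 2, each node holds 128 of the 512 replicas. In this simplified model, the surviving copies of a failed node's shards sit on its neighbors in the node list, so how the extra work spreads depends on the replication factor and on the actual placement policy.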


We have used sharding to manage time-series data, e.g. one shard per day. Is there a way this could work, i.e. with the number of shards growing continually?

That use case isn't yet handled by pg_shard: the plugin currently supports only hash partitioning, and what you've described is range partitioning. Range partitioning is certainly on our immediate feature list, since range and hash partitioning together cover a wide variety of use cases.
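
A rough sketch of the difference, in Python rather than anything pg_shard actually runs. The CRC32-modulo hash is only a stand-in (an assumption, not the hash function PostgreSQL or pg_shard uses), and `hash_shard`, `range_shard`, and `boundaries` are hypothetical names.

```python
import bisect
import zlib
from datetime import date


def hash_shard(key, shard_count):
    """Hash partitioning: the shard count is fixed when the table is sharded."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def range_shard(day, boundaries):
    """Range partitioning: boundaries[i] is the first day covered by shard i."""
    index = bisect.bisect_right(boundaries, day) - 1
    if index < 0:
        raise ValueError("no shard covers this date")
    return index


# Hash: every key maps to one of a fixed set of 256 shards.
shard = hash_shard("sensor-17", 256)

# Range: one shard per day; a new day simply appends a new boundary.
boundaries = [date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 3)]
second_day = range_shard(date(2015, 1, 2), boundaries)  # shard 1
boundaries.append(date(2015, 1, 4))
newest_day = range_shard(date(2015, 1, 4), boundaries)  # shard 3
```

The point of the contrast: with hashing, rows for a given day scatter across all shards and the shard count stays put, while with ranges each day maps to its own shard and the shard list keeps growing.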

However, CitusDB does support range partitioning and has a \stage command that will create new shards from incoming data. If you periodically load data that corresponds to a particular time range (hour, day, week), CitusDB can easily handle creation of additional shards during each load.
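
As a conceptual illustration only (not how \stage is implemented), the sketch below models a table where each periodic load becomes a new shard that records the minimum and maximum day it contains, so a query for a time range only needs the overlapping shards. `Shard`, `RangeShardedTable`, `load_batch`, and `shards_for` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Shard:
    shard_id: int
    min_day: date
    max_day: date
    rows: list


@dataclass
class RangeShardedTable:
    shards: list = field(default_factory=list)

    def load_batch(self, rows):
        """Create one new shard from a batch of (day, value) rows."""
        if not rows:
            raise ValueError("empty batch")
        days = [day for day, _ in rows]
        shard = Shard(len(self.shards), min(days), max(days), list(rows))
        self.shards.append(shard)
        return shard

    def shards_for(self, start, end):
        """Return shards whose [min_day, max_day] overlaps [start, end]."""
        return [s for s in self.shards if s.min_day <= end and s.max_day >= start]


table = RangeShardedTable()
table.load_batch([(date(2015, 1, 1), 10), (date(2015, 1, 1), 12)])  # day 1 load
table.load_batch([(date(2015, 1, 2), 7)])                            # day 2 load
hits = table.shards_for(date(2015, 1, 2), date(2015, 1, 2))
print([s.shard_id for s in hits])  # [1]
```

Each daily load adds one shard, so the shard count grows with the data, and a query for a single day touches only the shard loaded for that day.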