by bjt
4208 days ago
At work we're in the midst of rolling out a sharded Postgres platform based on http://www.craigkerstiens.com/2012/11/30/sharding-your-datab..., with the sharding implemented at the application level. The biggest piece of complexity in that post is designing the sharding so that you can gracefully add more shards later. Having read the pg_shard readme, it's not clear to me how it addresses that issue. I'd need a really clear idea of how to handle scaling my cluster before committing to a sharding solution.
In summary, the user specifies the shard count and the replication count when sharding a table. For example, if you have 4 nodes, you might pick 256 as the initial shard count. That gives you ample room to grow as you add new nodes to your cluster.
When you pick 256 shards over 4 worker nodes, pg_shard creates them in a round-robin fashion. With two replicas per shard, shard #1 goes to nodes A and B, shard #2 goes to nodes B and C, and so forth. This has the advantage that when one of the worker nodes fails, the remaining 3 nodes evenly take on the additional work. Also, when you add a new node to the cluster, you can gradually rebalance by moving some of the shards to the new node.
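
To make the placement concrete, here is a rough Python sketch of the round-robin scheme described above. It is only an illustration, not pg_shard's actual code: the function names `place_shards` and `plan_rebalance`, the node labels, and the greedy rebalancing rule are all assumptions made up for this example.

```python
from collections import Counter


def place_shards(nodes, shard_count, replication_count):
    """Assign each shard to `replication_count` consecutive nodes, round-robin.

    Hypothetical helper for illustration; not part of pg_shard.
    """
    if replication_count > len(nodes):
        raise ValueError("replication_count cannot exceed the number of nodes")
    placements = {}
    for shard_id in range(1, shard_count + 1):
        start = (shard_id - 1) % len(nodes)
        placements[shard_id] = [
            nodes[(start + offset) % len(nodes)]
            for offset in range(replication_count)
        ]
    return placements


def plan_rebalance(placements, new_node):
    """Greedily pick shard replicas to move onto a newly added node.

    Assumed strategy: walk the shards once and take one replica from the
    busier of its current holders, until the new node holds roughly an
    even share. Each shard is visited once, so the new node never ends
    up with two replicas of the same shard.
    """
    counts = Counter(node for replicas in placements.values() for node in replicas)
    target = sum(counts.values()) // (len(counts) + 1)
    moves = []
    for shard_id, replicas in placements.items():
        if len(moves) >= target:
            break
        donor = max(replicas, key=lambda node: counts[node])
        if counts[donor] <= target:
            continue  # this shard's holders are already at or below the target
        moves.append((shard_id, donor, new_node))
        counts[donor] -= 1
    return moves


nodes = ["A", "B", "C", "D"]
placements = place_shards(nodes, shard_count=256, replication_count=2)
per_node = Counter(node for replicas in placements.values() for node in replicas)
moves = plan_rebalance(placements, new_node="E")
```

Here `placements[1]` is `["A", "B"]` and `placements[2]` is `["B", "C"]`, matching the example in the text, and each of the 4 nodes ends up holding 128 shard replicas. Because there are many more shards than nodes, adding node E only requires moving a subset of whole shards rather than re-splitting any data.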