|
|
|
|
|
by __s
1067 days ago
|
|
What's advantage over having tenant id as distribution column? Seems like you make schema name the distribution column. Maybe gross setups where same name function definition varies between schemas (been there done that, don't want to do it again) Seems like article only offers ease of use. Guess I've never used microservices enough to consider that use case Couldn't the microservice case be handled by having distributed tables with no distribution column? ie today I'd create a distribution column & fill it with the same value on every row Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function |
|
Ease of use is definitely the main one. If you're willing to put in the work required to use tenant ID as a distribution column (add it to tables, primary keys, foreign keys, filters, joins), then it's a more scalable approach.
A challenge with sharding by tenant ID is that many applications use a normalized data model, meaning not all tables obviously have a tenant ID column. When you use a schema per tenant on vanilla PostgreSQL, no additional steps are typically required to enable schema-based sharding.
There are some other benefits of schema-based sharding such as custom table definitions, simpler & more versatile access control, and longer term we expect it will be easier to pin a large tenant to a node using schemas, or distribute the tables of ultra-large tenants (by some other dimension). Of course, row-based sharding has other benefits like parallel cross-tenant queries and global DDL.
> Couldn't the microservice case be handled by having distributed tables with no distribution column?
Absolutely. We first implemented the notion of single shard distributed tables with a NULL distribution column, and then built schema-based sharding on top as a convenience layer.
> Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function
Yes, public schema (or other another non-distributed schema) can have reference tables & distributed tables as usual, and tables in distributed schemas can have foreign keys / local joins with reference tables.