Hacker News new | ask | show | jobs
by __s 1067 days ago
What's advantage over having tenant id as distribution column? Seems like you make schema name the distribution column. Maybe gross setups where same name function definition varies between schemas (been there done that, don't want to do it again)

Seems like article only offers ease of use. Guess I've never used microservices enough to consider that use case

Couldn't the microservice case be handled by having distributed tables with no distribution column? ie today I'd create a distribution column & fill it with the same value on every row

Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function

2 comments

> What's advantage over having tenant id as distribution column?

Ease of use is definitely the main one. If you're willing to put in the work required to use tenant ID as a distribution column (add it to tables, primary keys, foreign keys, filters, joins), then it's a more scalable approach.

A challenge with sharding by tenant ID is that many applications use a normalized data model, meaning not all tables obviously have a tenant ID column. When you use a schema per tenant on vanilla PostgreSQL, no additional steps are typically required to enable schema-based sharding.

There are some other benefits of schema-based sharding such as custom table definitions, simpler & more versatile access control, and longer term we expect it will be easier to pin a large tenant to a node using schemas, or distribute the tables of ultra-large tenants (by some other dimension). Of course, row-based sharding has other benefits like parallel cross-tenant queries and global DDL.

> Couldn't the microservice case be handled by having distributed tables with no distribution column?

Absolutely. We first implemented the notion of single shard distributed tables with a NULL distribution column, and then built schema-based sharding on top as a convenience layer.

> Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function

Yes, public schema (or other another non-distributed schema) can have reference tables & distributed tables as usual, and tables in distributed schemas can have foreign keys / local joins with reference tables.

One reasons why some prefer multi tenancy via schemas instead of a tent ant column: Reading a column with the wrong tenant id happens easily, just forget the where in a query. Across schemas, this is much harder to get wrong as a schema has to be explicitly named in a query.