| HN Mirror


by ahachete 3 days ago

TBH I don't think it's that straightforward; I see it more as a notable architectural change. At a very high level, this means:

* Adding a sharding function, as you say.

* Developing an external service for metadata (shard placement) or, alternatively, keeping that metadata in one place and replicating it (consistently!) to every query router.

* Implementing functions/catalogs for the users to understand the placement and configure/alter it.

* Implementing shard migration / rebalancing capabilities, possibly using Postgres logical replication (plus notable automation).
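To make the metadata and rebalancing items a bit more concrete, here is a minimal sketch of a versioned placement catalog. Every name in it (`PlacementCatalog`, `move_shard`, the node names) is hypothetical, and it only models the bookkeeping; the actual data copy (e.g. via logical replication) is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class PlacementCatalog:
    """Hypothetical shard-placement metadata: shard id -> node name."""
    placements: dict[int, str] = field(default_factory=dict)
    version: int = 0  # bumped on every change so routers can detect stale copies

    def assign(self, shard_id: int, node: str) -> None:
        self.placements[shard_id] = node
        self.version += 1

    def node_for(self, shard_id: int) -> str:
        return self.placements[shard_id]

    def move_shard(self, shard_id: int, target: str) -> None:
        # Assumption: in a real system the data is copied first and the
        # catalog is flipped only once the target has caught up.
        if shard_id not in self.placements:
            raise KeyError(f"unknown shard {shard_id}")
        self.placements[shard_id] = target
        self.version += 1

    def shards_on(self, node: str) -> list[int]:
        return sorted(s for s, n in self.placements.items() if n == node)


catalog = PlacementCatalog()
for shard_id in range(4):
    catalog.assign(shard_id, f"node-{shard_id % 2}")
catalog.move_shard(3, "node-0")  # rebalance: shard 3 moves off node-1
```

The version counter is one simple way for query routers to notice that their replicated copy of the metadata is out of date; it is an illustration, not a claim about how any existing system does it.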

Here's one idea if you follow this path, something that Citus doesn't have: make the sharding function pluggable and pick a default that is well known and available in many languages (e.g. xxhash). If you do so, and guarantee the stability of those functions, they could be used externally (by applications) to route queries, and especially inserts, to the appropriate shard. While it makes the application more complex, it may allow (combined with access to the metadata service) for faster ingestion paths (this is often known as application-assisted sharding), and it's not exclusive of the query routers.
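A rough sketch of what the application side of that could look like, assuming a registry of named sharding functions and a shard id -> node map fetched from the metadata service. All names here are hypothetical, and `hashlib.blake2b` is used purely as a standard-library stand-in for xxhash so the example runs without third-party packages.

```python
import hashlib
from typing import Callable

ShardHash = Callable[[bytes], int]

# Registry of pluggable sharding functions (hypothetical design).
HASH_FUNCTIONS: dict[str, ShardHash] = {}


def register(name: str, fn: ShardHash) -> None:
    HASH_FUNCTIONS[name] = fn


def blake2b_64(key: bytes) -> int:
    # Stand-in for xxhash: stable across processes and machines, unlike
    # Python's built-in hash(), which is randomized per process for str/bytes.
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


register("blake2b_64", blake2b_64)


def shard_for(key: str, shard_count: int, hash_name: str = "blake2b_64") -> int:
    """Map a distribution key to a shard id using the named hash function."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return HASH_FUNCTIONS[hash_name](key.encode("utf-8")) % shard_count


def route(key: str, placements: dict[int, str]) -> str:
    """Return the node that owns `key`, given shard id -> node metadata."""
    return placements[shard_for(key, len(placements))]


# Assumed to come from the metadata service; hard-coded here for illustration.
placements = {0: "node-a", 1: "node-b", 2: "node-a", 3: "node-b"}
target = route("customer:42", placements)  # application connects to this node
```

The modulo mapping is the simplest choice for a sketch; it assumes a fixed shard count, since changing the count would remap most keys.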

Edit: formatting