Hacker News new | ask | show | jobs
by mslot 1067 days ago
Nice to see this on HN :)

The high-level is: You enable a setting and every CREATE SCHEMA creates a new shard. All the tables in the schema will be co-located so you can have efficient joins & foreign keys between the tables.

On top of that, you can also have reference tables that are replicated to all nodes, again for fast joins & foreign keys with all schemas.

Everything else is about making every PostgreSQL feature work as seamlessly as if there was no sharding. You can still do things like transactions across schemas, create and use custom types, access controls, work with other extensions, use procedures, etc.

1 comments

Is there any way to have that automatically create n copies of the data across shards?

Something like a "min-copies 2" setting, which would ensure the shared data has at least one viable alternative in case of hardware failure (etc).

Historically, there has been, but we have found physical replication is ultimately a lot more performant, robust, and tweakable. Citus users & platforms generally use physical replication of each node for high availability.

For instance, the ability to quickly spin up a new replica using a disk snapshot is very useful and only feasible at the server level.

It is still possible to replicate shards (via the citus.shard_replication_factor setting), but it only helps for scaling read throughput, at the cost of lower write throughput.