HN Mirror



	by kbolino 247 days ago
	Lots of distributed NoSQL databases work (or partially work) this way too (e.g., HBase rowkey, Accumulo row ID, Cassandra clustering key, DynamoDB sort key). They partition the data into shards based on key ranges and then spread those shards across as many servers as possible. UUIDv7 is (by design) temporally clustered. Since many workloads place far more value on recent data, and all recent data is likely to end up in the same shard, you bottleneck on the throughput of a single server or, even with replication, a small number of servers.
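A toy simulation of the hot-shard effect described above, as a sketch under stated assumptions: shard boundaries are picked at quantiles of the keys already stored (a hypothetical `split_points` splitter, roughly how range-partitioned stores split as data grows), UUIDv7 values are built by hand from the RFC 9562 bit layout with a seeded RNG, and all names and numbers are illustrative rather than taken from HBase, Accumulo, or any other real system.

```python
import bisect
import random
import uuid
from collections import Counter


def uuid7(unix_ms: int, rng: random.Random) -> uuid.UUID:
    """UUIDv7 bit layout (RFC 9562): 48-bit Unix-ms timestamp, 4-bit version,
    12 random bits, 2-bit variant, 62 random bits. Seeded RNG for reproducibility."""
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)
        | (0x7 << 76)
        | (rng.getrandbits(12) << 64)
        | (0b10 << 62)
        | rng.getrandbits(62)
    )
    return uuid.UUID(int=value)


def uuid4(rng: random.Random) -> uuid.UUID:
    # Seeded stand-in for uuid.uuid4(); version=4 sets the version/variant bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def split_points(history: list[uuid.UUID], num_shards: int) -> list[int]:
    # Hypothetical splitter: boundaries at quantiles of the keys already stored,
    # so each shard holds roughly the same amount of *existing* data.
    keys = sorted(k.int for k in history)
    return [keys[i * len(keys) // num_shards] for i in range(1, num_shards)]


def shard_for(key: uuid.UUID, bounds: list[int]) -> int:
    # Shard i owns the key range [bounds[i-1], bounds[i]).
    return bisect.bisect_right(bounds, key.int)


NUM_SHARDS = 16
NOW_MS = 1_700_000_000_000  # fixed "current time" for the demo
DAY_MS = 86_400_000
rng = random.Random(42)

# UUIDv7: 30 days of history, then 1,000 writes from the latest second.
v7_history = [uuid7(NOW_MS - 1 - rng.randrange(30 * DAY_MS), rng) for _ in range(10_000)]
v7_bounds = split_points(v7_history, NUM_SHARDS)
v7_recent = Counter(
    shard_for(uuid7(NOW_MS + rng.randrange(1000), rng), v7_bounds) for _ in range(1000)
)

# UUIDv4: the same experiment with random keys.
v4_history = [uuid4(rng) for _ in range(10_000)]
v4_bounds = split_points(v4_history, NUM_SHARDS)
v4_recent = Counter(shard_for(uuid4(rng), v4_bounds) for _ in range(1000))

print("UUIDv7 recent writes hit shards:", sorted(v7_recent))  # [15]
print("UUIDv4 recent writes hit", len(v4_recent), "of", NUM_SHARDS, "shards")
```

Because every new UUIDv7 sorts after every existing key, all 1,000 recent writes land in the last shard, while the UUIDv4 writes scatter across nearly all of them.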


kbolino 246 days ago

FWIW it looks like Cassandra doesn't belong on this list, and DynamoDB only with qualifications.

Though Cassandra is more like quasi-SQL than NoSQL, the bigger issue is that the clustering key is never actually used for sharding. Cassandra (today) always puts all data with the same partition key on the same shard, and the partition key is hashed, so there is no situation in which UUIDv7 would perform differently (better or worse) than UUIDv4.

In DynamoDB, sort keys can be used for sharding, but only when there is a large number of distinct sort keys for the same partition key. Generally you would put a UUID in the partition key, not the sort key, so choosing UUIDv7 over UUIDv4 typically has no impact on DB performance.
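A toy model of hash-based placement, illustrating the Cassandra point (and the usual DynamoDB layout, where the UUID sits in the partition key). It is a sketch under loud assumptions: a six-node ring with one evenly spaced token per node, and MD5 as a stand-in hash only because it ships with Python (Cassandra's default partitioner is Murmur3, and real clusters assign many tokens per node). `node_for_row` and the other helpers are hypothetical names.

```python
import bisect
import hashlib
import random
import uuid
from collections import Counter

NUM_NODES = 6
# Node i owns tokens in (NODE_TOKENS[i-1], NODE_TOKENS[i]] of a 64-bit ring.
NODE_TOKENS = [(i + 1) * (1 << 64) // NUM_NODES - 1 for i in range(NUM_NODES)]


def uuid7(unix_ms: int, rng: random.Random) -> uuid.UUID:
    # RFC 9562 layout: 48-bit ms timestamp | ver 7 | 12 rand | variant 10 | 62 rand.
    value = (((unix_ms & ((1 << 48) - 1)) << 80) | (0x7 << 76)
             | (rng.getrandbits(12) << 64) | (0b10 << 62) | rng.getrandbits(62))
    return uuid.UUID(int=value)


def uuid4(rng: random.Random) -> uuid.UUID:
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def token(partition_key: uuid.UUID) -> int:
    # Stand-in partitioner: first 8 bytes of the MD5 digest of the key.
    return int.from_bytes(hashlib.md5(partition_key.bytes).digest()[:8], "big")


def node_for_row(partition_key: uuid.UUID, clustering_key: uuid.UUID) -> int:
    # Placement looks only at the partition key's token. The clustering key
    # orders rows *within* the partition and never affects which node owns them.
    return bisect.bisect_left(NODE_TOKENS, token(partition_key))


rng = random.Random(7)
NOW_MS = 1_700_000_000_000

# One partition with 1,000 time-ordered (UUIDv7) clustering keys: one node owns it all.
pk = uuid4(rng)
one_partition_nodes = {node_for_row(pk, uuid7(NOW_MS + i, rng)) for i in range(1000)}

# 6,000 partitions keyed by recent UUIDv7s vs UUIDv4s: hashing spreads both evenly.
v7_load = Counter(
    node_for_row(uuid7(NOW_MS + rng.randrange(1000), rng), uuid4(rng)) for _ in range(6000)
)
v4_load = Counter(node_for_row(uuid4(rng), uuid4(rng)) for _ in range(6000))

print(len(one_partition_nodes))  # 1
print(sorted(v7_load.values()))
print(sorted(v4_load.values()))
```

Both key types come out at roughly 1,000 rows per node, which is the "no better or worse" outcome: once the partition key is hashed, its temporal ordering is gone.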


findjashua 247 days ago

I think the standard recommendation is to do range partitioning on the hash of the key, aka hash range partitioning (I know YugabyteDB supports this out of the box; I'd be surprised if others don't). This prevents all recent UUIDs from ending up on the same shard.
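A minimal sketch of hash range partitioning: tablets own contiguous ranges of a *hash* space rather than of the raw key space, so they can still be split like ranges while time-ordered keys scatter. The 16-bit hash space is loosely modeled on YugabyteDB's hash sharding; SHA-256 truncated to two bytes is an arbitrary stand-in for its actual hash function, and `HashRangeTable` is a hypothetical name.

```python
import bisect
import hashlib
import random
import uuid
from collections import Counter

HASH_SPACE = 1 << 16  # assumed 16-bit hash space


def uuid7(unix_ms: int, rng: random.Random) -> uuid.UUID:
    # RFC 9562 layout: 48-bit ms timestamp | ver 7 | 12 rand | variant 10 | 62 rand.
    value = (((unix_ms & ((1 << 48) - 1)) << 80) | (0x7 << 76)
             | (rng.getrandbits(12) << 64) | (0b10 << 62) | rng.getrandbits(62))
    return uuid.UUID(int=value)


def key_hash(key: uuid.UUID) -> int:
    # Any well-mixed hash works for the sketch; result is in [0, HASH_SPACE).
    return int.from_bytes(hashlib.sha256(key.bytes).digest()[:2], "big")


class HashRangeTable:
    """Tablets own contiguous ranges of the hash space, not of the key space."""

    def __init__(self, num_tablets: int) -> None:
        # Tablet i owns hash values in [starts[i], starts[i+1]).
        self.starts = [i * HASH_SPACE // num_tablets for i in range(num_tablets)]

    def tablet_for(self, key: uuid.UUID) -> int:
        return bisect.bisect_right(self.starts, key_hash(key)) - 1

    def split(self, tablet: int) -> None:
        # Splitting halves a hash range, just like a range split on raw keys.
        lo = self.starts[tablet]
        hi = self.starts[tablet + 1] if tablet + 1 < len(self.starts) else HASH_SPACE
        self.starts.insert(tablet + 1, (lo + hi) // 2)


rng = random.Random(1)
NOW_MS = 1_700_000_000_000
table = HashRangeTable(num_tablets=8)

# 8,000 writes from the same second: temporally clustered keys, scattered hashes.
recent = [uuid7(NOW_MS + rng.randrange(1000), rng) for _ in range(8000)]
load = Counter(table.tablet_for(k) for k in recent)
print(sorted(load.values()))

table.split(0)  # [0, 8192) becomes [0, 4096) and [4096, 8192)
print(table.starts[:3])  # [0, 4096, 8192]
load_after = Counter(table.tablet_for(k) for k in recent)
```

The trade-off of this design is that range scans in key order (e.g. "everything from the last hour") now touch every tablet, since adjacent keys no longer share a range.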


kbolino 247 days ago

Indeed. In fact, Cassandra and DynamoDB both have hash keys and range keys; I've edited my comment to be more specific.
