Hacker News new | ask | show | jobs
by mjgp2 1688 days ago
Uuidv4 have worse performance when inserting into the btree for the primary key.
3 comments

That is true, but that has never been a bottleneck for me. We have a table with multiple hundreds of millions of rows, 20ish UUIDs indexes (yes it did get out of hand) and insert time isn't an issue.

Ofc, your app performance requirements might vary, and it is objectively true that UUIDs aren't ideal for btree indexes.

Then you are lucky. I once found an insert performance drop of sometimes 5x once a table reaches 500,000 elements or so with BTree's and Uuids in particular. Problem is: UUID's often scale well on the app side, especially with multiple writers so the app wants to use them.

Unless you are using a time sorted UUID, and you only do inserts into the table (never updates) avoid any feature that creates a BTree on those fields IMO. Given MVCC architecture of Postgres time sorted UUID's are often not enough if you do a lot of updates as these are really just inserts which again create randomness in the index. I've been in a project where to avoid a refactor (and given Postgres usage was convenient) they just decided to remove constraints and anything that creates a B-Tree index implicitly or explicitly.

It makes me wish Hash indexes could be used to create constraints. They often use less memory these days in my previous testing under new versions of Postgres, and scale a lot better despite less engineering effort in them. In other databases where updates happen in-place so as not to change order of rows (not Postgres MVCC) a BRIN like index on a time ordered UUID would be often fantastic for memory usage. ZHeap seems to have died.

Sadly this is something people should be aware of in advance. Otherwise it will probably bite you later when you have large write volumes, and therefore are most unable to enact changes to the DB when performance drastically goes down (e.g. large customer traffic). This is amplified because writes don't scale in Postgres/most SQL databases.

Why would you use a btree for them? Wouldn’t a hash index be ideal?
> Why would you use a btree for them?

1. because PRIMARY KEY is its own constraint, and the underlying index is not under you control

2. because PRIMARY KEY further restricts UNIQUE, and as of postgres 14 "only B-tree indexes can be declared unique"

Some databases don't give you a choice, and your table data is actually stored in a clustered index sorted by primary key. This means random inserts when using a UUID (without embedded time), and more page splitting.
Use sequential guids.