Hacker News new | ask | show | jobs
by josephg 855 days ago
> As I discussed in an earlier post[3] when you use Postgres native UUID v4 type instead of bigserial table size grows by 25% and insert rate drops to 25% of bigserial. This is a big difference.

Does anyone know why UUIDv4 is so much worse than bigserial? UUIDs are just 128 bit numbers. Are they super expensive to generate or something? Whats going on here?

4 comments

UUIDv4s are fully random, and btree indices expect "right-leaning" values with a sensible ordering. This makes indexing operations on UUIDv4 columns slow, and was the motivation for the development of UUIDv6 and UUIDv7.
I'm curious to learn more about this heuristic and how the database leverages it for indexing. What does right-leaning mean formally and what does analysis of the data structure look like in that context? Do variants like B+ or B* have the same charactersistics?
The 25% increase in size is true but it's 8 bytes, a small and predictable linear increase per row. Compared to the rest of the data in the row, it's not much to worry about.

The bigger issue is insert rate. Your insert rate is limited by the amount of available RAM in the case of UUIDs. That's not the case for auto-incrementing integers! Integers are correlated with time while UUID4s are random - so they have fundamentally different performance characteristics at scale.

The author cites 25% but I'd caution every reader to take this with a giant grain of salt. At the beginning, for small tables < a few million rows, the insert penalty is almost negligible. If you did benchmarks here, you might conclude there's no practical difference.

As your table grows, specifically as the size of the btree index starts reaching the limits of available memory, postgres can no longer handle the UUID btree entirely in memory and has to resort to swapping pages to disk. An auto-integer type won't have this problem since rows close in time will use the same index page thus doesn't need to hit disk at all under the same load.

Once you reach this scale, The difference in speed is orders of magnitude. It's NOT a steady 25% performance penalty, it's a 25x performance cliff. And the only solution (aside from a schema migration) is to buy more RAM.

I think its because of btrees. Btrees and the pages work better if only the last page is getting lots of writes. Iuids cause lots of un ordered writes leading to page bloat.
Random distribution in the sort order mean the cache locality of a btree is poor - instead of inserts going to the last page, they go all over the place. Locality of batch inserts is also then bad at retrieval time, where related records are looked up randomly later.

So you pay taxes at both insert time and later during selection.