Hacker News new | ask | show | jobs
by ww520 3725 days ago
This is a non-issue. Most db engines just append new data rows at the end of the data table and assign an internal row ID for it. The keys (UUID in this case) are stored in the separate index pages using B+ tree, which searches random key (UUID) or sequentially incremented key equally well.

UUID key is a problem only if your main query is a range query on the PK of a clustered indexed table. If your main query is a range query, you should pick something that can be sorted in the range anyway.

3 comments

A lot of times when a UUID is used, it's not generated in the db, such as when you need to generate guaranteed unique IDs client side. Also 4 bytes vs 16 is not "equally well" in terms of performance. Note that the the diagram linked at the root of this thread is encoding uuids as strings which means 32-36 bytes instead of 16.
Actually only postgresql does it this way (by storing data in the heap and not in the primary index). Mysql(innodb),mssql,oracle uses the store the row in the primary-key.

Edit: I'm ~wrong, see below.

Only Innodb does it by default. The others use heap table by default.

MSSQL allows clustered indexed table as an option to order the physical storage of rows. Oracle has index-organized table as an option.

Edit: They don't use clustered indexed table by default because record insertion is very expensive since clustered index forces the table to store the records contiguously in the index's order. Also Innodb is not truely clustered indexed. It only stores records contiguously for one page at the B+ Tree leaf level. Records in different pages are scattered all over even if the index values are sequential.

It's pretty variable, but Postgres definitely is not the only one that does it that way (or even close).
Not the case with MS SQL Server for example (and there are probably several others). Physical row ordering (at least up to 2008 edition as far as I can recall) coincides with clustered index.