Hacker News new | ask | show | jobs
by larzang 2023 days ago
Postgres stores records by primary id sequence. If you have completely random IDs, records inserted at the same time are fragmented across the entire set, which makes accessing them slower if they're meant to be sequential.

Whether this matters entirely depends on the type of data you're storing. If you're storing unrelated data which is always random access anyway, non-sequential ids don't really matter. If you're storing related data which is mostly accessed sequentially, e.g. Impressions filtered by a creation date range, it's definitely better to have sequential ids for non-fragmented storage. UUIDv6 isn't as commonplace in libraries as v1 or v4 but offers a good compromise for this.

3 comments

No, Mysql does. Postgres doesn't.

(edit, at least, some common mysql table types do)

(further edit -- You don't actually need a primary key in postgres. You might want one, and depending on how you do it, it's generally good design, but there's no actual requirement for them.)

No, I do not think this is a good explanation. What you describe is how InnoDB works.

PostgreSQL stores the actual records separately from the primary key index so the actual records won't fragment but the primary key index will. The records themselves will roughly be stored in insertion order. A random primary key in PostgreSQL mainly means that inserts into the primary key index will be more expensive and that it will bloat the primary key index to some extent.

Note that this can actually be beneficial if you are sharding the table as now new inserts will be put on different shards. (Of course ideally you would be inserting sequentially but on different shards)