Hacker News new | ask | show | jobs
by nargella 1248 days ago
Disclaimer: not a dba so my terms might not be appropriate

I’ve seen uuid4 which replaces the first 4 bytes with a timestamp. It was mentioned to me that this strategy allows postgres to write at the end of the index instead of arbitrarily on disk. I also presume it means it has some decent sorting.

[inspiration](https://github.com/tvondra/sequential-uuids/blob/master/sequ...)

3 comments

I use ULID, 128 bits, time and great sorting

https://github.com/ulid/spec

Wrote about my experience using ulids in Postgres if people are considering it: https://blog.lawrencejones.dev/ulid/
Good post thanks :)
Is there any way to have the database generate these automatically vs your application?
The common databases don't support natively support generating ULIDs to my knowledge. You can usually find extensions if you prefer generating them in the database instead of the application. I generate them in the application, and store them as a UUID in PostgreSQL to avoid needing any database extensions.
Yea, there are a few extensions for PG, in C and Go that give a ulid_create() function that can be used as column default, just like serial.
It also has the advantage that the page being written to, the right most leaf at the end of the index, is likely to always be available in the page cache. With random you may need to constantly go to disk to fetch the page.
In this sequential UUIDs idea, I wonder how big of a deal it is if the prefix part wraps around often? E.g. using a timestamp-based prefix with 2 bytes, if you increase the prefix every 60 seconds, the prefix will be reset every 45 days or so (60 * 1000 * 2^16) according to that README. Does it make sense to fine tune this value based on the use case or what?
Are there any clear downsides to sequential prefixes on UUIDs? I would imagine if you're producing new objects at a high enough rate, you'd have a lot of prefix collisions, which would hinder search times. I've never benchmarked to confirm that though.
If the prefix is incremented for every new ID, you essentially have the same problem as you do with serial: you leak information about the amount of rows in some timeframe.

As the link posted above mentions, you can alternatively use a timestamp-based prefix that wraps around after all the bits have been used. This one still leaks possible creation times of the record, so it's on par or better compared to UUIdv6, ULID, etc. (because here the exact creation time can't necessarily be deduced).

In all of these UUID solutions apart from the fully random v4, you are trading of the better index performance with some level of information leakage about the record the ID is associated with.