| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by magicpointer 1885 days ago
	About UUID as Primary Key and performance, the following article has some insights and benchmarks as well: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato... Essentially, they observed sizeable performance improvements by using UUID generators that are tweaked to get more sequentia resultsl. It results in better indexes. The articles compares sequences, random UUIDs and 2 kinds of sequentialish UUID generators.

4 comments

tehlike 1885 days ago

Mentioned this in a sibling comment:

There's another benefit to UUID - You can generate them anywhere including application side. Doing this on application side would have tremendous batching benefits or inserting objects with relationships at the same time (Vs waiting first insert to return an ID to be used in the FK).

inopinatus 1885 days ago

Caveat programmer: this could be problematic, not in the sense it doesn't work, but in the sense that someone working on backend code may have a preconceived expectation that UUIDs are also effectively a keyspace i.e. they're hard to guess. The validity of that is already challenged by variants defining temporal or logical order, and evaporates completely if you let clients declare their own that you accept at face value. Applications may have potentially guessable/gameable object identifiers sloshing around inside as a consequence, which is modestly ironic given that one benefit many folks expect from adopting UUIDs in the first place is hardening up the attack surface of trivially enumerable sequences.

There are a few mitigations but my favourite is the "casino chips" approach: pregenerate them server side, and allocate to clients on demand, including en masse if need be ("here kid, have a few million UUIDs to get you started"). Verify with whatever simple signature scheme comes with your application server framework, or at small scale just toss them in a crude LRU store.

Or, remember where the UUID came from, and apply their organisational scope to any trust you place upon it. This might work particularly for multi-tenanted SaaS. However it requires that all usage is tenant-bounded end-through-end throughout your application. This may be in conflict with a) your framework, b) your zenlike contemplation of simplicity in data management, or c) programmers in a hurry forgetting to scope their queries properly.

Ultimately, relying on UUIDs as intrinsically unguessable security tokens is probably not a great idea, but it's one that remains thoroughly embedded in the programming zeitgeist. As ever, nothing useful comes without a compromise. Keep your eyes open to the systemic consequences of design choices, and don't leave traps for your fellow developers.

NicoJuicy 1885 days ago

He's not saying clients can create their own ids. The applications can.

The concepts he's talking about are required for cqrs. Which is a popular concept applied with mostly DDD or microservices.

lazide 1885 days ago

There definitely are people out there in this thread proposing clients be able provide UUIDs. I’ve seen it elsewhere too.

I’ve also personally experienced UUID collisions due to badly set up VM environments under Windows. It isn’t a good idea to blindly trust any value - and that includes supposedly ‘never collide’ id’s like UUID.

For what it’s worth, I also had the joy of debugging someone’s distributed hash table that was using md5 as the hash bucket key (this was... 2 decades ago?) and had no way to handle collisions because obviously that is impossible.

NicoJuicy 1884 days ago

This seems more an issue of the libraries random generator to form uuids.

Eg. I use guids ( .net) and i have never seen an issue.

inopinatus 1885 days ago

I getcha, but these days the ambit reach of "application" extends to Javascript executing client-side in an environment that's basically overrun with lions/tigers/bears, and I'll suggest that's particularly a consideration when the front-end is a SPA participating in a CQRS/event-sourced overall application architecture.

tomnipotent 1885 days ago

For perspective, the npm uuid package is now being downloaded ~50M/week. It's usage is ubiquitous at this point, on any platform JS is running.

https://www.npmjs.com/browse/depended/uuid

NicoJuicy 1877 days ago

Little bit later to reply.

Unfortunately that doesn't mean much.

Since nodejs is a server side language and can handle that package too. And it's not "solely" for js/spa's.

kortex 1885 days ago

Should you ever use a plain token (where you just check if it exists in some authed_users table) vs, I dunno, some sort of signed/HMAC type thing, where you have to call some function on it? I genuinely don't know but I know enough to generally leave authentication up to those that do know.

Maybe I'm just thinking of OAuth where there are multiple hops involved?

inopinatus 1885 days ago

The comparison to OAuth is quite reasonable. Perhaps the most obvious parallel is the use of a state parameter during the three-legged exchange, without which it's exposed to a CSRF clickjacking attack.

kortex 1885 days ago

Right. Maybe it's paranoid, but it seems like a bearer token has potential avenues for forgery (CSRF or others), replay attacks, add-on jacking, etc. Also harder to coordinate with distributed apps. I think the Captain Tightpants approach would be to initialize some client-side private key/cert, use that to sign each request and verify based on that cert.

That should also make it easier to, say, verify and unwrap the request at the gateway to the server, before sending it to the rest of the application-proper.

ccleve 1884 days ago

In Java there is a UUID generator based on SecureRandom. That's about as unguessable as you're going to get.

squiggleblaz 1884 days ago

It's not a question of whether UUIDs can be generated unguessably. They can be, as you point out.

It's whether the UUIDs in your system can be reliably presumed to be unguessable - including the UUIDs that were generated by code which was written after you wrote your query that assumes unguessability.

Today you might say "Oh, this SecureRandom-based UUID generator is unguessable and meets all of our requirements". Tomorrow you might say "Ah, this SecureRandom-based UUID generator is too slow, let's generate them in our Android app instead of on the server". But now the UUIDs stored in your database aren't reliably unguessable, because you accept whatever your client API tells you without verification. How plausible is it, within the timeframes you actually get, to review every query for whether it assumes the trustworthiness of the UUID generation? Better to assume UUIDs have some convenient properties, than to assume that they're unguessable just because the API is cryptographically secure today.

Benjammer 1885 days ago

I think you can probably get the same "batching benefits" if you use a global ID generation service of some sort, with sequential IDs to improve indexing. Using sequential IDs doesn't necessitate using auto-generated sequential IDs.

tehlike 1885 days ago

correct, but global service need local caching. Some "HiLo" type identity generation could work. Sequential-UUID simplifies it as you don't need that service, so it's preferable.

HiLo explanation below:

A client simply gets a range of ids to be used, exclusive to them. Then can use it without any roundtrip to server.

This can be achieved with a "hi" stored on service, and a batchSize that's constant in the system.

Each time a "hi" is requested, it's incremented by 1. Now the client can generate (hi * batchsize, (hi+1) * batchsize - 1).

ccleve 1884 days ago

The Postgres JDBC driver does not guarantee that batch inserts come back in the same order that you insert them (when you use RETURNING *). So, if you generate UUIDs server-side, you can't conveniently match them up with the records you just inserted. You're better off generating them in the app server first and then sending them to Postgres.

ccleve 1884 days ago

Plus, it's way better to generate a UUID in the app server and send it to the server as a string. The reason is that you can deal with the id as a plain string and don't have to deal with a non-standard datatype in your app.

zepolen 1885 days ago

Another benefit to UUIDs is merging tables/databases without key conflict.

berkes 1884 days ago

Another benefit is that clients can generate and store objects before sending them to the server.

Allowing simple caching, easy async, easy handling of offline, or far simpler clientside code for dealing with those objects. etc.

For mobile app development which relies on an online (http) backend, clientside generatable UUIDs offer almost only benefits.

megous 1885 days ago

There's nothing that prevents you from fetching a batch of N IDs from the server. Server just does +N on the sequence and you can do whatever you like with the IDs client side.

You can also use one sequence for everything on the server, and then you can also pre-create ID based relationships client side.

tehlike 1884 days ago

Made the same claim here: https://news.ycombinator.com/item?id=27347243

jeltz 1885 days ago

You can just use PostgreSQL's writeable CTEs to get the same batching benefits plus the benefits from using serials. So, no, I do not think batching is a good reason for using UUIDs.

tehlike 1883 days ago

Writable ctes sound like a very niche feature I haven't used and don't plan to use. It feels like a good reason to me.

hermanradtke 1885 days ago

I use a ulid[1] as a uuidv4 replacement:

https://github.com/ulid/spec

j-pb 1885 days ago

They almost got it right, a better implementation would overflow regularly to make use of the entire key space, and counter untuitively more resistant to overflows.

Clocks aren't reliable enough for timestamps anyways so garbage collection is the only thing you kinda wanna rely on them for.

A good sweet spot seems to be, 32bit milliseconds + 96bit of entropy. This overflows appeoximately every 50 days, allowing for 50 day rolling data retention.

simondotau 1885 days ago

Not the worst idea—50 days is a nice sweet spot between infrequent enough to have some indexing benefit and frequent enough that potential downsides will be discovered early in the product’s life cycle.

Personally I wouldn’t do this. A scenario where for each individual millisecond of elapsed time, 96 bits of entropy is an upgrade over 80 bits of entropy, is fairly extreme. I don't think there are many databases in the world which would ever need more collision mitigation than that.

j-pb 1884 days ago

> I don't think there are many databases in the world which would ever need more collision mitigation than that.

Individual instances? Maybe not. But for those an autoincrement key would also work. That is not the scenario that ULIDs and GUIDs are advertised for.

The goal is to have an universally/globally unique ID. So whenever you encounter two IDs you can be (resonably, probability wise) sure that they won't collide.

Any such sheme thus must, by definition, serve every single use case now and forever everywhere. That's a tough one.

Also it's not really 80bit vs 96bit (which due to the birthday paradox is already a huge difference) but more 80bits vs. 128bit as the timestamp is recycled with sufficient usage.

I'm actually concerned that 96bit isn't enough, as it relies on the assumption that you'll use this scheme for for data spanning years, in order to properly use the timestamp as entropy.

fatsdomino001 1885 days ago

I was debating between using ULID vs just using the same sequential bigint sharding strategy that Instagram uses[1][2].

I ended up deciding to use sharded bigints because it enables better indexing, even though there are drawbacks when first writing the instances; The benefit of faster search was more important for me.

[1]: https://instagram-engineering.com/sharding-ids-at-instagram-...

[2]: http://www.livinginthepast.org/2016/02/24/sharding-into-bigi...

FractalHQ 1884 days ago

Would you recommend this over https://github.com/ericelliott/cuid ? I use cuid with Postgres and it’s worked out great.

aidos 1885 days ago

They seem like what I'm looking for. I was looking at moving to something more random because our current UUIDs (uuid1) are too easy to mistake for one another at a glance.

edoceo 1885 days ago

I was hoping someone would mention these. Its really quite nice. 128bit, sortable, client or server generated, no collision (well almost zero: 80bits are random per millisecond)

Matrixik 1885 days ago

There is also CUID:

https://github.com/ericelliott/cuid

Scarbutt 1885 days ago

Careful with leaking sensitive information with semi-sequential UUIDs though.

magicpointer 1885 days ago

To alleviate the issue of having a sequential part, they make it wrap around so that you cannot tell the order between two UUIDs. It's already some protection, and the random part is still large.