| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by afgrant 1016 days ago
	Are there modern databases that can’t readily index on a 128-bit value?

5 comments

dvlsg 1016 days ago

I don't think it's a matter of not being able to index on it, but that mostly random UUIDs (like v4) can lead to some interesting index fragmentation you have to stay on top of somehow.

link

konschubert 1016 days ago

If you use sequential ids, the IDs most in use are the higher ones. So your index is “hot at the top” and you can keep that part in memory.

This is only a problem with extremely big indexes.

link

WorldMaker 1014 days ago

Sort orders are complicated for UUIDs because of the interesting defined structure to them, and of course endian issues.

One fun and useful reference: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...

It's particularly interesting that Microsoft SQL Server was designed to optimize indexing for UUIDv1 machine IDs. That makes a certain amount of sense for a database cluster if IDs are sorted by machine. Of course, developers don't let developers use UUIDv1 in 2023 because those machine IDs are not secure in a general sense and can be a privacy/data leak in the worst cases.

Other databases sort/index UUIDs differently. There's no real "standard" and optimizing the storage of a UUID key is a game of playing to the strengths of your specific database.

(On one project I put some work into matching the much-better-defined ULID sort order to MS SQL Server uniqueidentifier columns for better database locality.)

link

safetytrick 1016 days ago

Random ids require random inserts and that is bad for performance. It isn't a deal-breaker in most situations but it is a real cost that you do pay.

link

dgb23 1015 days ago

You already got multiple answers explaining the performance issue.

Now in many, typical applications you'd have to scale up quite significantly before this becomes a problem.

But if your application requires you to store small, new entries very quickly, you'll start to notice this even with moderate scale. Disk persistence is often the bottleneck already, and this might make it worse.

link