

	by default-kramer 2952 days ago
	(The author is assuming that the primary key controls disk layout, which is usually true.) One advantage of using an incrementing integer is that rows will be ordered on disk based on when they were created. This often helps performance. If a query asks for 25 consecutive rows, there is a good chance they will all be on the same page. If you use UUIDs, then they could be on 25 different pages and you will have to do 25x the disk IO to handle the query.
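A toy simulation of the read argument, assuming a fixed 100 rows per page, no page splits, and seeded random 128-bit integers standing in for UUIDs. The table is laid out in key order, and we count how many distinct pages hold the 25 most recently created rows.

```python
import random

ROWS_PER_PAGE = 100  # assumption: fixed number of rows per leaf page
N_ROWS = 10_000      # row index == creation order

def pages_touched(keys, wanted):
    """Count distinct pages holding the rows in `wanted`, assuming the
    table is physically stored in key order (clustered on `keys`)."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    page_of = {row: rank // ROWS_PER_PAGE for rank, row in enumerate(order)}
    return len({page_of[row] for row in wanted})

rng = random.Random(42)
sequential_keys = list(range(N_ROWS))                        # auto-increment style
random_keys = [rng.getrandbits(128) for _ in range(N_ROWS)]  # UUIDv4-like stand-in

latest_25 = range(N_ROWS - 25, N_ROWS)  # 25 consecutively created rows
print(pages_touched(sequential_keys, latest_25))  # → 1
print(pages_touched(random_keys, latest_25))      # many pages; exact count depends on the seed
```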


da_chicken 2952 days ago

> One advantage of using an incrementing integer is that rows will be ordered on disk based on when they were created.

Well, kind of. A lot of people think the auto-incrementing integer feature in many RDBMSs will always increase, or will never have gaps. It's likely, but not guaranteed, that row n+k was created after row n. If you really need the creation date, store it in a datetime/timestamp column.
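A small illustration using SQLite (chosen only because it ships with Python; the table and values are hypothetical). A client supplies its own id, so the last-created row gets the smallest id, and only the explicit timestamp column still gives creation order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"    # auto-assigned unless a value is supplied
    " name TEXT NOT NULL,"
    " created_at TEXT NOT NULL)"  # explicit creation time, ISO-8601 text
)
conn.execute("INSERT INTO events (name, created_at) VALUES ('first', '2024-01-01T00:00:00')")
conn.execute("INSERT INTO events (name, created_at) VALUES ('second', '2024-01-01T00:00:01')")
# Nothing stops a client from choosing its own id: created last, smallest id.
conn.execute("INSERT INTO events (id, name, created_at) VALUES (0, 'third', '2024-01-01T00:00:02')")

by_id = [r[0] for r in conn.execute("SELECT name FROM events ORDER BY id")]
by_time = [r[0] for r in conn.execute("SELECT name FROM events ORDER BY created_at")]
print(by_id)    # ['third', 'first', 'second']
print(by_time)  # ['first', 'second', 'third']
```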

> If a query asks for 25 consecutive rows, there is a good chance they will all be on the same page. If you use UUIDs, then they could be on 25 different pages and you will have to do 25x the disk IO to handle the query.

This is true, but it also means that if you need to write 25 different rows, they will go to 25 different pages. That sounds bad because non-sequential writes are slower, but remember that it could be 25 different connections trying to write! In other words, sequential inserts create a hot spot. If every new row goes to the end of the table, you'll have threads constantly waiting on other processes' inserts, since an insert locks the page it is writing to.
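A toy model of the write hot spot, under the same simplifying assumptions (fixed 100 rows per page, page splits ignored, seeded random integers standing in for UUIDs). It reports how many of 25 concurrent inserts target the single busiest page.

```python
import bisect
import random
from collections import Counter

ROWS_PER_PAGE = 100      # assumption: fixed number of rows per leaf page
EXISTING_ROWS = 10_000
CONCURRENT_INSERTS = 25  # e.g. 25 connections each inserting one row

def max_inserts_per_page(existing_keys, new_keys):
    """A new row lands where its key sorts among the existing keys, so its
    page is (insert position // ROWS_PER_PAGE). Returns the busiest page's count."""
    existing = sorted(existing_keys)
    pages = Counter(bisect.bisect_left(existing, k) // ROWS_PER_PAGE for k in new_keys)
    return max(pages.values())

rng = random.Random(7)
seq_existing = range(EXISTING_ROWS)
seq_new = range(EXISTING_ROWS, EXISTING_ROWS + CONCURRENT_INSERTS)
rnd_existing = [rng.getrandbits(128) for _ in range(EXISTING_ROWS)]
rnd_new = [rng.getrandbits(128) for _ in range(CONCURRENT_INSERTS)]

print(max_inserts_per_page(seq_existing, seq_new))  # → 25: every insert hits the last page
print(max_inserts_per_page(rnd_existing, rnd_new))  # small: inserts are spread across pages
```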

So, yes, clustering on a UUID can cause problems (fragmented indexes, inefficient reads), but clustering on an autoincrement can also cause issues depending on your work load.

In reality, what you need to do (in the general case) is cluster on your business key even if it's not the primary key for your table.


nvivo 2952 days ago

> It's likely but not guaranteed that n+k was created after n.

This is true in MySQL if you roll back a transaction, or use an INSERT INTO ... ON DUPLICATE KEY UPDATE.

In the first case, the rollback doesn't revert the auto-increment counter; in the second, the "insert part" always increments it, even when a duplicate row is found and updated instead.
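A hypothetical toy model of the two cases (not MySQL's actual implementation): the counter is advanced when an insert is attempted and is never handed back, so both a rollback and an upsert that takes the update path leave gaps.

```python
class ToyAutoIncrement:
    """Hypothetical counter that, as described above, is never rewound."""

    def __init__(self):
        self.counter = 0
        self.id_by_key = {}  # unique business key -> assigned id

    def _next_id(self):
        self.counter += 1
        return self.counter

    def insert_then_rollback(self, key):
        self._next_id()  # id allocated, row discarded, counter not rewound

    def upsert(self, key):
        new_id = self._next_id()  # allocated before the duplicate check
        if key not in self.id_by_key:
            self.id_by_key[key] = new_id
        # else: the existing row is updated and new_id is simply lost

t = ToyAutoIncrement()
t.upsert("a")                # gets id 1
t.insert_then_rollback("b")  # burns id 2
t.upsert("a")                # duplicate: burns id 3
t.upsert("c")                # gets id 4
print(t.id_by_key)           # {'a': 1, 'c': 4}
```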


da_chicken 2951 days ago

My point is that nothing stops you from modifying the value of an auto-increment column, nor from inserting directly with a specific value. Yes, rollbacks don't roll back consumed values, but an auto-increment column isn't immutable, and the table isn't required to use the next value.

I've seen an application do things like INSERT, ROLLBACK, SELECT LAST_INSERT_ID() to "reserve" IDs, or even perfectly acceptable things like reserving IDs from a SEQUENCE. Those weren't all MySQL systems, but it did sometimes lead to confusion about why gaps appeared or why timestamps seemed "inconsistent".
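A sketch of why reserving IDs from a sequence produces both effects, using a hypothetical block-reserving sequence (the class and block size are illustrative assumptions). Two workers each reserve a block; whichever writes first gets its ID from its own block, so ID order stops matching creation order, and unused reserved values become gaps.

```python
class ToySequence:
    """Hypothetical sequence that hands out IDs in reserved blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._next_start = 1

    def reserve_block(self):
        start = self._next_start
        self._next_start += self.block_size
        return iter(range(start, start + self.block_size))

seq = ToySequence(block_size=10)
worker_a = seq.reserve_block()  # reserves 1..10
worker_b = seq.reserve_block()  # reserves 11..20

written_ids = []                     # IDs in the order rows are actually created
written_ids.append(next(worker_b))   # B writes first: id 11
written_ids.append(next(worker_a))   # A writes second: id 1
written_ids.append(next(worker_b))   # B again: id 12
print(written_ids)  # [11, 1, 12]; 2..10 and 13..20 are gaps for now
```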

The rollback case above was a potential problem through 5.7, though, since MySQL kept the auto-increment value only in memory, so it was possible to reuse some values: INSERT followed by a ROLLBACK, then restart the server, and you could get reused IDs. It's rare, but I've seen it. It looks like they changed this in 8.0 to save the auto-increment value to a system table. That's a good thing.
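A hypothetical toy model of that restart behavior (not InnoDB's code): with an in-memory counter, a restart re-initialises it from the largest committed id, so a value handed out and then rolled back can be issued again; a persisted counter keeps its high-water mark.

```python
class ToyCounter:
    """persisted=False: in-memory counter, rebuilt as MAX(id) on restart.
    persisted=True: the counter's high-water mark survives a restart."""

    def __init__(self, persisted):
        self.persisted = persisted
        self.counter = 0
        self.committed_ids = []

    def insert(self, commit=True):
        self.counter += 1
        if commit:
            self.committed_ids.append(self.counter)
        return self.counter

    def restart(self):
        if not self.persisted:
            self.counter = max(self.committed_ids, default=0)

def ids_after_rollback_and_restart(persisted):
    db = ToyCounter(persisted)
    db.insert()                            # id 1, committed
    rolled_back = db.insert(commit=False)  # id 2, rolled back
    db.restart()
    return rolled_back, db.insert()

print(ids_after_rollback_and_restart(persisted=False))  # (2, 2): id 2 issued twice
print(ids_after_rollback_and_restart(persisted=True))   # (2, 3)
```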


zzzcpan 2952 days ago

> One advantage of using an incrementing integer is that rows will be ordered on disk based on when they were created

If each identifier starts with a logical time, say, a Lamport timestamp, you can get the same ordering effect without incrementing an integer in some centralized place.
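A minimal sketch of that idea, assuming each node has a unique `node_id` used as a tie-breaker. Identifiers are `(logical_time, node_id)` tuples, so sorting by them orders by logical time first, and an ID created after seeing another node's ID always sorts after it.

```python
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id  # assumption: unique per node
        self.time = 0

    def new_id(self):
        """Local event: tick, then return an ID prefixed by the logical time."""
        self.time += 1
        return (self.time, self.node_id)

    def observe(self, other_id):
        """Receive event (Lamport rule): jump past the sender's time."""
        self.time = max(self.time, other_id[0]) + 1

a, b = LamportClock("a"), LamportClock("b")
x = a.new_id()  # (1, 'a')
b.observe(x)    # b has now seen x
y = b.new_id()  # (3, 'b'): sorts after x, which it causally follows
z = a.new_id()  # (2, 'a')
print(sorted([y, x, z]))  # [(1, 'a'), (2, 'a'), (3, 'b')]
```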


skrause 2952 days ago

I've written my own UUID generation function which uses the current timestamp with microsecond precision for the first 64 bits and a random value for the last 64 bits. So far it's been a great success.
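A sketch of the layout described (the function name is hypothetical): the top 64 bits hold Unix time in microseconds and the bottom 64 bits are random. Like the description, it does not set RFC 4122 version/variant bits, so it is a 128-bit value in UUID clothing rather than a standard UUID version.

```python
import secrets
import time
import uuid

def time_ordered_uuid(now_us=None):
    """Top 64 bits: Unix time in microseconds; bottom 64 bits: random."""
    if now_us is None:
        now_us = time.time_ns() // 1_000
    return uuid.UUID(int=(now_us << 64) | secrets.randbits(64))

earlier = time_ordered_uuid(1_000)
later = time_ordered_uuid(1_001)
print(earlier < later)            # True: UUIDs compare by their integer value
print(str(earlier) < str(later))  # True: fixed-width hex sorts the same way
```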

Collisions are extremely unlikely unless you're at Google scale, and the generated UUIDs are mostly ordered.
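Why collisions are so unlikely with this layout: two IDs can only collide if they share the same microsecond prefix, and then only if their 64 random bits match. The standard birthday-bound approximation for n IDs generated within one microsecond is sketched below.

```python
import math

def collision_probability(ids_per_microsecond, random_bits=64):
    """Approximate P(at least one collision) among IDs sharing one
    microsecond prefix: 1 - exp(-n(n-1)/2 / 2**random_bits)."""
    n = ids_per_microsecond
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2 ** random_bits)  # expm1 keeps precision for tiny values

print(collision_probability(1))     # 0.0: a lone ID cannot collide
print(collision_probability(1000))  # about 2.7e-14
```

Even 1,000 IDs per microsecond (a billion per second) gives a per-microsecond collision chance on the order of 1e-14.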


dvlsg 2952 days ago

I've also done this at a medium-ish scale. We had to be careful with how the UUIDs were generated, though, specifically which bytes contained the timestamp (and in what order), since different databases store UUIDs differently.
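One concrete way byte order bites: Python's `uuid.UUID.bytes_le` exposes the mixed-endian layout used by Microsoft-style GUIDs, where the first three fields are stored little-endian. If a timestamp-prefixed value ends up stored that way, sorting the raw bytes no longer follows time. The zero random part below is only for a deterministic illustration.

```python
import uuid

def prefix_uuid(ts):
    """Hypothetical time-prefixed UUID with a zero random part."""
    return uuid.UUID(int=ts << 64)

earlier, later = prefix_uuid(1), prefix_uuid(256)

print(earlier.bytes < later.bytes)        # True: big-endian bytes keep time order
print(earlier.bytes_le < later.bytes_le)  # False: byte-swapped fields break it
```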


DrJosiah 2952 days ago

I've used this with great results in Postgres, MySQL, Redis, etc.
