Hacker News new | ask | show | jobs
by da_chicken 2950 days ago
> One advantage of using an incrementing integer is that rows will be ordered on disk based on when they were created.

Well, kind of. A lot of people think the auto incrementing integer function in many RDBMSs will always increase, or will never have gaps. It's likely but not guaranteed that n+k was created after n. If you really need to store the creation date, then you should store that in a datetime/timestamp column.

> If a query asks for 25 consecutive rows, there is a good chance they will all be on the same page. If you use UUIDs, then they could be on 25 different pages and you will have to do 25x the disk IO to handle the query.

This is true, but it also means that if you need to write 25 different rows, it will be in 25 different pages. That sounds bad because non-sequential writes are slower, but you have to remember that it could be 25 different connections trying to write! In other words, you create a hot spot with sequential inserts. If that's the end of the table, you'll have threads constantly waiting for other processes to do inserts since inserts lock the page being inserted.

So, yes, clustering on a UUID can cause problems (fragmented indexes, inefficient reads), but clustering on an autoincrement can also cause issues depending on your work load.

In reality, what you need to do (in the general case) is cluster on your business key even if it's not the primary key for your table.

1 comments

> It's likely but not guaranteed that n+k was created after n.

This is true in mysql if you rollback a transaction, or use a INSERT INTO ... ON DUPLICATE KEY UPDATE.

In the first, the rollback doesn't revert the sequence, in the second the "insert part" will always increase the number, even if there is a duplicate to update.

My point is that nothing stops you from modifying the value of an auto increment column, nor from inserting directly with a specific value. Yes, rollbacks don't roll back consumed values, but an auto increment column isn't immutable and the table isn't required to use the next value.

I've seen an application do things like an INSERT ROLLBACK SELECT LAST_INSERT_ID() to "reserve" IDs... or even perfectly acceptable things like reserving IDs out of a SEQUENCE. Those weren't all MySQL systems, but it did lead to confusion sometimes why gaps might appear or why timestamps might be "inconsistent".

The above one was a potential problem through 5.7 though, as it was possible to reuse some values since MySQL kept the auto increment value in memory only. INSERT followed by a ROLLBACK, then restart the server and you could get reused IDs. It's rare, but I've seen it. However, but it looks like they changed it with 8.0 to save the auto increment value to a system table now. That's a good thing.