| HN Mirror



	by barrkel 1885 days ago
	Another point: if there's any temporal locality to your future access patterns - if you're more likely to access multiple rows which were inserted at roughly the same time - then allocating sequential identifiers brings those entries closer together in the primary key index. I used to work on a reconciliation system which inserted all its results into the database. Only the most recent results were heavily queried, with a long tail of occasional lookups into older results. We never had a problem with primary key indexes (though this was in MySQL, which uses a clustered index on the primary key for row storage, so it's an even bigger benefit); the MD5 column used for identifying repeating data, on the other hand, would blow out the cache on large customers' instances.
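A rough way to picture the locality point above is a toy model, not a measurement of MySQL or any real database. It treats an index as the sorted list of keys cut into fixed-size pages, then counts how many pages hold the most recently inserted rows. The page size, table size, and "recent" window are all made-up numbers, and the MD5 keys just stand in for a column like the one described.

```python
import hashlib

ROWS_PER_PAGE = 100      # assumed number of index entries per page
TOTAL_ROWS = 100_000     # assumed table size
RECENT_ROWS = 1_000      # the "hot" most recently inserted rows


def pages_touched(keys, hot_keys):
    """Count the distinct index pages that hold the hot keys.

    The index is modelled as the sorted key list cut into fixed-size pages.
    """
    position = {key: i for i, key in enumerate(sorted(keys))}
    return len({position[key] // ROWS_PER_PAGE for key in hot_keys})


# Keys in insertion order: sequential integers vs. an MD5 digest per row.
sequential_keys = list(range(TOTAL_ROWS))
md5_keys = [hashlib.md5(str(i).encode()).hexdigest() for i in sequential_keys]

sequential_pages = pages_touched(sequential_keys, sequential_keys[-RECENT_ROWS:])
md5_pages = pages_touched(md5_keys, md5_keys[-RECENT_ROWS:])

print(f"sequential ids: {sequential_pages} pages hold the recent rows")
print(f"md5 keys:       {md5_pages} pages hold the recent rows")
```

In this model the recent rows sit on a handful of adjacent pages with sequential IDs, while the hashed keys scatter the same rows across a large share of the index, which is the kind of pattern that would put pressure on a cache.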

1 comment

vinayan3 1879 days ago

To add on: if you are joining tables on a UUID column, the join becomes quite slow once the tables are very large, say more than 10 million rows.

PG will say it's doing a hash lookup, and you'd think it would be fast, but it takes quite some time relative to joining two large tables on integer IDs. With UUIDs, PG will sometimes give up on the hash lookup and try table scans instead, unless you adjust random_page_cost.

In general joining on UUIDs for large tables is a bad idea. It can be great if you are joining a single row to another row.
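For a sense of scale, here is a back-of-the-envelope sketch of raw key sizes at the 10 million row mark. The comment above does not say why the UUID join is slower, so treat key width as only one possible contributor, not a diagnosis. The widths match PostgreSQL's integer (4 bytes), bigint (8 bytes), and uuid (16 bytes) types; the helper name is made up, and it ignores all per-tuple and hash-table overhead.

```python
import struct
import uuid

ROWS = 10_000_000  # the ">10 million rows" scale mentioned above

INT4_BYTES = struct.calcsize("<i")     # same width as PostgreSQL integer
INT8_BYTES = struct.calcsize("<q")     # same width as PostgreSQL bigint
UUID_BYTES = len(uuid.uuid4().bytes)   # same width as PostgreSQL uuid


def raw_key_megabytes(key_bytes, rows=ROWS):
    """Megabytes of key data alone, ignoring any storage or hashing overhead."""
    return key_bytes * rows / 1_000_000


for name, width in [("integer", INT4_BYTES), ("bigint", INT8_BYTES), ("uuid", UUID_BYTES)]:
    print(f"{name:8s} {width:2d} bytes/key  {raw_key_megabytes(width):6.0f} MB of raw keys")
```

Checking what the planner actually does on a given join still needs EXPLAIN on the real tables.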