| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by srcreigh 698 days ago

Easy trick to making joins 50x faster: don't use Postgres and give your tables a primary key which groups related items together.

A lot of people don’t know that a database index doesn’t order the actual rows on disk. It’s just a Btree of pointers.

If you use clustered index for a table query pattern, the rows are actually ordered on disk.

Most DBs load data in 8KiB chunks. So if you query 100 rows that are 100bytes, if they’re not sorted, you actually need to load nearly 1MiB of data even tho the query result is 10KiB.

Speeds up joins and range queries 50x or more, less cache evictions, etc.

You can do this in any database except for Postgres. Postgres doesn’t have the ability to keep rows sorted on disk.

2 comments

mthoms 698 days ago

Although it isn't automatic, doesn't the Postgres CLUSTER command reorder the rows on disk? Or am I misunderstanding something?

link

tehlike 698 days ago

It does, but it's a bit of a problem when table is large and gets new rows since by default it locks the table and is a slow operation.

Oh and also does compression which helps quite a bit with network storages (like cloud disks)

link

tehlike 698 days ago

Cockroachdb or yugabyte kind of solves for sortedness by pk since it uses rocksdb variants/lsm tree underneath.

link