Hacker News new | ask | show | jobs
by srcreigh 698 days ago
Easy trick to making joins 50x faster: don't use Postgres and give your tables a primary key which groups related items together.

A lot of people don’t know that a database index doesn’t order the actual rows on disk. It’s just a Btree of pointers.

If you use clustered index for a table query pattern, the rows are actually ordered on disk.

Most DBs load data in 8KiB chunks. So if you query 100 rows that are 100bytes, if they’re not sorted, you actually need to load nearly 1MiB of data even tho the query result is 10KiB.

Speeds up joins and range queries 50x or more, less cache evictions, etc.

You can do this in any database except for Postgres. Postgres doesn’t have the ability to keep rows sorted on disk.

2 comments

Although it isn't automatic, doesn't the Postgres CLUSTER command reorder the rows on disk? Or am I misunderstanding something?
It does, but it's a bit of a problem when table is large and gets new rows since by default it locks the table and is a slow operation.

Oh and also does compression which helps quite a bit with network storages (like cloud disks)

Cockroachdb or yugabyte kind of solves for sortedness by pk since it uses rocksdb variants/lsm tree underneath.