Hacker News new | ask | show | jobs
by Sirupsen 1292 days ago
I've added this to the article, thanks!

Composite index (int64, int64): ~70 MiB in Postgres, ~350 MiB in MySQL

Single index (int64): ~70 MiB in Postgres, ~240 MiB in MySQL

If you assume the majority of an index are index entries of (int64, int64, int64) where the third number is some sort of identifier for the record on the heap, you'd expect this to be ~230 MiB. So Postgres does some compression very well here, and MySQL has a bit more overhead for its indexes it seems.

3 comments

Could be the deduplication newer Postgres versions have for B-Tree indexes:

https://www.postgresql.org/docs/current/btree-implementation...

> 67.4.3. Deduplication

> A duplicate is a leaf page tuple (a tuple that points to a table row) where all indexed key columns have values that match corresponding column values from at least one other leaf page tuple in the same index. Duplicate tuples are quite common in practice. B-Tree indexes can use a special, space-efficient representation for duplicates when an optional technique is enabled: deduplication.

> Deduplication works by periodically merging groups of duplicate tuples together, forming a single posting list tuple for each group. The column key value(s) only appear once in this representation. This is followed by a sorted array of TIDs that point to rows in the table. This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.

Excellent, thank you! I'll add that to the article.
This is a guess on my part, though I think a plausible one. To verify you'd probably have to compare an index with unique values to one with many identical ones.
An other thing which might be of interest: what if you use convering indexes for the two single indexes? Is postgres able to do an index-only scan then, thanks to the coverage?

Or does it decide to use only one index and filter based on the covered values?

As you mention in the article, this is small by modern hardware standards and means the indexes can be entirely held in memory. Curious how things look for much larger tables where the index has to be read from disk?