
by fabian2k 1340 days ago

Could be the deduplication newer Postgres versions have for B-Tree indexes:

https://www.postgresql.org/docs/current/btree-implementation...

> 67.4.3. Deduplication

> A duplicate is a leaf page tuple (a tuple that points to a table row) where all indexed key columns have values that match corresponding column values from at least one other leaf page tuple in the same index. Duplicate tuples are quite common in practice. B-Tree indexes can use a special, space-efficient representation for duplicates when an optional technique is enabled: deduplication.

> Deduplication works by periodically merging groups of duplicate tuples together, forming a single posting list tuple for each group. The column key value(s) only appear once in this representation. This is followed by a sorted array of TIDs that point to rows in the table. This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.
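To make the quoted description concrete, here is a toy model of the idea: entries with equal keys are merged into one posting list that stores the key once, followed by a sorted list of TIDs. It is only a sketch. The byte costs are illustrative assumptions, and it ignores alignment, page layout, and PostgreSQL's limits on posting list size, so the numbers do not match real on-disk sizes.

```python
from itertools import groupby

# Illustrative byte costs (assumptions, not PostgreSQL's exact on-disk layout).
TUPLE_HEADER = 8   # assumed per-tuple overhead
KEY_SIZE = 8       # assumed size of one key value (e.g. a bigint)
TID_SIZE = 6       # assumed size of one TID (block number + offset)


def plain_size(entries):
    """Size when every (key, tid) pair is stored as its own index tuple."""
    return len(entries) * (TUPLE_HEADER + KEY_SIZE + TID_SIZE)


def deduplicate(entries):
    """Merge entries with equal keys into (key, sorted TID list) posting lists."""
    ordered = sorted(entries)  # sorts by key, then by TID
    return [
        (key, [tid for _, tid in group])
        for key, group in groupby(ordered, key=lambda entry: entry[0])
    ]


def dedup_size(posting_lists):
    """Size when each key is stored once, followed by all of its TIDs."""
    return sum(
        TUPLE_HEADER + KEY_SIZE + TID_SIZE * len(tids)
        for _, tids in posting_lists
    )


# 1000 rows each: all-distinct keys versus only 10 distinct keys.
unique_entries = [(i, (i // 100, i % 100)) for i in range(1000)]
dup_entries = [(i % 10, (i // 100, i % 100)) for i in range(1000)]

for name, entries in [("unique", unique_entries), ("duplicates", dup_entries)]:
    merged = deduplicate(entries)
    print(f"{name}: plain={plain_size(entries)} dedup={dedup_size(merged)}")
```

In this model the all-distinct case gains nothing from deduplication, while the heavily duplicated case shrinks because the header and key are paid once per group instead of once per row.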

1 comments

Sirupsen 1340 days ago

Excellent, thank you! I'll add that to the article.


fabian2k 1340 days ago

This is a guess on my part, though I think a plausible one. To verify it, you'd probably have to compare an index on unique values with one on many identical values.
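One way such a comparison might be set up is sketched below. The script only generates SQL; it does not connect to a database. The table and index names (`dedup_test` and friends), the row count, and the number of distinct values are all made up for illustration, and the `deduplicate_items` storage parameter assumes PostgreSQL 13 or newer.

```python
def experiment_sql(rows=1_000_000, distinct=100):
    """Return SQL statements that compare index sizes with and without duplicates.

    All object names are hypothetical. Assumes PostgreSQL 13+, where B-Tree
    indexes accept the deduplicate_items storage parameter.
    """
    return [
        "CREATE TABLE dedup_test (uniq bigint, dup bigint)",
        # uniq gets a distinct value per row, dup cycles through few values.
        "INSERT INTO dedup_test "
        f"SELECT g, g % {distinct} FROM generate_series(1, {rows}) AS g",
        "CREATE INDEX dedup_test_uniq_idx ON dedup_test (uniq)",
        "CREATE INDEX dedup_test_dup_idx ON dedup_test (dup)",
        # Same duplicated column, but with deduplication switched off.
        "CREATE INDEX dedup_test_dup_off_idx ON dedup_test (dup) "
        "WITH (deduplicate_items = off)",
        "SELECT relname, pg_size_pretty(pg_relation_size(oid)) "
        "FROM pg_class WHERE relname LIKE 'dedup_test_%_idx'",
    ]


if __name__ == "__main__":
    # Print a script that can be pasted into psql.
    for statement in experiment_sql():
        print(statement + ";")
```

If the guess is right, the index on `dup` should come out noticeably smaller than both the index on `uniq` and the copy built with deduplication turned off.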
