by petergeoghegan 1962 days ago
NULL values are not special as far as deduplication is concerned. Without deduplication they use approximately as much disk space as a non-NULL integer column, and with deduplication they compress just as well. Deduplication is effective because it eliminates per-tuple overhead, so you get most of the benefit even when index tuples happen to have physically small keys. You'll still see up to a 3x decrease in storage overhead for the index provided the data is low cardinality (and not necessarily that low: ~10 or so tuples per distinct value will get you there).
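
To put rough numbers on the per-tuple overhead argument, here is a back-of-the-envelope sketch in Python. It is a simplified model, not a description of the actual on-disk code: the constants (4-byte line pointer, 8-byte index tuple header, 6-byte heap TID, 8-byte alignment) are assumptions about a typical 64-bit build, the function names are made up for illustration, and it ignores page headers, fill factor, and the cap on posting list size.

```python
# Simplified size model for one group of duplicate keys in a B-tree leaf page.
# All constants are assumptions for illustration (typical 64-bit build).
LINE_POINTER = 4   # per-tuple item pointer in the page's line pointer array
TUPLE_HEADER = 8   # index tuple header (includes one heap TID)
TID = 6            # one heap TID inside a posting list
ALIGN = 8          # assumed maximum alignment


def maxalign(n: int) -> int:
    """Round n up to the next multiple of ALIGN."""
    return (n + ALIGN - 1) // ALIGN * ALIGN


def plain_bytes(key_size: int, n_dups: int) -> int:
    """Space for n_dups separate index tuples, one per heap row."""
    return n_dups * (LINE_POINTER + maxalign(TUPLE_HEADER + key_size))


def dedup_bytes(key_size: int, n_dups: int) -> int:
    """Space for one tuple holding the key once plus a list of heap TIDs."""
    if n_dups == 1:
        return plain_bytes(key_size, 1)  # nothing to merge
    base = maxalign(TUPLE_HEADER + key_size)
    return LINE_POINTER + maxalign(base + n_dups * TID)


def savings_ratio(key_size: int, n_dups: int) -> float:
    """How many times smaller the deduplicated form is under this model."""
    return plain_bytes(key_size, n_dups) / dedup_bytes(key_size, n_dups)


if __name__ == "__main__":
    for dups in (1, 2, 10, 100):
        print(dups, round(savings_ratio(4, dups), 2))
```

Under these assumptions a 4-byte key costs 20 bytes per row without deduplication, while each additional duplicate in a merged tuple costs about 6 bytes, which is where a ratio approaching 3x comes from even though the key itself is tiny.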

The NULL issue is documented directly -- see the "Note" box here:

https://www.postgresql.org/docs/devel/btree-implementation.h...