| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by BenMorganIO 2668 days ago

Given the title, I expected this to be about how PG improved their counting. This is not what it was about.

I remember working over half a billion records and having problems when I needed a count. I used count(id) but that was mainly from internet mantra. I did not see an improvement. Using Citus gave me a significant improvement from 7 minutes to 1. And that was just a single coordinator, two workers on the same host. It could become much much better.

If the data is very stagnant and writes are very low the triggers are great. Usually the "close enough" with pages is good if you have over 100k since paging - please correct me if I'm wrong - is sometimes 1k off.

My preference is Citus as a catch all, but a trigger, a Redis cache managed at the app level, or using page counts are all . really useful for stickier situations.

1 comments

groestl 2668 days ago

As an alternative to page counts, we used HLLs to estimate (unique) cardinality, and were quite happy with it. There is a postgres extension (postgresql-hll) and also a version for the JVM using the same algorithm/data format.

link