Hacker News new | ask | show | jobs
by samstokes 4995 days ago
It looks like one of the things they're counting is clicks, so they could potentially have some pretty large datasets.

I don't know how well Mongo's map-reduce works, but in Postgres, COUNT(star) [1] does not perform well for very large tables (e.g. 100 million rows). You wouldn't want to be doing a COUNT(star) once per minute for each customer that had their dashboard open on a plasma screen.

Of course, there are other solutions to that problem: generate the counts on some more feasible schedule and cache them; have a read replica used for analytics queries; shard by customer and have no large customers.

I don't know whether their scale strictly requires the Redis solution, but in any case there are situations where it's not as simple as "throw it in Postgres and use an aggregate function".

[1] "star" instead of an asterisk to avoid HN thinking I'm trying to write in italics.

3 comments

Note that recent versions of Postgres can now COUNT against indices so no need to do a full table scan.
This certainly helps, but if your indices are several GB in size, even an index scan is a nontrivial expense.
To be fair, we may have over-engineered this solution. I've been doing some work with compound indexes with Mongo and they seem to be performing really well. Maybe I'll have to write a guest post for MongoHQ too ;)

Appreciate the time and thoughtful response.

The article said they could rebuild the count if they needed to. So something in regards to each click is being stored. If you were using postgres you'd just setup a trigger on that table to increment the click values (stored in another table) as appropriate. No aggregate function needed.
Certainly you can implement this same pattern without Redis. Triggers in Postgres would be a reasonable way to do it. I didn't say you can't do this in Postgres, I said you can't do it with COUNT().

It does indeed sound like they're storing every click: that's precisely why using aggregate functions would be expensive.

You absolutely can do it without COUNT. Just increment a counter value, the same way they're doing it in Redis.
Why was this downvoted? Adding 1 to any number field is just as an atomic operation as incrementation is in Redis.
Presumably it was downvoted (not by me) because it's refuting a claim I didn't make:

samstokes: I said you can't do it with COUNT()

guywithabike: You absolutely can do it without COUNT

Then we agree.