Hacker News new | ask | show | jobs
by bognition 3313 days ago
How do you concurrently update a counter?
2 comments

I think the main problem is not concurrently updating the counter. After all updating the HLL must also be done concurrently.

The primary motivation on their part for using HLL is given in the intro.

>In order to maintain an exact count in real time we would need to know whether or not a specific user visited the post before. To know that information, we would need to store the set of users who had previously visited each post, and then check that set every time we processed a new view on a post. A naive implementation of this solution would be to store the unique user set as a hash table in memory, with the post ID as the key.

>This approach works well for less trafficked posts, but is very difficult to scale once a post becomes popular and the number of viewers rapidly increases. Several popular posts have over one million unique viewers! On posts like these, it becomes extremely taxing on both memory and CPU to store all the IDs and do frequent lookups into the set to see if someone has already visited before.

Redis writes are atomic - you just use the increment function
Writes are atomic in redis because redis is single threaded. So you are bounded by how fast redis can write. If you try to write any faster then redis can handle you'll get queueing or errors.
The wonderful thing about HyperLogLogs is that you can split the counter in N servers and "merge" the registers later, in case you want an architecture that shards the same counter in multiple servers. But sharding directly by resource looks simpler actually...
Run enough redis servers to handle the load. Choose a server by hashing a user id. Total = sum of counts from all servers.
It's also got cassandra behind the scenes, which has fast, concurrent, distributed counters.