Hacker News new | ask | show | jobs
by jimmaswell 3315 days ago
Why can't they just associate a list of viewed posts with each user, or list of users that viewed a post with each post, and check that? I don't get why this needs any consideration.
2 comments

They addressed your second point in the article. On a popular post, you would be storing several megabytes of data to capture/relate each unique user that visited. That gets expensive at scale. HLL takes then down to a few kilobytes, less than 1% of the original size.

For your first suggestion, you would have to do a very expensive look up. You couldn't cache it effectively​ due to the requirement of near real time stats. You could improve look up time using columnar storage, but the performance and memory usage will be nowhere near as nice as with HLL.

Problems are harder at scale.

I've had a "phases of computing" article percolating for a while to this end. Problems aren't just harder at scale, but they actively change their observable properties because of the stressors involved and where they crop up.
Have you stopped to think how many users that is and how many posts?

Viewing a single thread could require five hundred associations.

And it already requires reading five hundred comments.