Hacker News new | ask | show | jobs
by amitparikh 4145 days ago
I agree. Additionally, the words "distinct" or "cardinality" don't appear anywhere in this article, which is a major omission when discussing HyperLogLog. The primary use of the algorithm is to provide a cardinality estimate.
2 comments

"Let's say we have a whiteboard and we have to count how many individuals visit our home. If our neighbor visits 5 times, they count as one unique visitor. If a friend comes over once, that's another unique visitor. So far, we have 2 unique visitors."

They may not specifically use the word "distinct" (opting for phrases like "unique visitor" instead), but they certainly make the concept clear.

Agreed, cardinality is the key word: it goes with how his example is inappropriate -- if you were just counting the number of people coming in a simple addition (and maybe take a log() ) would cost just as much. I believe the usefulness of HLL is evaluating the cardinality of sets you need to access -- you "sample" the set and get a quick estimate of the cardinality of certain objects.
I disagree. I think that the article does an EXCELLENT job of describing it. They avoided the term "cardinality" on purpose, since the target audience was people who might well be scared off by fancy words they weren't familiar with.

And the problem posed was not to count the number of visitors, but to count the number of different people who visited... the fancy word for that is the "cardinality of unique visitors" but "number of different people" is just as accurate.

Counting the number of people could NOT be done with the same amount of space. This example required two simple counters on the blackboard... call it 30 bits of memory, assuming you didn't expect either counter to get more than 8. That's just about exactly enough space to store ONE social security number... and nowhere near enough to count the unique people.