Hacker News new | ask | show | jobs
by Akronymus 422 days ago
My failure was misreading it as most common k rather than max k.
1 comments

Most common k is super-interesting because it can't be solved in one pass in constant space!

https://en.wikipedia.org/wiki/Streaming_algorithm#Frequent_e...

What you are quoting solves a very different problem. It doesn't give you the most common k (in general).
Why is that interesting? Intuitively a worst-case could be a stream of n-1 unique elements out of n with the duplicate at the end, so there is no way around O(n) space. Any element could be the most common so you must keep them all.
Sure, a similar trivial argument applies to the linear-space lower bound for set membership. But these linear lower bounds motivate the search for approximate techniques with sublinear lower bounds (although bloom filters or fingerprint tables are not actually sublinear).
The algorithms on the Wikipedia page quoted actually solve a different problem. And they can do that in constant space.

So if someone tells you that one item in the stream is repeated so often that it occurs at least p% of the time (say 10%), then these algorithms can find such an element. But eg if they are multiple elements that occur more than p% of the time, they are not guaranteed to give you the one that occurs the most often. Nor are they guaranteed to give you any meaningful output, if the assumption is violated and no element occurs at least p% of the time.