| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Akronymus 422 days ago
	My failure was misreading it as most common k rather than max k.

1 comments

senderista 422 days ago

Most common k is super-interesting because it can't be solved in one pass in constant space!

https://en.wikipedia.org/wiki/Streaming_algorithm#Frequent_e...

link

eru 422 days ago

What you are quoting solves a very different problem. It doesn't give you the most common k (in general).

link

3np 422 days ago

Why is that interesting? Intuitively a worst-case could be a stream of n-1 unique elements out of n with the duplicate at the end, so there is no way around O(n) space. Any element could be the most common so you must keep them all.

link

senderista 422 days ago

Sure, a similar trivial argument applies to the linear-space lower bound for set membership. But these linear lower bounds motivate the search for approximate techniques with sublinear lower bounds (although bloom filters or fingerprint tables are not actually sublinear).

link

eru 422 days ago

The algorithms on the Wikipedia page quoted actually solve a different problem. And they can do that in constant space.

So if someone tells you that one item in the stream is repeated so often that it occurs at least p% of the time (say 10%), then these algorithms can find such an element. But eg if they are multiple elements that occur more than p% of the time, they are not guaranteed to give you the one that occurs the most often. Nor are they guaranteed to give you any meaningful output, if the assumption is violated and no element occurs at least p% of the time.

link