| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tmyklebu 284 days ago

Seems like the author is missing a trick?

You can use a simple hash table without probing to classify each number as either (1) first instance of a number in the table, (2) known duplicate of a number in the table, or (3) didn't fit in the table.

If you use a hash table with n buckets and one item per bucket and the input is random, then at least 63% of the distinct numbers in the input will fall into cases 1 or 2 in expectation. Increasing table size or bucket size improves this part of the math.

On really big inputs, it's probably still healthy to do a pass or two of radix sort so that the hash table accesses are cheap.