Hacker News new | ask | show | jobs
by honoredb 2929 days ago
To store 500 million hashes in 860MB, wouldn't you need to truncate them to 13 bits each? That'd give a false positive rate of 1000 in 1000, slightly worse than the Bloom filter's false positive rate of 1 in 1000.
1 comments

Depends on the data structure and encoding. For instance, another simple/fast/non-optimal encoding would be a 32 bit table of bit flags, which would weigh in at 0.5 GiB. That would give you a ~11.5% false positive rate.