by saalweachter
2523 days ago
A Bloom filter is just way overkill. If you have a list of 20 trillion query strings, and each query string is on average < 100 bytes, you're looking at a three-line MapReduce and < 1 PiB of disk to create a table which has the frequency of every query ever issued. Add a counter to your final reduce to count how many queries were seen exactly once.
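
For illustration, the counting job described above might look roughly like the following. This is a single-machine sketch that only imitates the map, shuffle, and reduce phases in plain Python; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the tiny sample log are hypothetical, and a real job would run on a MapReduce framework over sharded input.

```python
from collections import defaultdict

def map_phase(queries):
    # Map: emit (query, 1) for every query string in the log.
    for query in queries:
        yield query, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts per query and keep a side counter of
    # queries whose total is exactly 1.
    table = {}
    singletons = 0
    for query, ones in groups.items():
        total = sum(ones)
        table[query] = total
        if total == 1:
            singletons += 1
    return table, singletons

# Hypothetical sample log, standing in for the real query stream.
log = ["cats", "dogs", "cats", "bloom filter", "dogs", "cats"]
table, singletons = reduce_phase(shuffle(map_phase(log)))
```

Here `table` holds the exact frequency of each query and `singletons` is the number of queries seen exactly once, at the cost of storing every distinct query string.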
|
A Bloom filter is the most appropriate data structure for this use case. How is it overkill when it uses less space and is faster to query?
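
For comparison, a minimal Bloom filter sketch is shown below. It is an assumption-laden illustration, not anyone's production design: the class name `BloomFilter`, the sizes `num_bits=1024` and `num_hashes=3`, and the use of SHA-256 with double hashing to derive bit positions are all choices made for this example. The filter stores only bits, never the strings, so it can answer "possibly seen" or "definitely not seen" but cannot report exact frequencies.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen" (false positives can occur);
        # False means "definitely never added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

# Hypothetical usage: remember queries as they arrive.
seen = BloomFilter(num_bits=1024, num_hashes=3)
for query in ["cats", "dogs", "cats"]:
    seen.add(query)
```

A query that was added always tests as present, so `"cats" in seen` is true. A query that was never added usually tests as absent, with a false-positive rate that depends on the filter size and the number of items inserted.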