|
|
|
|
|
by ComputerGuru
757 days ago
|
|
Curious idea. So it’s for cases where you have any key but associated with one of only (preferably few) discrete values. I.E. your url example is great with url as a key but subpar if url were to be the value (padded to length n with trailing nulls encoded as a fixed width int array)? With its interesting set of guarantees, I can’t see a case where you could use this unless you are 100% positive all keys have previously been inserted into the set, otherwise you risk getting a wrong value in return (instead of no value). A traditional bloom filter is similar but in the worst case you throw away work because you look up the determinative data/value but here it’s a bit trickier. Lots of applications tolerate missing results but significantly fewer can tolerate “unknowingly incorrect” results. Question about the implementation: I would have expected the primary interface to be in-memory with some api for disk spillover for large datasets but while all the docs say “designed for in-memory lookups” the rust api shows that you need to provide it with a temp directory to create the structure? (Also, fyi, you use temp::temp_file() but never actually use the result, instead using the hard-coded /tmp path.) |
|
I guess another use case could be as any kind of "hint" where you need to do an authoritative lookup regardless of the filter lookup.
E.g., the file might be on this host, but you'll need to reach the host and check for the file either way, so if you go to the wrong host sometimes, it's not the end of the world.
That's something that's not possible with a bloom filter.
Seems like you could combine a shared static file and a host local cache to work around the errors as well (e.g., each host can cache whatever keys they've looked up that were wrong, but they can do LRU to get the best of both worlds (frequently accessed data is correct, while you can look up infrequent data with some chance of a miss).