by surtyaar 5544 days ago
There are some good open-source implementations in Python (and, I am sure, in other languages).

https://github.com/jaybaird/python-bloomfilter - offers scalable bloom filters

https://github.com/axiak/pybloomfiltermmap - uses mmap


We originally used mmap too, but it didn't work very well. First, Java has some rather serious mmap limitations (http://bugs.sun.com/view_bug.do?bug_id=4724038), and for some reason we saw occasional data corruption (which we haven't seen since we moved to the current system).
Have those mmap limitations really impacted you? As for the corruption, I'm assuming that was with read/write mappings that you wrote to?

I'm asking because we're getting great mileage out of mmap in Clojure, albeit for read-only mappings. And that's in a search engine :-) Using mmap for large data is great, because you avoid enlarging your heap and the garbage collector doesn't even have to care about your data.
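
For anyone who hasn't used a read-only mapping, here is a minimal sketch of the access pattern. It is in Python (the language of the libraries linked above) rather than Clojure, so the heap and garbage-collector point doesn't carry over directly, but the idea is the same: the OS pages the file in on demand instead of the program reading it all into its own memory. The helper names `read_slice` and `write_demo_file` and the demo data are made up for illustration; this is not how either linked library does it.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies out only the requested bytes.
            return m[offset:offset + length]


def write_demo_file():
    """Create a small temporary file to map (hypothetical demo data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"0123456789" * 1000)
    return path


if __name__ == "__main__":
    path = write_demo_file()
    try:
        print(read_slice(path, 5, 10))
    finally:
        os.remove(path)
```

Since the mapping is opened with `ACCESS_READ`, nothing in this process can write through it, which is the case I'd expect to be safe from the kind of corruption mentioned above.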