by hedgehog
4570 days ago
Very nice. You will save some space and tolerate some typos if you stem and Soundex each word before insertion. You can also save space and improve the run time somewhat if, rather than many separate Bloom filters, you build one large one where each item is post ID + word. If you do that, you can also insert each word bare so you get O(1) empty result sets, which is helpful if you're updating the results with every keystroke in a search box.
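For what it's worth, here is a rough Python sketch of the single-filter idea, not a tuned implementation. The `BloomFilter` class, `build_index`, `search`, the `":"` separator, and the size and hash-count defaults are all made up for illustration; stemming and Soundex are left out to keep it short.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative defaults."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


def build_index(posts):
    """posts maps post ID -> text. Everything goes into one shared filter."""
    bf = BloomFilter()
    for post_id, text in posts.items():
        for word in text.lower().split():
            bf.add(word)                  # bare word: cheap "no results" check
            bf.add(f"{post_id}:{word}")   # post ID + word: per-post membership
    return bf


def search(bf, post_ids, word):
    word = word.lower()
    if word not in bf:
        # One lookup, no per-post scan: the word is in no post at all.
        return []
    return [pid for pid in post_ids if f"{pid}:{word}" in bf]
```

With `posts = {"p1": "bloom filters are neat", "p2": "static site search"}`, `search(build_index(posts), posts, "bloom")` is guaranteed to include `"p1"`; as with any Bloom filter, it can occasionally include a post that does not contain the word.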
EDIT: Hmm, turns out it's pretty much the same size, which makes sense, I guess: http://nbviewer.ipython.org/gist/skorokithakis/0abbfebced25f...