by olivernn 3256 days ago

Lunr.js [1] used to use a trie, and memory usage was a concern. The problem is that although a trie shares common prefixes, it does nothing for common suffixes. Depending on the corpus, this can lead to a lot of duplication.
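To make the duplication concrete, here is a minimal sketch of a plain trie in Python. It is an illustration only, not Lunr.js code; the names `build_trie` and `count_nodes` and the `"$"` end-of-word marker are assumptions made for this example.

```python
def build_trie(words):
    """Build a trie as nested dicts; "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Reuse the child for this character if it exists (prefix sharing).
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def count_nodes(node):
    """Count every node in the trie, including the root and end markers."""
    return 1 + sum(count_nodes(child) for child in node.values())


# Shared prefixes are stored once: "walk" is reused by all three words.
shared_prefix = build_trie(["walk", "walked", "walking"])
print(count_nodes(shared_prefix))

# Shared suffixes are not: "ing" is stored separately for each word.
shared_suffix = build_trie(["jumping", "running", "walking"])
print(count_nodes(shared_suffix))
```

In the second word list no two words share a first letter, so the trie degenerates into three independent chains and every copy of "ing" gets its own nodes.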

Switching to a trie-like structure that compresses both prefixes and suffixes can lead to significant savings. Building the structure can be a bit more burdensome, so there is a trade-off there. There is a paper describing the approach [2] and, if you're interested, my JavaScript implementation [3][4].
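As a rough illustration of the idea, the sketch below builds a plain trie and then merges identical subtrees bottom-up so that common suffixes are stored once. This is a simpler after-the-fact merge, not the construction described in the paper [2] and not the Lunr.js implementation; all function names here are hypothetical and chosen for this example.

```python
def build_trie(words):
    """Build a trie as nested dicts; "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def merge_suffixes(node, registry):
    """Return a shared node equivalent to `node`, merging identical subtrees."""
    merged = {ch: merge_suffixes(child, registry) for ch, child in node.items()}
    # Two nodes are equivalent when the same labels lead to the same
    # already-shared children, so (label, child identity) pairs form the key.
    key = tuple(sorted((ch, id(child)) for ch, child in merged.items()))
    return registry.setdefault(key, merged)


def count_unique_nodes(node, seen=None):
    """Count distinct node objects reachable from `node`."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_unique_nodes(child, seen) for child in node.values())


def contains(node, word):
    """Check whether `word` is stored in the structure."""
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node


words = ["jumping", "running", "walking"]
trie = build_trie(words)
compact = merge_suffixes(trie, {})

print(count_unique_nodes(trie), count_unique_nodes(compact))
print(all(contains(compact, w) for w in words), contains(compact, "jump"))
```

Lookups work the same way on both structures; the difference is only in how many nodes have to be kept in memory. The extra merging pass is one example of the additional build cost mentioned above.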

[1] https://lunrjs.com

[2] http://www.aclweb.org/anthology/J00-1002.pdf

[3] https://github.com/olivernn/lunr.js/blob/master/lib/token_se...

[4] https://github.com/olivernn/lunr.js/blob/master/lib/token_se...

1 comment

gandreani 3256 days ago

Cheers! Thanks for sharing. EDIT: I like the site design too!