by RoyalSloth 857 days ago
Good job, this looks really well done. One thing I am wondering, however, is how easy it would be to integrate a pre-generated inverted index file into this fuzzy searcher?

Context: Say I have a bunch of blog posts from which I create an inverted index at "export to html" time, in order to avoid indexing the blog content at runtime on every page visit. Is there a way to persist the internal state of the fuzzy search across different page requests (e.g., /blog-post-1, /blog-post-2), so it only builds its internal state/index once?

The pre-generated inverted index could be quite large and I would like to avoid parsing it on every page request.

2 comments

Thank you for your kind feedback! That's a great idea. I have implemented a memento object that can be used for transferring the serialized state of the searcher. The intent of the implementation was to transfer the searcher between a web worker and the main thread. You could try to serve the Memento from the server and store it in an index db. You may have a look at the web worker example I provide in the repository.
> You could try to serve the Memento from the server and store it in an index db

I am not sure what you mean by "store it in an index db", but I was thinking about using the searcher on a static website (no real backend, only a fileserver serving pre-generated html files). So if I understand you correctly, for this to work I would have to cache the Memento in local storage and load it on every page load/search request.

Unfortunately, the index would change over time, so one would have to detect this somehow and regenerate the Memento as well.

Sorry for the confusion. I think you could generate the memento each time you compile your blog into HTML. The memento can be stored as a JSON file and served statically by your fileserver. When a user visits your page, retrieve the memento from the server, then initialize the searcher with it. This way you avoid indexing content at runtime.
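
A rough sketch of that flow, in case it helps. Everything named here is an assumption for illustration: `RestorableSearcher`, its `load` method, `restoreFromJson`, and `initSearcher` are hypothetical and not the library's actual API, so check the web worker example in the repository for the real calls.

```typescript
// Hypothetical shape of a searcher that can restore its state from a memento.
// The real library's type and method names may differ.
interface RestorableSearcher<M> {
  load(memento: M): void;
}

// Parse a serialized memento and hand it to the searcher.
function restoreFromJson<M>(searcher: RestorableSearcher<M>, json: string): void {
  const memento = JSON.parse(json) as M;
  searcher.load(memento);
}

// Download the pre-generated memento and initialize the searcher with it.
// `fetchText` is passed in so the sketch does not depend on a browser;
// in a page it could be: (url) => fetch(url).then((r) => r.text())
async function initSearcher<M>(
  searcher: RestorableSearcher<M>,
  url: string,
  fetchText: (url: string) => Promise<string>
): Promise<void> {
  restoreFromJson(searcher, await fetchText(url));
}
```

The build step would write the serialized memento next to the generated HTML, and each page would call `initSearcher` with that file's URL. No indexing happens in the browser, only a download and a `JSON.parse`.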

As a bonus, you could cache the memento in local storage or session storage.
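
The caching could look roughly like this, and it also covers the concern about the index changing over time. The sketch assumes the build embeds a version string (for example a content hash) in every page. `KeyValueStore`, `CACHE_KEY`, `readCachedMemento`, and `cacheMemento` are made-up names for illustration.

```typescript
// Minimal subset of the Web Storage API; localStorage and sessionStorage both fit.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key under which the single cached entry is kept.
const CACHE_KEY = "search-memento";

// Return the cached memento JSON if it was stored for this build version,
// otherwise null (nothing cached, stale version, or unreadable entry).
function readCachedMemento(store: KeyValueStore, version: string): string | null {
  const raw = store.getItem(CACHE_KEY);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as { version?: string; json?: string };
    return entry.version === version && typeof entry.json === "string" ? entry.json : null;
  } catch (_err) {
    return null;
  }
}

// Store the memento JSON together with its build version, replacing any older entry.
// Returns false if the store rejects the write (e.g. quota exceeded or storage disabled).
function cacheMemento(store: KeyValueStore, version: string, json: string): boolean {
  try {
    store.setItem(CACHE_KEY, JSON.stringify({ version, json }));
    return true;
  } catch (_err) {
    return false;
  }
}
```

On page load you would try `readCachedMemento(localStorage, version)` first and only download the file on a miss, then call `cacheMemento` with the result. Since browser storage quotas are limited, a large memento may not fit, in which case the write fails and the page simply keeps downloading the file.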

When I built search for my static site some time ago, I used Elasticlunr. I was able to pregenerate the index as one big JSON file that is loaded on the client.

http://elasticlunr.com/