Hacker News
by nzadrozny 4861 days ago
Full text search usually presumes an index, which brings a lot of functional differences compared to the browser's naive substring-matching Ctrl-F. And any proper search index is going to give a better user experience than naive string matching.

I haven't read through all of Lunr's docs and source, but based on my Solr/Elasticsearch experience, I'd expect to see (in time)…

Tokenization and (presumably) term normalization/analysis; a faster and smarter query language, for term order independence and boolean combinations of clauses; relevance scores and maybe even score boosting per field.

Better queryability really shouldn't be understated here. Just having term order independence focused on a specific set of JSON is going to be way better than naively matching any substring on the entire rendered page.
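
To make the term order point concrete, here is a minimal sketch in Python (chosen for brevity, not Lunr's own language). The function names and the sample sentence are made up for illustration, and the tokenizer is a bare regex split rather than anything Lunr, Solr or Elasticsearch actually ships.

```python
import re

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def substring_match(query, text):
    """Roughly what Ctrl-F does: the query must appear verbatim."""
    return query.lower() in text.lower()

def all_terms_match(query, text):
    """Order-independent: every query term must appear somewhere in the text."""
    return set(tokenize(query)) <= set(tokenize(text))

# Hypothetical sample document, not taken from any real index.
doc = "Lunr is a simple full text search engine for the browser"

# Same words, different order: the substring match misses, the term match hits.
print(substring_match("search full text", doc))
print(all_terms_match("search full text", doc))
```

A real index would not rescan the text on every query, but the difference in what counts as a match is the same.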

1 comment

That is almost exactly what lunr is doing. It tokenises the input text, stems the tokens and filters out any stop words. The index can then be searched; the order of the terms is not relevant, and a prefix search is currently used so that you can find documents containing a term without having to type the whole term exactly. The matching documents are also scored by how relevant they are to the search terms.
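
The pipeline described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not lunr's code: the stop word list, the `crude_stem` suffix stripper, the raw term count used as a score, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Tokenise, drop stop words, then stem what is left."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Inverted index: term -> {doc_id: number of occurrences}."""
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] += 1
    return index

def search(index, query):
    """Return (doc_id, score) pairs, best first.

    Each query term is matched as a prefix of indexed terms, and a
    document must match every query term. The score is just the sum of
    matching term counts, a stand-in for a real relevance measure.
    """
    scores = None
    for q in analyze(query):
        hits = defaultdict(int)
        for term, postings in index.items():
            if term.startswith(q):
                for doc_id, count in postings.items():
                    hits[doc_id] += count
        if scores is None:
            scores = dict(hits)
        else:
            scores = {d: scores[d] + hits[d] for d in scores if d in hits}
    return sorted((scores or {}).items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical documents keyed by id.
DOCS = {
    "a": "Searching the index of indexed documents",
    "b": "Documents are scored for relevance",
    "c": "The browser finds substrings",
}
INDEX = build_index(DOCS)

print(search(INDEX, "doc index"))  # prefix "doc" matches the stemmed "document"
print(search(INDEX, "index doc"))  # same result, term order does not matter
```

Scanning every indexed term for a prefix is fine for a sketch; a sorted term list or a trie would be the usual way to make that lookup cheap.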

In the future I want to add even more powerful querying, restricting search to specific fields, taking into account the distance between terms, and adding faceted search to reduce the total documents being searched over.

One of the original goals of the project was specifically to provide a better alternative to just using the browser's built-in find-in-page functionality.