Hacker News
by benrules2 939 days ago
What sort of BM25 indices does this support? Does it allow experimenting with an Elasticsearch query and mapping, for example?

Right now it’s the typical inverted index:

term -> list of doc ids

And the main purpose of the data structure is to recover term frequencies and document frequencies. We also store positional information to allow phrase matching.
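A minimal sketch of that kind of structure, assuming a naive lowercase whitespace tokenizer and plain in-memory Python dicts. The class and method names (PositionalIndex, term_freq, doc_freq, phrase_docs) are hypothetical and are not this library's API. Storing positions per (term, doc) pair means term frequency is just the length of the position list, document frequency is the number of docs in the term's postings, and a phrase matches when its terms appear at consecutive positions.

```python
from collections import defaultdict


class PositionalIndex:
    """Hypothetical minimal positional inverted index: term -> {doc_id: [positions]}."""

    def __init__(self, docs):
        # postings[term][doc_id] is the list of token positions of term in doc_id
        self.postings = defaultdict(dict)
        self.doc_lens = []
        for doc_id, text in enumerate(docs):
            tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
            self.doc_lens.append(len(tokens))
            for pos, term in enumerate(tokens):
                self.postings[term].setdefault(doc_id, []).append(pos)

    def doc_ids(self, term):
        # The classic "term -> list of doc ids" view
        return sorted(self.postings.get(term, {}))

    def term_freq(self, term, doc_id):
        # tf = number of positions recorded for term in doc_id
        return len(self.postings.get(term, {}).get(doc_id, []))

    def doc_freq(self, term):
        # df = number of documents containing term at least once
        return len(self.postings.get(term, {}))

    def phrase_docs(self, phrase):
        # A doc matches if, for some start p, the i-th phrase term occurs at p + i
        terms = phrase.lower().split()
        if not terms:
            return []
        candidates = set(self.postings.get(terms[0], {}))
        for t in terms[1:]:
            candidates &= set(self.postings.get(t, {}))
        matches = []
        for doc_id in sorted(candidates):
            starts = set(self.postings[terms[0]][doc_id])
            for offset, t in enumerate(terms[1:], start=1):
                # shift each later term's positions back so they line up with the start
                starts &= {p - offset for p in self.postings[t][doc_id]}
            if starts:
                matches.append(doc_id)
        return matches
```

For example, with the documents "the quick brown fox", "the brown quick fox jumps" and "quick brown dogs and a quick brown fox", `phrase_docs("quick brown")` matches docs 0 and 2, while `term_freq("quick", 2)` is 2.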

BM25 of course is just one such way of using these stats. But you can also get raw termfreqs and docfreqs of matching terms and do whatever you want with them mathematically :).
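Because the raw counts are available, any scoring formula can sit on top of them. As a hedged illustration, here is a plain log-scaled TF-IDF computed from per-term tf and df dictionaries. The function name and the dict-shaped inputs are assumptions for this sketch, not how the library actually returns these stats.

```python
import math


def tfidf_score(term_freqs, doc_freqs, num_docs):
    """Score one document from raw per-term statistics (hypothetical helper).

    term_freqs: {term: frequency of term in this document}
    doc_freqs:  {term: number of documents containing term}
    num_docs:   total number of documents in the corpus
    """
    score = 0.0
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        if tf == 0 or df == 0:
            continue  # term absent from this doc or from the corpus
        # one common TF-IDF variant: (1 + ln tf) * ln(N / df)
        score += (1 + math.log(tf)) * math.log(num_docs / df)
    return score
```

Swapping in a different formula only changes the line inside the loop; the tf and df inputs stay the same.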

The BM25 here tries to align with Lucene's internal BM25 calculation.
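For reference, Lucene's BM25Similarity (Lucene 8 and later) scores each matching term as idf * tf / (tf + k1 * (1 - b + b * dl / avgdl)), with idf = ln(1 + (N - df + 0.5) / (df + 0.5)) and defaults k1 = 1.2, b = 0.75. Below is a minimal sketch of that per-term formula. The function name is hypothetical and this is not the library's own code; it also assumes exact document lengths, whereas Lucene stores lengths lossily in its norms, so real Lucene scores can differ slightly.

```python
import math


def lucene_bm25(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Per-term BM25 in the style of Lucene 8+ BM25Similarity (hypothetical helper).

    Uses exact doc lengths; Lucene encodes lengths lossily, so scores may differ slightly.
    """
    if tf == 0:
        return 0.0
    # idf = ln(1 + (N - df + 0.5) / (df + 0.5)); the "1 +" keeps it positive
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # saturating, length-normalized tf; Lucene 8+ drops the classic (k1 + 1) numerator
    tf_norm = tf / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm
```

A multi-term query score is then the sum of `lucene_bm25` over the query terms that match the document.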