by softwaredoug
939 days ago
Right now it's the typical inverted index: term -> list of doc ids. The main purpose of the data structure is to recover term frequencies and document frequencies. We also store positional information to allow phrase matching. BM25, of course, is just one way of using these stats, but you can also get the raw termfreqs and docfreqs of matching terms and do whatever you want with them mathematically :). The BM25 here tries to align with Lucene's internal BM25 calculation.
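To make that concrete, here is a rough Python sketch of the idea, not the library's actual code. The class name `PositionalIndex`, its methods, the whitespace tokenizer, and the sample docs are all made up for illustration. The BM25 uses the idf form ln(1 + (N - df + 0.5) / (df + 0.5)) and leaves out the constant (k1 + 1) numerator factor, which is my assumption about how to approximate Lucene; Lucene's real implementation has further details (such as lossy length encoding) that this does not reproduce.

```python
import math
from collections import defaultdict


class PositionalIndex:
    """Toy positional inverted index: term -> {doc_id -> [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_lens = {}  # doc_id -> number of tokens

    def add(self, doc_id, text):
        # Hypothetical tokenizer: lowercase + whitespace split.
        tokens = text.lower().split()
        self.doc_lens[doc_id] = len(tokens)
        for pos, tok in enumerate(tokens):
            self.postings[tok].setdefault(doc_id, []).append(pos)

    def termfreq(self, term, doc_id):
        # How many times `term` occurs in this doc.
        return len(self.postings.get(term, {}).get(doc_id, []))

    def docfreq(self, term):
        # How many docs contain `term` at least once.
        return len(self.postings.get(term, {}))

    def phrase_docs(self, terms):
        # Docs where the terms occur at consecutive positions.
        if not terms:
            return set()
        hits = set()
        for doc_id, starts in self.postings.get(terms[0], {}).items():
            for start in starts:
                if all(
                    start + i in self.postings.get(t, {}).get(doc_id, [])
                    for i, t in enumerate(terms)
                ):
                    hits.add(doc_id)
                    break
        return hits

    def bm25(self, term, doc_id, k1=1.2, b=0.75):
        # Assumed Lucene-like shape; see the caveats in the text above.
        tf = self.termfreq(term, doc_id)
        if tf == 0:
            return 0.0
        n_docs = len(self.doc_lens)
        df = self.docfreq(term)
        avgdl = sum(self.doc_lens.values()) / n_docs
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = k1 * (1 - b + b * self.doc_lens[doc_id] / avgdl)
        return idf * tf / (tf + norm)


idx = PositionalIndex()
idx.add(1, "the quick brown fox")
idx.add(2, "the lazy brown dog")
idx.add(3, "brown fox brown fox")
```

With the raw stats exposed like this, `idx.termfreq("brown", 3)` and `idx.docfreq("brown")` can feed any scoring function you like, and `idx.phrase_docs(["brown", "fox"])` only needs the stored positions.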