by simonw
973 days ago
My goal for related articles was to first filter to every document that shares at least one word with the target (which is probably EVERY document in the set), then rank those by which ones share the MOST words, scoring words that are rare across the corpus more highly. BM25 does that for free. Then I take the top ten by score and call those the "related articles".
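Here is a minimal, from-scratch sketch of that idea in Python. It is not the code behind the comment. The function names `tokenize` and `related_articles`, the regex tokenizer, and the `k1=1.2`, `b=0.75` parameters (common BM25 defaults) are all assumptions for illustration. The target document's distinct words act as the query. Documents sharing no words get no score, which is the "filter" step. Rare words carry a larger IDF weight, and the top ten scores are returned.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def related_articles(target_id, docs, top_n=10, k1=1.2, b=0.75):
    """Rank docs (a dict of id -> text) against docs[target_id] with BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs or 1.0

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    # BM25 IDF (the +1 keeps it positive): rare terms score higher.
    idf = {t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}

    # Use the target's distinct words as the query.
    query_terms = set(tokenized[target_id])
    scores = {}
    for doc_id, tokens in tokenized.items():
        if doc_id == target_id:
            continue
        tf = Counter(tokens)
        shared = query_terms.intersection(tf)
        if not shared:
            continue  # Filter: must share at least one word with the target.
        # Length normalization: long documents don't win just by being long.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        scores[doc_id] = sum(
            idf[t] * tf[t] * (k1 + 1) / (tf[t] + norm) for t in shared
        )

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_n]]
```

For example, with the documents `"a": "sqlite full text search with bm25"`, `"b": "bm25 ranking in sqlite full text search"`, `"c": "baking sourdough bread at home"` and `"d": "search engines and ranking"`, calling `related_articles("a", docs)` ranks `"b"` above `"d"`, because `"b"` shares more words, including rarer ones. It drops `"c"` entirely because `"c"` shares no words.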