Hacker News
by simonw 973 days ago
My goal for related articles was to first filter to every document that shares at least one word with the target - which is probably EVERY document in the set - and then rank them by which ones share the MOST words, scoring words that are rare in the corpus more highly. BM25 does that for free.

Then I take the top ten by score and call those the "related articles".
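
For anyone who wants to see the idea spelled out, here is a rough pure-Python sketch of that ranking. It is an illustration only, not the code I actually run: the `tokenize` and `related` helpers, the `docs` dict shape, and the `k1=1.2` / `b=0.75` parameters are all assumptions made for the example. It uses the common BM25 variant whose IDF term has a `+ 1` inside the log so scores never go negative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric words (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def related(target_id, docs, k1=1.2, b=0.75, top_n=10):
    """Rank the other documents against the target's words using BM25.

    docs maps a document id to its text. k1 and b are commonly used
    defaults, assumed here rather than taken from any real system.
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    query_terms = set(tokens[target_id])
    scores = {}
    for doc_id, toks in tokens.items():
        if doc_id == target_id:
            continue
        tf = Counter(toks)
        shared = [term for term in query_terms if term in tf]
        if not shared:
            continue  # the filter step: must share at least one word
        score = 0.0
        for term in shared:
            # Rare words get a larger IDF, so they count for more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation stops long documents winning by default.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score

    # Highest score first, then keep the top N as the "related articles".
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling `related("some-id", docs)` returns up to ten `(doc_id, score)` pairs. Each shared word adds to the score, and the IDF weighting is what makes a shared rare word count for more than a shared common one.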