by NumberCruncher 1910 days ago
I should re-read the article, because I can't see what kind of problem they are trying to solve with MAP, NDCG and an "invented here" PageRank that couldn't be solved with tf-idf and out-of-the-box Elasticsearch functionality. It's a highly underrated piece of software.
3 comments

Hi, I worked on improving the search rankings for a popular package manager. Imagine your search algorithm is already excellent, you have countless documents, and customers run countless unique queries. Now say that you want to improve your search rankings further. How do you do that? What if your improvement helps some queries but hurts others? Things like tf-idf or Elasticsearch won't help here.

That's where NDCG comes in! Basically it gives a score for your search rankings that you can use to compare different search algorithms. The higher the score, the closer your algorithm was to producing the expected search results. This is super useful as you can try lots of experiments and get a good sense of whether the experiment is promising or not.
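To make that concrete, here is a rough sketch of how an NDCG score could be computed for one query. It assumes one common formulation (linear gain, log2 rank discount) and assumes the ideal ordering can be built from the labels of the returned results. The function names and the relevance labels are made up for illustration, not taken from the article.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each relevance label is divided by
    # log2(rank + 1), so hits near the top of the list count more.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalize by the DCG of the ideal ordering (labels sorted descending).
    # Assumption: the ideal is built from the same returned results.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels (0 = irrelevant, 3 = perfect) for the results
# two rankers returned for the same query, listed in ranked order.
ranker_a = [3, 2, 0, 1]
ranker_b = [0, 1, 2, 3]
print(ndcg(ranker_a), ndcg(ranker_b))
```

Averaging this score over a large sample of queries is what lets you see whether a change helps overall, even if it hurts a few individual queries.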

TF-IDF (Spärck Jones (1972), Journal of Documentation) is a weighting scheme for word frequencies in the vector space model of information retrieval.
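As a minimal sketch of what such a weighting looks like, the snippet below uses raw term counts and an unsmoothed log(N / df) inverse document frequency. That is only one of several variants in use, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of tokenized documents (each a list of terms).
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Toy corpus, made up for illustration.
docs = [["search", "ranking"], ["search", "engine"]]
for w in tf_idf(docs):
    print(w)
```

A term that occurs in every document gets weight zero under this variant, which is the point: it does not help distinguish one document from another.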

In contrast, MAP and NDCG (and others, like Precision, Recall, F1-score, MRR) are _evaluation_ metrics.

So the former is part of a system; the latter are for measuring the quality of systems.

MAP, NDCG, etc. are evaluation stats, like unit tests for algorithm correctness. TF*IDF and PageRank are solutions that may or may not increase NDCG or MAP for your given problem/users.
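To illustrate the "unit test" analogy, here is a hedged sketch of MAP with binary relevance judgments. The function names, document ids, and judgments are all hypothetical. The point is only that the metric scores whatever ranker you plug in, regardless of how that ranker works internally.

```python
def average_precision(ranked, relevant):
    # ranked: document ids in the order the system returned them.
    # relevant: set of ids judged relevant for this query.
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked, relevant) pairs, one per query.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical rankers evaluated against the same judgments.
judgments = {"a", "c"}
baseline = [(["b", "a", "c"], judgments)]
candidate = [(["a", "c", "b"], judgments)]
print(mean_average_precision(baseline), mean_average_precision(candidate))
```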