by saraid216 4572 days ago
This isn't information retrieval. This is data processing. Information retrieval is a subset of data processing.

Retrieval specifically needs an algorithm to determine document relevance. Everything you're learning is to understand how different parts of that algorithm affect the results. It's a very difficult problem, even if you assume that the corpus isn't sapient.
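To make "an algorithm to determine document relevance" concrete, here's a rough sketch of one common scoring scheme, TF-IDF. This is just an illustration, not something from whatever course or codebase you're working with: the `tokenize` and `rank` names, the whitespace tokenizer, and the exact weighting formula are all my own assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems stem, drop stopwords, etc.
    return text.lower().split()


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Term frequency (length-normalized) times inverse document frequency.
                score += (tf[term] / len(tokens)) * math.log(n_docs / df[term])
        scores.append((i, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the stock market fell",
]
ranking = rank("cat mat", docs)
```

Note that "cats" in the second document doesn't match "cat" because the tokenizer doesn't stem. Swap in a different tokenizer or a different weighting and the ranking changes, which is the point: each piece of the algorithm affects the results.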

Stuff like n-grams is more about reshuffling the data in order to expose patterns. It's a little bit like fitting a regression to some noisy data to see the underlying trend.
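For what it's worth, here's a minimal sketch of what I mean by reshuffling with n-grams. The `ngrams` helper and the sample sentence are made up for illustration; counting the n-grams is just one assumed way to look at the patterns they expose.

```python
from collections import Counter


def ngrams(tokens, n):
    # Slide a window of size n across the token list; yields nothing if too short.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "to be or not to be".split()
bigram_counts = Counter(ngrams(tokens, 2))

# The repeated phrase shows up as the most frequent bigram.
most_common_bigram, count = bigram_counts.most_common(1)[0]
```

Nothing gets retrieved or ranked here. The same tokens are just regrouped so that repeated sequences become countable.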