by mrskeltal 3410 days ago
Can you recommend some readings for developers who aren't familiar with search science?
1 comments

It's basically the union of Information Retrieval, Big Data, and Machine Learning. There is a lot of good and bad info out there.

It's best to tailor your learning around a concrete problem so you can get feedback on what works. Building bespoke search engines is expensive and often not worth it until the problem is big enough. Machine-learned optimization is even more expensive.

So in the likely instance that your problem is small, I'd stick to off-the-shelf Lucene. If you need more specialization, write plugins for it. If you need more speed, then DIY Okapi BM25 in native code (maybe Rust these days). If you need Big Data, I'd use Spark. If you need ML, then GBDT in R. If you need advanced NLP, then Deep Learning. At each stage the returns are usually diminishing, so quit once you start losing money on the additional effort.
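
To give a feel for how small the DIY Okapi BM25 step is, here is a minimal sketch in Python rather than native code. The function name `bm25_scores`, the whitespace tokenization, the defaults `k1=1.2` and `b=0.75`, and the Lucene-style non-negative IDF are all assumptions made for illustration, and the toy corpus is made up.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Assumes docs is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF variant (an assumption), always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up toy corpus, tokenized on whitespace.
docs = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "quick quick fox jumps over the lazy dog".split(),
]
scores = bm25_scores("quick fox".split(), docs)
```

A real engine would add an inverted index so it only touches documents that contain a query term; this version scans everything and is only meant to show the scoring formula.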

Edit: As a fun aside, PageRank doesn't work very well, and AFAIK Google's major advance came from creating 'meta documents' using anchor text and search queries. Google has a habit of sending out red herrings to guard their important ideas. So if some blogger is waxing lyrical about PageRank, you know they're full of it.
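
A rough sketch of what a 'meta document' could look like, under my own assumptions: for each target URL, concatenate the anchor text of inbound links and the queries that led to a click on that URL, then index that text alongside the page itself. The function `build_meta_documents`, the input tuple shapes, the click signal, and the example URLs are all hypothetical; this is not a description of how Google actually does it.

```python
from collections import defaultdict

def build_meta_documents(links, query_clicks):
    """Collect, per target URL, the text other people used to describe it.

    links:        iterable of (source_url, target_url, anchor_text)
    query_clicks: iterable of (query, clicked_url)
    Both input shapes are assumptions made for this sketch.
    """
    meta = defaultdict(list)
    for _source_url, target_url, anchor_text in links:
        meta[target_url].append(anchor_text)
    for query, clicked_url in query_clicks:
        meta[clicked_url].append(query)
    # One searchable text blob per URL.
    return {url: " ".join(parts) for url, parts in meta.items()}

# Made-up example data.
links = [
    ("a.example/blog", "b.example/rust", "rust tutorial"),
    ("c.example/list", "b.example/rust", "learn rust"),
]
query_clicks = [("rust beginner guide", "b.example/rust")]
meta_docs = build_meta_documents(links, query_clicks)
```

The resulting text can then be scored with whatever ranking function you already use for the page body.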

PageRank was very useful when they started, 19 years ago. It was killed by their own success, as normal people stopped posting link sites and bad actors added their own.
The people that I know at MS and Yahoo told me that PageRank was never as good as meta-documents, and that meta-documents were more than sufficient to explain the improvement in relevance.

Of course this is anecdotal and I would be happy to see evidence to the contrary.