by researcher11
3410 days ago
It's basically the union of Information Retrieval, Big Data, and Machine Learning. There is a lot of good and bad info out there. It's best to tailor your learning around a problem so you can get feedback on what works. Building bespoke search engines is expensive and often not worth it until the problem is big enough. Machine-learned optimization is even more expensive. So in the likely instance that your problem is small, I'd stick to off-the-shelf Lucene. If you need more specialization, write plugins for it. If you need more speed, then DIY Okapi BM25 in native code (maybe Rust these days). If you need Big Data, I'd use Spark. If you need ML, then GBDT in R. If you need advanced NLP, then Deep Learning. At each stage it's usually diminishing returns, so quit once you start losing money on the additional effort.

Edit: As a fun aside, PageRank doesn't work very well and AFAIK Google's major advance was from creating 'meta documents' using anchor text and search queries. Google has a habit of sending out red herrings to guard their important ideas. So if some blogger is waxing lyrical about PageRank you know they're full of it.
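To give a feel for how small the "DIY Okapi BM25" step mentioned above can be, here is a rough sketch in Python (easier to read than native code; a Rust port would follow the same shape). The class name `BM25`, the whitespace `tokenize` helper, and the defaults `k1=1.2`, `b=0.75` are my own assumptions for illustration, not anything Lucene-specific. The IDF uses the common "+1 inside the log" variant so scores stay non-negative.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


class BM25:
    """Minimal in-memory Okapi BM25 scorer (illustrative sketch)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        # k1 controls term-frequency saturation, b controls length normalization.
        self.k1 = k1
        self.b = b
        tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_len) / len(docs)  # assumes docs is non-empty
        self.tf = [Counter(t) for t in tokens]

        # Document frequency: number of documents containing each term.
        df = Counter()
        for t in tokens:
            df.update(set(t))

        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, i):
        # Sum the per-term BM25 contributions of the query for document i.
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query):
        # Indices of matching documents, best score first.
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        return [i for s, i in sorted(scored, key=lambda p: -p[0]) if s > 0]


docs = [
    "the quick brown fox",
    "lucene is a search library",
    "search engines rank documents with bm25 search",
]
index = BM25(docs)
print(index.rank("search"))
```

This scans every document per query, which is fine for a toy corpus. Anything bigger wants an inverted index so you only touch documents containing the query terms, and that's roughly the point where off-the-shelf Lucene starts earning its keep.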