by thrwyndx
1239 days ago
You throw them into a machine learning model together with a big dataset of queries/URLs annotated by humans for relevancy. CatBoost is Yandex's choice of model here.

> and doing it with an acceptable latency

Lots of interesting optimizations are possible here, but the big obvious one is multi-level models: score documents with a cheap model first (FastRank in Yandex lingo), using a subset of the fastest available features, then rescore the top docs with your best slow, expensive model. Perhaps rescore multiple times at different points in the stack with models of varying complexity: at each index shard, and again after aggregating the results from a subset of shards or from all of them. Also sort the documents in each index shard by some other ML model with query-independent features, to push all the junk to the end of the index, where you'd likely skip it when you run out of time budget to process a query.

> Also, what happened to Google page rank, is is still relevant today?

Vanilla 1990s PageRank, obviously not, but the idea of such graph-based calculations is still very useful.
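To make the multi-level idea concrete, here is a rough Python sketch for a single shard. It is only an illustration, not how Yandex implements it: `cascade_rank`, the field names (`static`, `cheap`, `slow`), and the two scoring callables are all made up for this example, and a fixed document count (`scan_budget`) stands in for a real time budget.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]  # hypothetical document record: field name -> value


def cascade_rank(
    shard: List[Doc],
    cheap_score: Callable[[Doc], float],
    expensive_score: Callable[[Doc], float],
    scan_budget: int,
    rerank_k: int,
) -> List[Doc]:
    """Two-stage ranking over one shard.

    Assumes `shard` is already sorted by a query-independent (static) score,
    best first, so that cutting the scan short only drops likely junk.
    """
    # Stand-in for a time budget: look only at the first `scan_budget` docs.
    scanned = shard[:scan_budget]
    # Stage 1: cheap model over everything scanned, keep the top `rerank_k`.
    candidates = sorted(scanned, key=cheap_score, reverse=True)[:rerank_k]
    # Stage 2: slow, expensive model over the few survivors only.
    return sorted(candidates, key=expensive_score, reverse=True)


# Toy data: the scores are precomputed numbers standing in for model outputs.
docs = [
    {"id": 1, "static": 0.9, "cheap": 0.2, "slow": 0.9},
    {"id": 2, "static": 0.8, "cheap": 0.7, "slow": 0.4},
    {"id": 3, "static": 0.5, "cheap": 0.6, "slow": 0.8},
    {"id": 4, "static": 0.1, "cheap": 0.99, "slow": 0.99},
]
# Index-time step: order the shard by the query-independent score.
shard = sorted(docs, key=lambda d: d["static"], reverse=True)

result = cascade_rank(
    shard,
    cheap_score=lambda d: d["cheap"],
    expensive_score=lambda d: d["slow"],
    scan_budget=3,
    rerank_k=2,
)
print([d["id"] for d in result])
```

Note the tradeoffs this exposes: doc 4 is never scored because its static rank puts it past the scan budget, and doc 1 is dropped by the cheap model even though the expensive model would have liked it. Tuning the budgets and the cheap model's recall is where most of the work goes.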
what did we learn about the flaws?