by BillFranklin 1910 days ago
Interesting article! Shopify's approach is cool, and it's interesting that they're using Kafka to generate datasets. I wonder if the explicit human rankings will get stale (and also be hugely outweighed by implicit judgements in the training data). The real-time feedback aspect sounds cool too; I wonder if it's just for metrics or also for re-training in real time.

I worked on a Learning To Rank implementation a year or so ago. What struck me then (and now reading about Shopify's implementation) is that the approach is often very similar across sites, but the implementation is usually rather tailored. You see the same patterns: online/offline metrics; nDCG; click models and implicit/explicit relevance judgements; re-ranking top-k of results, and so on.
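
To make two of those patterns concrete, here is a minimal sketch of nDCG and of re-ranking only the top-k results. It is not Shopify's implementation; the function names, the exponential gain `2^rel - 1`, and the `score` callable (standing in for a trained LtR model) are all assumptions chosen for illustration.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain over graded relevance labels, in ranked order."""
    rels = relevances[:k] if k is not None else relevances
    # Exponential gain with a log2 position discount (one common formulation).
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def rerank_top_k(results, score, k):
    """Re-order only the first k results by `score`; the tail is left untouched.

    `score` is a hypothetical stand-in for a learned ranking model.
    """
    head = sorted(results[:k], key=score, reverse=True)
    return head + results[k:]


if __name__ == "__main__":
    # Labels of the documents in the order the engine returned them.
    print(ndcg([1, 3, 0, 2], k=3))
    model_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
    print(rerank_top_k(["a", "b", "c", "d"], model_scores.get, k=3))
```

Re-ranking only the head keeps the expensive model off the long tail, which is why "d" stays in fourth place above even though it has the highest score.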

Unfortunately there doesn't seem to be a technology tying all of the components of an LtR system together. A managed service like Algolia could be an answer. I wonder if the industry will eventually converge on a framework, such as an extension to OpenSource Connections' Elasticsearch Learning to Rank plugin (https://diff.wikimedia.org/2017/10/17/elasticsearch-learning...).

It's a really interesting area of theory and practice - I hope Shopify write more about their implementation!

I'd also recommend reading Airbnb's really excellent paper - https://arxiv.org/pdf/1810.09591.pdf.


Appreciate the feedback and recommendation! You're right that explicit judgments can get stale. Fortunately, for our document collection, the information architecture and article structures themselves are slow-changing (the answers themselves might change, but the document that answers the question probably won't for some time). We also primarily use explicit judgments to label head queries/common topics, and may augment our datasets with fresh data from time to time. The team is currently exploring augmenting these human-made datasets with automatic judgments using click models.
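
For readers unfamiliar with click models, here is a rough sketch of how automatic judgments can be derived from click logs. It is a simple position-bias correction (clicks divided by expected examinations), not the model the team is exploring; the log tuple layout, the `EXAMINATION` probabilities, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical probability that a user examines each rank (0-based).
# In practice these would be estimated from data, not hard-coded.
EXAMINATION = [1.0, 0.6, 0.4, 0.3, 0.2]


def implicit_judgments(log, examination=EXAMINATION):
    """Estimate relevance per (query, doc) from click logs.

    `log` is an iterable of (query, doc, rank, clicked) tuples. Each impression
    counts as a fractional exposure weighted by the examination probability at
    its rank, so clicks at low ranks count for more than clicks at the top.
    """
    clicks = defaultdict(float)
    exposure = defaultdict(float)
    for query, doc, rank, clicked in log:
        # Ranks beyond the table reuse the last (lowest) probability.
        p = examination[rank] if rank < len(examination) else examination[-1]
        exposure[(query, doc)] += p
        clicks[(query, doc)] += 1.0 if clicked else 0.0
    # Cap at 1.0 so the estimate can be read as a relevance grade in [0, 1].
    return {key: min(1.0, clicks[key] / exposure[key]) for key in exposure}


if __name__ == "__main__":
    log = [
        ("refund", "doc1", 0, True),
        ("refund", "doc1", 0, False),
        ("refund", "doc2", 1, True),
        ("refund", "doc2", 1, False),
    ]
    print(implicit_judgments(log))
```

Both documents have the same raw click-through rate here, but "doc2" gets the higher estimate because it earned its click from a less visible position.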

For real-time feedback, we've implemented (on another search product at Shopify, not the Help Center) a "near"-time feedback loop using implicit judgments to alter search results. Perhaps I'll write a post about that one soon :). My colleague Doug talks a bit about the new systems we're building in this blog post - https://shopify.engineering/apache-beam-for-search-getting-s....

Great stuff, and it’s cool you are working with Doug; I enjoyed his book on search relevance :) I’ll look out for more posts from your team. Good luck!