by gav 2675 days ago
I call this the "massive heterogeneous catalog" problem.

Generally e-commerce retailers have grown from a fairly narrow set of product categories (e.g. books for Amazon) to adding more and more diverse categories. This has a dramatic impact on site search quality.

Consider a simple example: shoes. You only need a couple of facets to filter products down to a reasonable set to browse through: gender and size. Now start adding accessories, athletic clothing, and so on, and the results get harder to navigate, with generic search terms like "shoe" giving less relevant results (not having the context of the user's intent hurts here).

I tried "shoe" on Amazon, got over 400,000 results with the first item being a shoehorn. It takes a bunch of clicks to deal with that.

This search problem gets worse as catalog sizes grow even bigger. Personalized results help a lot, and Amazon seem to fail me here: they don't do a good job of bubbling the products I buy up to the top.

It's a hard problem to solve but it's not going to kill Amazon.

1 comments

Absolutely - this challenge is especially acute for all ecommerce retailers with broad catalogs. Target, Macy’s, Walmart, Jet, etc. all face this challenge.

Systems like Solr, Elasticsearch and Endeca (out of the box) all assume relevance means keyword frequency in a product page, with some weighting depending on the field (title, description, tag, etc.). Delivering relevant results that users might want to purchase requires taking these systems, adding or customizing their NLP techniques, operationalizing historical user search & purchase data to determine intent, personalizing by shopper history, etc.
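To make the out-of-the-box behavior concrete, here is a minimal sketch of field-weighted keyword counting. It is not how Solr, Elasticsearch or Endeca actually score documents (they use more elaborate formulas); the field names, the weights, and the `keyword_score` helper are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical per-field weights: a match in the title counts more than
# a match in the description.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "description": 1.0}


def keyword_score(query, product):
    """Sum the weighted counts of query terms across the product's fields."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(product.get(field, "").lower().split())
        score += weight * sum(counts[term] for term in terms)
    return score


# Toy catalog, invented for the example.
catalog = [
    {"title": "Shoe horn",
     "description": "Metal shoe horn for any shoe",
     "tags": "shoe accessory"},
    {"title": "Running sneakers",
     "description": "Lightweight running shoe",
     "tags": "footwear"},
]

ranked = sorted(catalog, key=lambda p: keyword_score("shoe", p), reverse=True)
print([p["title"] for p in ranked])
```

With nothing but term counts to go on, the accessory that repeats "shoe" in every field outranks the actual footwear, which is the shoehorn problem in miniature.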

The challenges of a massive heterogeneous catalog affect other areas, chief among them search result personalization. An individual’s gaming purchase history might cause ‘button down’ to return gaming keyboards rather than oxford shirts, while a pet products purchase history could lead to a search for ‘turkey’ returning turkey dog food.

The fact that Amazon fails to personalize search results is evidence of the difficulty & opportunity here. The sort of pervasive personalization found in Airbnb, Facebook, and Google is simply out of reach of most ecommerce retailers…

> Target, Macy’s, Walmart, Jet, etc. all face this challenge.

I've been buying more household items from Jet recently because their smaller catalog is easier to navigate. Plus there are no third-party sellers and no pricing confusion like there is with Amazon vs. Amazon Fresh vs. Amazon Pantry.

> among them search result personalization

Agree 100%. Retailers need to consider the sometimes overlapping contexts of browsing history, purchases, and importantly the current browsing session (with weighting given to cart contents). Somebody currently browsing for food items should see food items when searching for "turkey", not dog food.
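One way to picture the overlapping contexts is as a weighted blend of per-context category counts. This is only a sketch under assumed numbers: the context names, the weights in `CONTEXT_WEIGHTS`, and the `category_intent` helper are hypothetical, chosen so that the cart and current session outweigh long-term history.

```python
# Assumed weights: what the shopper is doing right now matters most.
CONTEXT_WEIGHTS = {"cart": 4.0, "session": 3.0, "purchases": 2.0, "history": 1.0}


def category_intent(contexts):
    """Blend {context: {category: count}} into normalized category weights."""
    raw = {}
    for name, counts in contexts.items():
        total = sum(counts.values())
        if total == 0:
            continue
        weight = CONTEXT_WEIGHTS[name]
        for category, n in counts.items():
            # Each context contributes its own category distribution,
            # scaled by how much we trust that context.
            raw[category] = raw.get(category, 0.0) + weight * n / total
    norm = sum(raw.values())
    return {c: v / norm for c, v in raw.items()} if norm else {}


# A shopper with a pet-heavy purchase history who is currently buying groceries.
shopper = {
    "purchases": {"pet-food": 8, "grocery": 2},
    "session": {"grocery": 5},
    "cart": {"grocery": 3},
}

intent = category_intent(shopper)
print(intent)
```

For this shopper the grocery category dominates despite the pet-food purchase history, so a search for "turkey" would lean toward food items rather than dog food.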

My favorite search example that fails without context is "dress". Does it mean "dress", "dress socks", or "dress shirts"? Even if it means "dress", are we talking about women's or girls' dresses?

I did an experiment a few years ago and found that it was possible to improve search relevancy dramatically by keeping track of items looked at and purchased, bucketing them by category/sub-category with an exponential decay, and using those scores to boost popular categories in results. It's terribly low tech, but it gives much better results than no personalization.
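A minimal sketch of that kind of decayed category boost follows. It is not the code from the experiment; the 30-day half-life, the event weights, the boost factor, and the function names are all assumptions picked for illustration.

```python
import math
from collections import defaultdict

HALF_LIFE_DAYS = 30.0                            # assumed decay rate
EVENT_WEIGHTS = {"view": 1.0, "purchase": 5.0}   # assumed: purchases count more


def category_affinity(events, now_day):
    """events: (category, kind, day) tuples -> decayed score per category."""
    scores = defaultdict(float)
    for category, kind, day in events:
        age = now_day - day
        # An event loses half its weight every HALF_LIFE_DAYS.
        scores[category] += EVENT_WEIGHTS[kind] * math.pow(0.5, age / HALF_LIFE_DAYS)
    return dict(scores)


def rerank(results, affinity, boost=0.5):
    """results: (product_id, category, base_score) tuples, re-sorted with a boost."""
    total = sum(affinity.values()) or 1.0

    def boosted(result):
        _, category, base = result
        share = affinity.get(category, 0.0) / total
        return base * (1.0 + boost * share)

    return sorted(results, key=boosted, reverse=True)


# Toy data: recent men's activity, one older women's view.
events = [
    ("mens-shoes", "purchase", 90),
    ("mens-shoes", "view", 99),
    ("womens-shoes", "view", 40),
]
affinity = category_affinity(events, now_day=100)

results = [("w1", "womens-shoes", 1.0), ("m1", "mens-shoes", 0.9)]
reranked = rerank(results, affinity)
print([product_id for product_id, _, _ in reranked])
```

The boost is multiplicative on the engine's base score, so it nudges close calls toward the shopper's habitual categories without burying a strong match from another category.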

There's a bunch of retailers that I visit frequently (and purchase from) that force me to search, then filter by men's, and do this for _every_ search. It would be great if they could just learn this coarse-grained level of personalization.

> The fact that Amazon fails to personalize search results is evidence of the difficulty & opportunity here

The opportunity for Amazon is massive. They don't seem to consider my purchase history at all when ranking products. For example, if I search for "olive oil", the 31st item is the one that I've purchased three times in the last couple of years, and it's the _only_ olive oil I've purchased.

I've spent a big chunk of the last decade trying to improve ecommerce search and it's a very neglected area across the board.