Hacker News
by jvz 2704 days ago
I was intrigued, but after visiting your website I'm ambivalent. The only way to truly combat shilling is to fully embrace a model where each person sees reviews only from their own personal network of sources; this drastically reduces the "eyeball multiplier" that makes mass media so attractive to marketers.

But seeing things like "Top 3 Recommended" on your website makes me believe you're falling into the same trap everyone else is, at least partially. I don't care how sophisticated your detection of paid content is; you simply can't win a direct war against these pathogens (marketers). The only way to win is not to play, i.e. don't provide a platform for mass reach in the first place. Mass media is really a type of monoculture, with all the same weaknesses.

Unrelated: how do you plan to solve the problem of identifying and classifying essentially every product in existence? Resolving duplicates, slight variants, etc. is a very hard problem, as is categorization.

1 comment

Our application does only show reviews from your personal network. Well, close. The app has a Q&A format (think Instagram meets Stack Overflow), and you see both questions and answers from your network. You can also see the user that asked a question that someone in your network has answered, and see users who answered the question someone in your network asked. In this way, you can be exposed to some new content. But reviews can't really spread virally as things stand now.
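A minimal sketch of that visibility rule, for illustration only. The `Question` and `Answer` records and the `visible_feed` function are hypothetical names; the app's actual schema and logic are not described in this thread. The idea is one hop: content authored by your network is visible, and it pulls in the user on the other side of the question or answer, but nothing further.

```python
from collections import namedtuple

# Hypothetical record types; the real schema is an assumption.
Question = namedtuple("Question", "id asker")
Answer = namedtuple("Answer", "id question_id author")


def visible_feed(network, questions, answers):
    """Return (question_ids, answer_ids, extra_users) visible to a user
    whose personal network is the set `network`."""
    by_id = {q.id: q for q in questions}
    visible_q, visible_a, extra_users = set(), set(), set()

    # Questions asked by someone in the network are always visible.
    for q in questions:
        if q.asker in network:
            visible_q.add(q.id)

    for a in answers:
        q = by_id[a.question_id]
        if a.author in network:
            # An answer from the network also surfaces the question and its asker.
            visible_a.add(a.id)
            visible_q.add(q.id)
            if q.asker not in network:
                extra_users.add(q.asker)
        elif q.asker in network:
            # A question from the network surfaces whoever answered it.
            visible_a.add(a.id)
            extra_users.add(a.author)

    return visible_q, visible_a, extra_users
```

Because the rule stops after one hop, a question and answer exchanged entirely between strangers never appears, which is what keeps reviews from spreading virally.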

Re: product identification: people actually tend to shop from the same places (think of the product coverage of the top 5000 retailers). We don't have those top 5000 yet, but we're working on it. We get most of our product data from affiliate networks, which offer up that data to drive sales back to their sites. For whatever we can't import and internalize, we have a "search anything" feature that lets the user navigate to a product page in a web view and import the product.

You're right, resolving duplicates and variants is a very tricky problem that can become incredibly complicated. Right now we use some very basic heuristics like normalized name and brand, ASIN, GTIN, EAN, UPC, etc. But actually, we don't need things to be perfect. So long as a user can get to a product that is more or less what they want to recommend, they're happy. Also, when you are searching for a product to recommend (this is not "browsing," it's when you are trying to answer a recommendation with a specific product) we boost products that have already been recommended. This way we can get users who are looking for a particular product to recommend the same instance of that product. We also focus our efforts on cleaning up the data for products that have already been recommended, which is a much smaller subset than our total catalog.
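The two ideas above (identifier-based matching and boosting already-recommended products) can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the dictionary field names (`asin`, `gtin`, `ean`, `upc`, `brand`, `name`, `text_score`), the helper names, and the logarithmic boost formula are all hypothetical. It relies on the fact that UPC and EAN codes are GTINs and compare equal once left-padded with zeros to 14 digits.

```python
import math
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def gtin14(code):
    """Zero-pad a UPC/EAN/GTIN digit string to 14 digits, or return None."""
    digits = re.sub(r"\D", "", code or "")
    return digits.zfill(14) if 8 <= len(digits) <= 14 else None


def match_keys(product):
    """Build the set of keys under which a product may match another."""
    keys = set()
    if product.get("asin"):
        keys.add(("asin", product["asin"].upper()))
    for field in ("gtin", "ean", "upc"):
        padded = gtin14(product.get(field))
        if padded:
            keys.add(("gtin", padded))
    if product.get("brand") and product.get("name"):
        keys.add(("name", normalize(product["brand"]), normalize(product["name"])))
    return keys


def same_product(a, b):
    """Treat two records as duplicates if they share any key."""
    return bool(match_keys(a) & match_keys(b))


def rank(candidates, recommend_counts, boost=0.5):
    """Order search hits by text score, boosted by prior recommendation count.
    The log-based formula is an assumed choice, not taken from the thread."""
    def score(c):
        n = recommend_counts.get(c["id"], 0)
        return c["text_score"] * (1 + boost * math.log1p(n))
    return sorted(candidates, key=score, reverse=True)
```

Matching on any shared key is deliberately loose, in keeping with the point that the data does not need to be perfect. The boost nudges users toward the instance that has already been recommended, so recommendations converge on one record even when duplicates remain in the catalog.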

Our data is stored in Neo4j. We find its structure well suited to a product catalog and taxonomy, and it allows us to derive relationships between products. Improving our catalog is an ongoing task, and one that will likely never end.