Hacker News
by malcolmhere 3580 days ago
I worked on a product like this for a long while, so I can appreciate how hard the problem you are trying to solve is. Some things I realized along the way:

* Price capture needs to happen in a headless browser (e.g. PhantomJS), rather than by just fetching the HTML with a GET. Too many sites rely on JavaScript for raw HTML analysis to be feasible.

* You can get over 50% of the pricing information with fairly simple matching on the class/id value in the HTML tag. But you need a headless browser to make sure the tag is actually visible. And since most product pages contain multiple prices, you need some heuristic to determine the relevant one. Oh, and watch out for "reduced from" prices too (e.g. "Old Price: $50, New Price: $35").

* It doesn't hurt to be able to override the general heuristic on a domain-by-domain basis; doing so saved me a lot of headaches.

* You need to be honest with yourself about how reliable the price capture algorithm is, and build up a regression database of known good pages, so that when you change the algorithm you can check that nothing else breaks. Also, you need to keep ahead of site redesigns!

* Product URLs tend to look messy, but they rarely change, if at all. I was worried about retailers changing product identifiers, for example, but changing URLs hurts their SEO, so they don't do it. You will find "zombie" products, though: things that appear to still be on sale but aren't linked from anywhere on the site. Deciding when a product is sold out is tricky.

* The best user experience presents the items the user is watching as a "shopping basket". (I took design cues from Pinterest.) For a really slick experience, you should pick out the product name and image (Facebook's Open Graph metadata helps here) and include them in your "pinned" products.

* Cutting and pasting URLs is a hassle. Consider writing a browser extension or a bookmarklet: users don't like having their browsing flow interrupted by having to switch between tabs. Having the price capture done inline on the page really impresses people.
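To make the class/id matching, "relevant price" heuristic and per-domain override points concrete, here is a minimal sketch using only Python's standard library. It is not the commenter's implementation. It works on static HTML, so the inline-`style`/`hidden` check is only a crude stand-in for the real visibility test a headless browser would give you. The keyword lists, the "first surviving candidate wins" rule, the dollar-only price regex and the `DOMAIN_OVERRIDES` entry (`shop.example.com` → `our-price`) are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical heuristics; tune them against your own known-good pages.
PRICE_HINT = re.compile(r"price", re.I)
OLD_PRICE_HINT = re.compile(r"\b(old|was|regular|reduced|strike)", re.I)
PRICE_RE = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)")
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}

# Hypothetical per-domain overrides: hostname -> id of the element holding the price.
DOMAIN_OVERRIDES = {"shop.example.com": "our-price"}


class PriceCandidates(HTMLParser):
    """Collects the text of elements whose id/class mentions "price"."""

    def __init__(self, force_id=None):
        super().__init__()
        self.force_id = force_id
        self.stack = []       # (tag, candidate or None, hidden) per open element
        self.candidates = []  # candidate dicts in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so don't push them
        a = dict(attrs)
        elem_id = a.get("id") or ""
        label = f"{elem_id} {a.get('class') or ''}"
        style = (a.get("style") or "").replace(" ", "").lower()
        # Crude visibility check (assumption): inline styles only, inherited from parent.
        hidden = ("hidden" in a or "display:none" in style
                  or "visibility:hidden" in style
                  or (bool(self.stack) and self.stack[-1][2]))
        cand = None
        if PRICE_HINT.search(label) or (self.force_id and elem_id == self.force_id):
            cand = {"id": elem_id, "label": label, "hidden": hidden, "text": []}
            self.candidates.append(cand)
        self.stack.append((tag, cand, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Text belongs to the innermost open "price" element only, so a
        # wrapper like <div class="price-box"> doesn't swallow both prices.
        for _, cand, _ in reversed(self.stack):
            if cand is not None:
                cand["text"].append(data)
                return


def pick_price(html, url):
    """Return the first visible, non-"old" price on the page, or None."""
    override = DOMAIN_OVERRIDES.get(urlparse(url).hostname or "")
    parser = PriceCandidates(force_id=override)
    parser.feed(html)
    parser.close()
    for cand in parser.candidates:
        text = "".join(cand["text"])
        if cand["hidden"]:
            continue
        if override:
            # A domain override is trusted: only its element counts.
            if cand["id"] != override:
                continue
        elif OLD_PRICE_HINT.search(cand["label"]) or OLD_PRICE_HINT.search(text):
            continue  # skip "Old Price" / "reduced from" style prices
        match = PRICE_RE.search(text)
        if match:
            return float(match.group(1).replace(",", ""))
    return None
```

For example, on `<span class="price-old">Old Price: $50</span><span class="price-new">New Price: $35</span>` this returns `35.0`. Keeping the override as plain data, rather than per-site code branches, makes it cheap to add a domain when the generic heuristic keeps getting it wrong.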
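The regression database of known good pages can start out very simple: saved HTML snapshots paired with the price a human confirmed, re-run after every change to the capture algorithm. The sketch below assumes a hypothetical extractor signature `(html, url) -> price or None`; the `KnownGoodPage` record, the example URLs and the deliberately naive regex extractor are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical extractor signature: (html, url) -> price or None.
Extractor = Callable[[str, str], Optional[float]]


@dataclass(frozen=True)
class KnownGoodPage:
    url: str
    html: str                  # snapshot saved when the capture was hand-checked
    expected: Optional[float]  # the price a human confirmed (None = no price)


def run_regressions(extract: Extractor,
                    pages: List[KnownGoodPage]
                    ) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Re-run the extractor over every snapshot; return (url, expected, got) mismatches."""
    failures = []
    for page in pages:
        got = extract(page.html, page.url)
        if got != page.expected:
            failures.append((page.url, page.expected, got))
    return failures


def naive_extract(html: str, url: str) -> Optional[float]:
    # Deliberately naive: takes the first dollar amount on the page.
    m = re.search(r"\$(\d+(?:\.\d{2})?)", html)
    return float(m.group(1)) if m else None


PAGES = [
    KnownGoodPage("https://a.example/1", '<span class="price">$35</span>', 35.0),
    KnownGoodPage("https://b.example/2", "Old Price: $50, New Price: $35", 35.0),
]

print(run_regressions(naive_extract, PAGES))
# [('https://b.example/2', 35.0, 50.0)]
```

The naive extractor trips over exactly the "reduced from" case described above, and the harness surfaces it as a mismatch. Storing snapshots (rather than re-fetching live pages) keeps the test deterministic, although it also means you still need a separate way to notice site redesigns.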
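For the "pinned" product card, the Facebook metadata in question is the Open Graph protocol: `<meta property="og:title" ...>` and `<meta property="og:image" ...>` tags in the page head. A minimal standard-library sketch follows. Falling back to the `<title>` text when `og:title` is missing, and keeping the first `og:image` when several are present, are assumptions of this sketch, as is the card's dictionary shape.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urljoin


class OpenGraphParser(HTMLParser):
    """Collects og:* <meta> properties plus the <title> text as a fallback."""

    def __init__(self) -> None:
        super().__init__()
        self.og: Dict[str, str] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property") or ""
            content = a.get("content")
            # Keep the first value if a property repeats (e.g. several og:image tags).
            if prop.startswith("og:") and content and prop not in self.og:
                self.og[prop] = content
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def pinned_product(html: str, url: str) -> Dict[str, Optional[str]]:
    """Build a "pinned" card: product name and image, falling back to <title>."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    image = parser.og.get("og:image")
    return {
        "url": url,
        "name": parser.og.get("og:title") or parser.title.strip() or None,
        # Resolve relative image paths against the product URL.
        "image": urljoin(url, image) if image else None,
    }
```

Resolving `og:image` with `urljoin` is a defensive choice: the tag is supposed to hold an absolute URL, but some sites emit relative paths, and `urljoin` leaves absolute URLs unchanged.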

Best of luck with this! I'm yet to see anyone solve this problem well, and I eventually moved on to other things after losing a lot of my hair. :-)

1 comments

Really interesting feedback. Thanks!