HN Mirror



	by netvarun 4864 days ago
	We currently get the pricing data via RSS feeds, crawling, data dumps and, in some cases, crowdsourcing. In the long run, we also hope to establish merchant relationships and get the data directly. To the original question on crawling (I had replied to a similar question previously on HN): "Some great advice here on crawling at scale, which has inspired our crawlers a lot: http://news.ycombinator.com/item?id=4367933 Basically it boils down to three things: 1. If the site is slow, crawl slooowly. 2. If you see non-200 HTTP status codes, stop! 3. Obey robots.txt and speed restrictions."
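
For anyone who wants those three rules in code, here is a rough Python sketch. It is not our actual crawler. The names (`crawl`, `next_delay`, `CrawlStopped`, the `example-bot` user agent) and the numbers (a 1 second floor, waiting 10x the last response time) are assumptions picked for illustration. `fetch` is any function you supply that returns `(status_code, body)`, and `robots` is a standard-library `RobotFileParser` that has already been loaded.

```python
import time
from urllib.robotparser import RobotFileParser


class CrawlStopped(Exception):
    """Raised when the site answers with a non-200 status (rule 2)."""


def next_delay(response_seconds, robots_delay=None, min_delay=1.0, factor=10.0):
    """Rule 1: the slower the site responds, the longer we wait.

    min_delay and factor are illustrative assumptions, not tuned values.
    A Crawl-delay from robots.txt, if present, acts as a lower bound (rule 3).
    """
    delay = max(min_delay, factor * response_seconds)
    if robots_delay is not None:
        delay = max(delay, float(robots_delay))
    return delay


def crawl(urls, fetch, robots, user_agent="example-bot", sleep=time.sleep):
    """Fetch urls one at a time; returns {url: body} for the pages fetched."""
    pages = {}
    for url in urls:
        # Rule 3: skip anything robots.txt disallows for this user agent.
        if not robots.can_fetch(user_agent, url):
            continue
        started = time.monotonic()
        status, body = fetch(url)
        elapsed = time.monotonic() - started
        # Rule 2: any non-200 status ends the crawl immediately.
        if status != 200:
            raise CrawlStopped(f"{url} returned {status}")
        pages[url] = body
        # Rule 1: back off in proportion to how slow the site was.
        sleep(next_delay(elapsed, robots.crawl_delay(user_agent)))
    return pages
```

To use it against a real site you would call `robots.set_url(...)` and `robots.read()` first, then pass in a `fetch` built on whatever HTTP client you prefer. Raising on the first non-200 is deliberately blunt; a production crawler would probably pause and retry later rather than abort outright.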