by gardaani 460 days ago
Does your crawler respect robots.txt? Does it request pages with nice delays so that it doesn't bring down servers? Related: https://news.ycombinator.com/item?id=43422413

We don’t respect robots.txt, because estate agents often use it to block listing pages, not necessarily to prevent indexing but for SEO reasons (Google penalises large sites with transient pages).

That said, we do crawl responsibly: we use reasonable request delays, respect rate limits, and so on. Ultimately we want agents to like us, and blowing up their servers doesn't help with that. If an agent prefers to opt out, we always honour it.
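
For anyone wondering what "reasonable request delays" and "respect rate limits" can look like in practice, here is a minimal sketch of a per-host scheduler. It is not our actual crawler code: the class name `PoliteScheduler`, the 5-second default delay, and the `back_off` helper are all hypothetical, and it assumes the caller has already parsed any `Retry-After` header into a number of seconds.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay   # assumed default; tune per site
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = {}      # host -> earliest permitted send time

    def wait_turn(self, url):
        """Block until the URL's host may be contacted; return seconds waited."""
        host = urlsplit(url).netloc
        now = self._clock()
        wait = max(0.0, self._next_allowed.get(host, now) - now)
        if wait:
            self._sleep(wait)
        # Reserve the next slot min_delay after this request goes out.
        self._next_allowed[host] = now + wait + self.min_delay
        return wait

    def back_off(self, url, retry_after_seconds=0.0):
        """After a 429 or 503, push the host's next slot further out."""
        host = urlsplit(url).netloc
        pause = max(self.min_delay, float(retry_after_seconds))
        earliest = self._clock() + pause
        self._next_allowed[host] = max(self._next_allowed.get(host, 0.0), earliest)
```

The idea is to call `wait_turn(url)` before every fetch and `back_off(url, seconds)` whenever a server answers with 429 or 503. Delays are tracked per host, so a slow site never holds up requests to other sites.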