by huhtenberg 2531 days ago
If you don't inspect and respect robots.txt, you shouldn't be surprised when sites actively block your crawlers. Ditto for when you try to work around crawling restrictions by hiding behind real browser UAs.
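For what it's worth, checking robots.txt before fetching takes only a few lines. Here is a rough sketch using Python's standard `urllib.robotparser`. The user agent string, the sample rules, and the helper names `build_parser` and `may_fetch` are all made up for illustration; a real crawler would load the site's actual file (for example with `set_url()` and `read()`) rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity: an honest, identifiable UA rather than a
# spoofed browser string.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/bot)"

# Made-up robots.txt content, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
allowed = may_fetch(parser, "https://example.com/public/page")
blocked = may_fetch(parser, "https://example.com/private/page")
```

With the sample rules above, the first URL is allowed and the second is disallowed, so a polite crawler would skip it.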