by jackienotchan
261 days ago
AI crawlers have led to a big surge in scraping activity, and most of these bots don't respect any of the scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits, user agents, etc.). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported here on HN (and as I have experienced myself). Does Webhound respect robots.txt directives, and do you disclose the identity of your crawlers via the User-Agent header?

This is definitely something we need to address on our end. Site owners should have clear ways to opt out, and crawlers should be identifiable. We're looking into either working with Firecrawl to improve this or potentially switching to a solution that gives us more control over respecting these standards.
Appreciate you bringing this up.
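
For anyone wondering what "respecting these standards" looks like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not how Webhound or Firecrawl works; the bot name `ExampleBot`, its info URL, the sample robots.txt, and the helper names `build_parser` and `may_fetch` are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; a real crawler would send this exact string
# in its User-Agent header so site owners can identify and block it.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"

# Made-up robots.txt a site owner might publish to opt out of part of a site.
SAMPLE_ROBOTS = """\
User-agent: ExampleBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if the rules allow our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    for url in ("https://example.com/private/report", "https://example.com/blog"):
        print(url, "allowed" if may_fetch(rules, url) else "disallowed")
    # Crawl-delay is a non-standard but widely used directive; honoring it
    # is one simple form of rate limiting.
    print("crawl delay (s):", rules.crawl_delay(USER_AGENT))
```

The crawler would call `may_fetch` before every request and wait at least the reported crawl delay between requests to the same host.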