by mrighele 2559 days ago
It's a pity that robots.txt doesn't let you specify what a crawler can do with the resources it's allowed to fetch. I think that if such a feature (or something similar, like a "License" header) had been standardized early enough, a few issues regarding crawling and search engines would be moot, or at least easier to solve automatically.
1 comment

True, but then all the commercial websites would use it to ban scraping.