by kevincox 1263 days ago
I wouldn't expect or want an RSS aggregator to respect robots.txt for explicitly added feeds. That is effectively a human action asking for that feed to be monitored so robots.txt doesn't apply.

What would be good is respecting `Cache-Control`. Unfortunately many RSS clients don't; they just pick a schedule and poll on it.
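
To make that concrete, here is a rough sketch of deriving the next poll delay from a feed response's `Cache-Control` header instead of a fixed schedule. The function name, the one-hour default, and the five-minute floor are my own assumptions, not taken from any particular client, and it only looks at `max-age`, `no-cache`, and `no-store`.

```python
def poll_delay_seconds(cache_control, default=3600, minimum=300):
    """Return how many seconds to wait before re-fetching a feed.

    cache_control is the raw Cache-Control header value, or None.
    default and minimum are assumed values for illustration only.
    """
    if not cache_control:
        return default  # server gave no hint; fall back to our own schedule

    directives = [d.strip().lower() for d in cache_control.split(",")]

    # The response should not be reused without revalidation, so poll at
    # the polite floor rather than hammering the server.
    if "no-store" in directives or "no-cache" in directives:
        return minimum

    for directive in directives:
        if directive.startswith("max-age="):
            value = directive[len("max-age="):].strip('"')
            try:
                # Never poll more often than the floor, even if max-age is tiny.
                return max(minimum, int(value))
            except ValueError:
                return default  # malformed max-age; ignore it

    return default


print(poll_delay_seconds("public, max-age=7200"))
```

A real client would probably also send `If-None-Match` / `If-Modified-Since` so that unchanged feeds come back as a cheap 304.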


robots.txt was originally created to include such bots. That they think they don't need to respect it goes against the original intent.

E.g. https://www.robotstxt.org/faq/kinds.html lists "What's New" monitoring among the kinds of robots.
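
For what it's worth, the check itself is small. A minimal sketch using Python's standard `urllib.robotparser`; the user agent name `ExampleFeedReader`, the sample rules, and the `may_fetch` helper are made up for illustration, and a real aggregator would fetch and cache the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, not taken from any real site.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

print(may_fetch(EXAMPLE_RULES, "ExampleFeedReader", "https://example.com/feed.xml"))
print(may_fetch(EXAMPLE_RULES, "ExampleFeedReader", "https://example.com/private/feed.xml"))
```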

I want my software to obey me, not someone else. If the software is discovering resources on its own, then obeying robots.txt is fair. But if the software is polling a resource I explicitly told it to, I would not expect it to make additional requests to fetch unrelated files such as robots.txt.

I can almost see both sides here... But ultimately, when you are using someone else's resources, not respecting their wishes (within reason) just makes you an asshole.