| HN Mirror

	by Arnavion 1261 days ago
	As annoying as it is, there is precedent for this opinion with RSS aggregator websites like Feedly. They discover new feed URLs when their users add them, and then keep auto-refreshing them without further explicit user interaction. They don't respect robots.txt either.

1 comments

kevincox 1261 days ago

I wouldn't expect or want an RSS aggregator to respect robots.txt for explicitly added feeds. Adding a feed is effectively a human action asking for that feed to be monitored, so robots.txt doesn't apply.

What would be good is respecting `Cache-Control`. Unfortunately many RSS clients don't; they just pick a schedule and poll on it.
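As a rough illustration of what honoring `Cache-Control` could look like in a feed poller, here is a minimal Python sketch. The function name `next_poll_delay`, the one-hour fallback, and the 60-second floor are assumptions made for the example, not taken from any particular RSS client. It only reads `max-age` and ignores other directives.

```python
DEFAULT_INTERVAL = 3600  # seconds; hypothetical fallback when the header is unusable
MINIMUM_INTERVAL = 60    # seconds; hypothetical floor so a tiny max-age cannot cause hammering


def next_poll_delay(cache_control, default=DEFAULT_INTERVAL, minimum=MINIMUM_INTERVAL):
    """Return how many seconds to wait before re-fetching a feed.

    `cache_control` is the raw value of the response's Cache-Control
    header, or None if the server did not send one.
    """
    if not cache_control:
        return default

    # Split "public, max-age=1800" into {"public": "", "max-age": "1800"}.
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')

    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        # The server says the response stays fresh this long, so wait at least that.
        return max(int(max_age), minimum)

    # No usable max-age (for example "no-cache"): fall back to the fixed schedule.
    return default
```

For example, `next_poll_delay("public, max-age=1800")` gives 1800, while a missing header falls back to the default interval. A fuller client would probably also send conditional requests with `If-None-Match` or `If-Modified-Since`.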


Arnavion 1261 days ago

robots.txt was originally created to include such bots. That they think they don't need to respect it goes against the original intent.

E.g. https://www.robotstxt.org/faq/kinds.html > "What's New" monitoring


counttheforks 1261 days ago

I want my software to obey me, not someone else. If the software is discovering resources on its own, then obeying robots.txt is fair. But if the software is polling a resource I explicitly told it to, I would not expect it to make additional requests to fetch unrelated files such as robots.txt.
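The split between discovered and explicitly added URLs could be sketched roughly like this in Python, using the standard library's `urllib.robotparser`. The function `may_fetch`, the `explicitly_added` flag, and the user agent string are hypothetical names for illustration. This encodes only the policy described in this comment, which others in the thread disagree with.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(url, robots_txt, user_agent, explicitly_added):
    """Decide whether a fetcher should request `url`.

    `robots_txt` is the already-downloaded text of the site's robots.txt.
    `explicitly_added` is True when a user asked for this exact URL.
    """
    if explicitly_added:
        # Treated as a human request, so robots.txt is not consulted.
        return True

    # The URL was discovered by the software itself, so apply robots.txt.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Usage with a made-up robots.txt and a made-up user agent.
robots = "User-agent: *\nDisallow: /feeds/\n"
feed = "https://example.com/feeds/all.xml"
print(may_fetch(feed, robots, "ExampleReader", explicitly_added=False))
print(may_fetch(feed, robots, "ExampleReader", explicitly_added=True))
```

In practice a fetcher taking the explicit path would skip downloading robots.txt at all, which is the "additional request" being objected to here.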


chillfox 1261 days ago

I can almost see both sides here... But ultimately when you are using someone else's resources, then not respecting their wishes (within reason) just makes you an asshole.
