by renegat0x0 358 days ago
As a fun side note, this location is prohibited by robots.txt.

I personally don't care, since big tech CEOs already said at the dawn of AI that they don't care about robots.txt.

Additionally, I have a project that can read RSS links and return the results as a JSON response:

https://github.com/rumca-js/crawler-buddy

2 comments

robots.txt does not "prohibit" anything. For some reason people have a misconception that robots.txt is used to block bots.

robots.txt is used to HELP bots. It tells bots which pages to visit and which pages are not intended for consumption. If a bot goes ahead and scrapes everything anyway, that's entirely its own prerogative. Particularly for less sophisticated bots without a lot of storage, a good robots.txt can help them avoid getting stuck on dynamically generated content or "useless for indexing" content.
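
To make the "help" framing concrete, here's a rough sketch of how a well-behaved bot might consult robots.txt before crawling, using Python's standard `urllib.robotparser`. The rules, the example.com URLs, and the "example-bot" user agent are all made up for illustration; they are not taken from any real site. Note that nothing here enforces anything: the bot simply chooses to filter its own queue.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /feed.xml
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs the site suggests this user agent should fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = make_parser(ROBOTS_TXT)

# Hypothetical crawl queue.
candidates = [
    "https://example.com/",
    "https://example.com/search?q=rss",
    "https://example.com/feed.xml",
    "https://example.com/posts/1",
]

# The bot voluntarily drops the paths the site marked as not worth visiting.
polite = allowed_urls(parser, "example-bot", candidates)
print(polite)
```

A bot that skips the `allowed_urls` step would still be able to fetch every URL in the list, which is the whole point: the file is advice, not access control.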

Huh, you're right. What an odd choice to specifically ban automated tools from downloading RSS feeds in robots.txt.