by renegat0x0
358 days ago
As a fun side note: this location is disallowed by robots.txt. I personally don't care, since a big tech CEO already said at the dawn of AI that they don't care about robots.txt. Additionally, I have a project that can read RSS links and return their contents in a JSON response: https://github.com/rumca-js/crawler-buddy
|
robots.txt is used to HELP bots. It tells bots which pages to visit and which pages are not intended for consumption. If a bot goes ahead and scrapes everything anyway, that's entirely its own prerogative. Particularly for less sophisticated bots without a lot of storage, a good robots.txt can help them avoid getting stuck on dynamically generated content or "useless for indexing" content.
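
To make that concrete, here is a rough sketch of how a simple bot could use robots.txt to skip dynamically generated pages, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `example-bot` user agent, and the helper names `build_parser` and `allowed_urls` are all made up for illustration; this is not how crawler-buddy or any particular crawler does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /calendar/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)

# Hypothetical crawl frontier: two static pages and two dynamic ones.
candidates = [
    "https://example.com/about",
    "https://example.com/search?q=rss",
    "https://example.com/calendar/2031/01",
    "https://example.com/feed.xml",
]

for url in allowed_urls(parser, "example-bot", candidates):
    print("would fetch:", url)
```

A bot that filters its queue this way never enters the endless search and calendar pages in the first place, which is the "help" part: the site owner has already marked where the crawl traps are.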