Hacker News
by marginalia_nu 38 days ago
Robots.txt is great if you're trying to run an above-board operation. It's much easier than trying to guess how a webmaster wants the crawler to behave, and then getting angry emails when you guess wrong.

It's not great. It used to be very common for robots.txt to Disallow everything for * and Allow only Googlebot, which just entrenches the search engine monopoly. In response, other search engines simply followed the rules for Googlebot instead of the rules for their own crawlers.
Eh, that's not really my experience from running an internet search engine and a crawler. It happens occasionally, but when people do disallow specific UAs, they mostly seem to target what they perceive as nuisance crawlers.
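
For anyone who hasn't seen the pattern being argued about, here is a rough sketch of a "Disallow *, Allow Googlebot" file checked with Python's standard urllib.robotparser. The file contents, the example.com URL, the `allowed` helper, and the "SomeOtherBot" name are all made up for illustration; real sites vary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everyone except Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url="https://example.com/page"):
    """Return True if this robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot matches its own group and is allowed.
googlebot_ok = allowed("Googlebot")

# Any other crawler falls through to the "*" group and is blocked.
other_ok = allowed("SomeOtherBot")

# The workaround described above: a different crawler evaluates the
# rules written for Googlebot instead of the rules for its own name.
workaround_ok = allowed("Googlebot")
```

Here `googlebot_ok` and `workaround_ok` come out True while `other_ok` is False, which is the asymmetry the parent comment is complaining about.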